Unable to load data in Druid without a timestamp column specification

Hi all,

I want to load data into Druid without any timestamp column, i.e. the data should be loaded in batches keyed on a numeric column. So whenever I give

"timestampSpec" : {
  "column" : "",
  "format" : "auto"
},

it gives me a NullPointerException.

Below is my data file, with a few sample rows across 4 columns:

"movieID","itemID","rating","userID"
196,242,3,881250949
186,302,3,891717742
22,377,1,878887116

I want to load data into Druid based on userID, i.e. the incremental-load watermark (the last value pulled) should come from the userID column.

P.S.: I don't have any timestamp column in my file, but I still want to pull data based on an integer field.

Kindly let me know the following:

1. Is it mandatory to have a timestamp field in the JSON file to pull data?

2. If it is, how can we load data into Druid when the file doesn't have any timestamp column?

3. Kindly let me know the changes I should make so the load into Druid becomes possible. I can't find any proper documentation addressing this situation.

Below is my movie_index_task_no_date.json:

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "movie_lens",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "timestampSpec" : {
            "column" : "",          // putting "userID" here gives an "invalid time format" error
            "format" : "auto"
          },
          "columns" : ["movieID", "itemID", "rating", "userID"],
          "dimensionsSpec" : {
            "dimensions" : ["movieID", "itemID", "rating", "userID"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "filter" : {
        "type" : "selector"
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "arbitrary",
        "segmentGranularity" : "userID",
        "intervals" : ["1/2"]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/Users/rajesh.d.torgal/druid-0.9.2/quickstart/",
        "filter" : "movie_data_no_date.csv"
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 0,
      "rowFlushBoundary" : 0
    }
  }
}
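For what it's worth, several parts of this spec will fail before the timestamp question is even reached: "segmentGranularity" must be a time granularity keyword such as "DAY" rather than a column name, "intervals" must be an ISO-8601 interval rather than "1/2", and a "selector" filter is incomplete without "dimension" and "value" fields. A sketch of a granularitySpec that would validate, assuming every row is stamped with a constant placeholder date (see the workaround further down the thread):

  "granularitySpec" : {
    "type" : "uniform",
    "segmentGranularity" : "DAY",
    "queryGranularity" : "none",
    "intervals" : ["2010-01-01/2010-01-02"]
  }

With all rows pinned to the same placeholder day, a single one-day interval covers the entire load.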

I now have the same problem as you. Have you resolved it? My data has more than 4 million rows, and I don't want to regenerate it with a timestamp as the first column.

On Monday, June 12, 2017 at 16:55:54 UTC+7, Priyabrat Bishwal wrote:

1. Is it mandatory to have a timestamp field in the JSON file to pull data?

Yes, a timestamp field is mandatory: Druid partitions and indexes every datasource by time, so each row needs a value for the __time column.
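If the data genuinely has no time column and regenerating millions of rows is not an option, one workaround is to let Druid stamp every row with a constant. The sketch below assumes a Druid version whose timestampSpec supports the missingValue fallback; the column name is deliberately one that does not exist in the file, and the placeholder date is arbitrary:

  "timestampSpec" : {
    "column" : "no_such_column",
    "format" : "auto",
    "missingValue" : "2010-01-01T00:00:00Z"
  }

Every row then lands on 2010-01-01, so the granularitySpec intervals only need to cover that one day. As an aside: in the MovieLens sample above, the fourth column (881250949, 891717742, …) looks like Unix epoch seconds, so if treating it as the event time is acceptable, "column" : "userID" with "format" : "posix" may parse it directly, with no placeholder needed.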

This is my query for my CSV file:

{
  "type" : "index",
  "dataSource" : "newcarpurchaseinquiries_fullarchiveon28032018",
  "granularitySpec" : {
    "type" : "string",
    "gran" : "date",
    "intervals" : [ "2018-01-10/2018-01-10" ]
  },
  "aggregators" : [ {
    "type" : "count",
    "name" : "count"
  }, {
    "type" : "doubleSum",
    "name" : "value",
    "fieldName" : "value"
  } ],
  "firehose" : {
    "type" : "local",
    "baseDir" : "/home/dwtest2/druid-0.12.2/quickstart",
    "filter" : "newcarpurchaseinquiries_fullarchiveon28032018.csv",
    "parser" : {
      "timestampSpec" : {
        "column" : "ReqDateTimeDatePart",
        "format" : "auto"
      },
      "data" : {
        "type" : "csv",
        "columns" : [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime",
                      "Comments", "RequestDateTime", "IsApproved", "IsFake", "StatusId",
                      "IsForwarded", "IsRejected", "IsViewed", "IsMailSend", "TestdriveDate",
                      "TestDriveLocation", "LatestOffers", "ForwardedLead", "SourceId",
                      "ReqDateTimeDatePart", "VisitedDealership", "CRM_LeadId", "ClientIP",
                      "PQPageId", "LTSRC", "UtmaCookie", "UtmzCookie" ],
        "dimensions" : [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime",
                         "Comments", "RequestDateTime", "IsApproved", "IsFake", "StatusId",
                         "IsForwarded", "IsRejected", "IsViewed", "IsMailSend", "TestdriveDate",
                         "TestDriveLocation", "LatestOffers", "ForwardedLead", "SourceId",
                         "ReqDateTimeDatePart", "VisitedDealership", "CRM_LeadId", "ClientIP",
                         "PQPageId", "LTSRC", "UtmaCookie", "UtmzCookie" ]
      }
    }
  }

While loading the above query I get the following error message:

Error 500

HTTP ERROR: 500

Problem accessing /druid/indexer/v1/task. Reason:

    javax.servlet.ServletException: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: HttpInputOverHTTP@2d5e6422[c=2096,q=0,[0]=null,s=EOF]; line: 1, column: 0])

at [Source: HttpInputOverHTTP@2d5e6422[c=2096,q=0,[0]=null,s=EOF]; line: 1, column: 4193]

Powered by Jetty:// 9.3.19.v20170502

Please suggest any changes. Also, my CSV file does not contain a first column named "timestamp"; its first column is "id"…

So what should I do to upload the above query?
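The 500 itself is a plain JSON parse failure: "Unexpected end-of-input: expected close marker for OBJECT" means the request body ended with an unclosed brace, and indeed the task as posted never closes its outermost object, so it never reaches Druid's task validation at all. Checking the file with a JSON validator (for example, python -m json.tool task.json) before POSTing will catch this class of error immediately. Beyond the syntax, a native index task in druid-0.12.2 expects everything wrapped in a "spec" object, with the parser under "dataSchema" rather than inside the firehose. A rough sketch of that layout, reusing the names from the post (column and dimension lists abbreviated with an ellipsis; the doubleSum on "value" is omitted since no "value" column appears in the list; the zero-length interval is widened to a full day, since "2018-01-10/2018-01-10" contains no time at all):

  {
    "type" : "index",
    "spec" : {
      "dataSchema" : {
        "dataSource" : "newcarpurchaseinquiries_fullarchiveon28032018",
        "parser" : {
          "type" : "string",
          "parseSpec" : {
            "format" : "csv",
            "timestampSpec" : {
              "column" : "ReqDateTimeDatePart",
              "format" : "auto"
            },
            "columns" : ["CustomerId", "CarVersionId", "...", "UtmzCookie"],
            "dimensionsSpec" : {
              "dimensions" : ["CustomerId", "CarVersionId", "...", "UtmzCookie"]
            }
          }
        },
        "metricsSpec" : [
          { "type" : "count", "name" : "count" }
        ],
        "granularitySpec" : {
          "type" : "uniform",
          "segmentGranularity" : "DAY",
          "intervals" : ["2018-01-10/2018-01-11"]
        }
      },
      "ioConfig" : {
        "type" : "index",
        "firehose" : {
          "type" : "local",
          "baseDir" : "/home/dwtest2/druid-0.12.2/quickstart",
          "filter" : "newcarpurchaseinquiries_fullarchiveon28032018.csv"
        }
      },
      "tuningConfig" : { "type" : "index" }
    }
  }

Note that the timestamp column does not have to be the first column of the CSV; it only has to appear in "columns" and be named in the timestampSpec. The "columns" list must match the file's actual column order, though, so if the file's first column is "id", the list should start with "id".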