Error uploading a CSV indexing task to Druid

Can anyone suggest what is wrong with this CSV file ingestion query?

This is my query for my CSV file:

{
  "type": "index",
  "dataSource": "newcarpurchaseinquiries_fullarchiveon28032018",
  "granularitySpec": {
    "type": "string",
    "gran": "date",
    "intervals": [ "2018-01-10/2018-01-10" ]
  },
  "aggregators": [ {
    "type": "count",
    "name": "count"
  }, {
    "type": "doubleSum",
    "name": "value",
    "fieldName": "value"
  } ],
  "firehose": {
    "type": "local",
    "baseDir": "/home/dwtest2/druid-0.12.2/quickstart",
    "filter": "newcarpurchaseinquiries_fullarchiveon28032018.csv",
    "parser": {
      "timestampSpec": {
        "column": "ReqDateTimeDatePart",
        "format": "auto"
      },
      "data": {
        "type": "csv",
        "columns": [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime", "Comments",
                     "RequestDateTime", "IsApproved", "IsFake", "StatusId", "IsForwarded", "IsRejected",
                     "IsViewed", "IsMailSend", "TestdriveDate", "TestDriveLocation", "LatestOffers",
                     "ForwardedLead", "SourceId", "ReqDateTimeDatePart", "VisitedDealership", "CRM_LeadId",
                     "ClientIP", "PQPageId", "LTSRC", "UtmaCookie", "UtmzCookie" ],
        "dimensions": [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime", "Comments",
                        "RequestDateTime", "IsApproved", "IsFake", "StatusId", "IsForwarded", "IsRejected",
                        "IsViewed", "IsMailSend", "TestdriveDate", "TestDriveLocation", "LatestOffers",
                        "ForwardedLead", "SourceId", "ReqDateTimeDatePart", "VisitedDealership", "CRM_LeadId",
                        "ClientIP", "PQPageId", "LTSRC", "UtmaCookie", "UtmzCookie" ]
      }
    }
  }
}

While loading the above query I get the following error message:

Error 500

HTTP ERROR: 500

Problem accessing /druid/indexer/v1/task. Reason:

    javax.servlet.ServletException: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: HttpInputOverHTTP@2d5e6422[c=2096,q=0,[0]=null,s=EOF]; line: 1, column: 0])

at [Source: HttpInputOverHTTP@2d5e6422[c=2096,q=0,[0]=null,s=EOF]; line: 1, column: 4193]

Powered by Jetty:// 9.3.19.v20170502

Please suggest any changes. Also, my CSV file does not have a first column named timestamp; its first column is “id”…

So what should I do to upload the above query?

I had problems uploading a CSV file too. If you look into the indexer logs you will see a detailed error. I then switched to a more structured format such as Parquet or Avro and it worked.

Can you tell me more specifically how and where to use Parquet and Avro? I am very much a beginner with Druid.

Here is a sample… the data in the Avro-based Hive table in this sample is at day granularity:

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.avro.AvroValueInputFormat",
        "paths": "s3://my-bucket/my_table_location/"
      }
    },
    "dataSchema": {
      "dataSource": "test8",
      "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3", "org.apache.hadoop:hadoop-aws:2.7.3"],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "day",
        "intervals": [ "2018-04-18/2018-06-30" ],
        "rollup": true
      },
      "parser": {
        "type": "avro_hadoop",
        "parseSpec": {
          "format": "avro",
          "timestampSpec": {
            "column": "date_key",
            "format": "auto"
          },
          "columns": [ "dimension1", "dimension2", "dimension3", "my_summable_col" ],
          "dimensionsSpec": {
            "dimensions": [ "dimension1", "dimension2", "dimension3" ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "hyperUnique",
          "name": "mycol1",
          "fieldName": "dimension1"
        },
        {
          "type": "longSum",
          "name": "mycol2",
          "fieldName": "my_summable_col"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {
        "mapreduce.job.classloader": "true",
        "mapreduce.map.memory.mb": "8192",
        "mapreduce.reduce.memory.mb": "18288",
        "mapreduce.input.fileinputformat.split.minsize": "125829120",
        "mapreduce.input.fileinputformat.split.maxsize": "268435456",
        "mapreduce.map.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.reduce.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
      }
    }
  }
}
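Since the data in this thread is CSV rather than Avro, the same index_hadoop layout would need a different inputSpec and parser. Below is a rough, untested sketch of only those two sections; the path is taken from the first post, the column list is abbreviated (the full list from the original spec would go there), and hadoopyString is the usual parser type for delimited text in Hadoop batch ingestion:

  "inputSpec": {
    "type": "static",
    "paths": "/home/dwtest2/druid-0.12.2/quickstart/newcarpurchaseinquiries_fullarchiveon28032018.csv"
  }

  "parser": {
    "type": "hadoopyString",
    "parseSpec": {
      "format": "csv",
      "timestampSpec": {
        "column": "ReqDateTimeDatePart",
        "format": "auto"
      },
      "columns": [ "ReqDateTimeDatePart", "CustomerId", "CarVersionId", "Color" ],
      "dimensionsSpec": {
        "dimensions": [ "CustomerId", "CarVersionId", "Color" ]
      }
    }
  }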

This means that the indexing spec is not valid JSON and can’t be parsed. You can try feeding it into https://jsonlint.com/ or similar for JSON validation.
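For reference, a minimal native (non-Hadoop) "index" task for a local CSV in Druid 0.12 would look roughly like the sketch below. This is only an untested outline assembled from the paths and column names in the first post; the metric is reduced to a simple row count, the dimension list is abbreviated (the full column list would normally be repeated there), and note that interval ends are exclusive, so a single day needs the following day as the interval end. It can be POSTed to the same /druid/indexer/v1/task endpoint shown in the error message above.

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "newcarpurchaseinquiries_fullarchiveon28032018",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "timestampSpec": {
            "column": "ReqDateTimeDatePart",
            "format": "auto"
          },
          "columns": [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime", "Comments",
                       "RequestDateTime", "IsApproved", "IsFake", "StatusId", "IsForwarded", "IsRejected",
                       "IsViewed", "IsMailSend", "TestdriveDate", "TestDriveLocation", "LatestOffers",
                       "ForwardedLead", "SourceId", "ReqDateTimeDatePart", "VisitedDealership", "CRM_LeadId",
                       "ClientIP", "PQPageId", "LTSRC", "UtmaCookie", "UtmzCookie" ],
          "dimensionsSpec": {
            "dimensions": [ "CustomerId", "CarVersionId", "Color", "NoOfCars", "BuyTime", "StatusId" ]
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": [ "2018-01-10/2018-01-11" ]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/home/dwtest2/druid-0.12.2/quickstart",
        "filter": "newcarpurchaseinquiries_fullarchiveon28032018.csv"
      }
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}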

But my CSV file does not contain a column named timestamp; it contains “id” instead. So what should I do to write the query?
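If there really is no usable timestamp column, one option (not tested here) is the missingValue field of timestampSpec, which stamps every row with a constant time when the named column is absent; the constant date in the sketch below is made up. Otherwise, any date-like column such as ReqDateTimeDatePart or RequestDateTime can be named as the timestamp column, even if it is not the first column in the file.

  "timestampSpec": {
    "column": "timestamp",
    "format": "auto",
    "missingValue": "2018-01-10T00:00:00Z"
  }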

I just entered the query as you had posted it, and it still gives me the same result…

Error 500

HTTP ERROR: 500

Problem accessing /druid/indexer/v1/task. Reason:

    java.lang.NullPointerException

Powered by Jetty:// 9.3.19.v20170502

What should I do? Please help me write the query. Below is my query with the changes I made as per your instructions…

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.avro.AvroValueInputFormat",
        "paths": "home/dwtest2/druid-0.12.2/quickstart"
      }
    },
    "dataSchema": {
      "dataSource": "newcarpurchaseinquiries_fullarchiveon28032018",
      "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3", "org.apache.hadoop:hadoop-aws:2.7.3"],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "day",
        "intervals": [ "2018-10-01/2018-10-02" ],
        "rollup": true
      },
      "parser": {
        "type": "avro_hadoop",
        "parseSpec": {
          "format": "avro",
          "timestampSpec": {
            "column": "ReqDateTimeDatePart",
            "format": "auto"
          },
          "columns": ["ReqDateTimeDatePart","id","CustomerId","CarVersionId","Color","NoOfCars","BuyTime","Comments","RequestDateTime","IsApproved","IsFake","Stat$
          "dimensionsSpec": {
            "dimensions": ["id","CustomerId","CarVersionId","Color","NoOfCars","BuyTime","Comments","RequestDateTime","IsApproved","IsFake","StatusId","IsForward$
          }
        }
      },
      "metricsSpec": [
        {
          "type": "hyperUnique",
          "name": "mycol1",
          "fieldName": "ReqDateTimeDatePart"
        },
        {
          "type": "longSum",
          "name": "mycol2",
          "fieldName": "my_summable_col"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {
        "mapreduce.job.classloader": "true",
        "mapreduce.map.memory.mb": "8192",
        "mapreduce.reduce.memory.mb": "18288",
        "mapreduce.input.fileinputformat.split.minsize": "125829120",
        "mapreduce.input.fileinputformat.split.maxsize": "268435456",
        "mapreduce.map.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.reduce.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
      }
    }
  }
}