HTTP ERROR: 500 on running index task

Hi,

I am trying to run Druid-0.9.0 for a POC.

My issue is that I get an HTTP 500 error when I try to run the index task.

My ingestion spec file is:

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/home/centos/"
        "filter" : "abc.csv"
      }
    },
    "dataSchema" : {
      "dataSource" : "abc",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2000-01-01/2000-01-02"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "columns" : [
            "timestamp",
            "household_id",
            "network_group_id",
            "quarter_hour_of_the_day_offset",
            "broadcast_month_id",
            "ad_zone",
            "region",
            "region_all",
            "party",
            "party_all",
            "duration",
            "ethnic_group",
            "age_range",
            "income_range",
            "gender"
          ]
          "dimensionsSpec" : {
            "dimensions" : [
              "household_id",
              "network_group_id",
              "quarter_hour_of_the_day_offset",
              "broadcast_month_id",
              "ad_zone",
              "region",
              "region_all",
              "party",
              "party_all",
              "duration",
              "ethnic_group",
              "age_range",
              "income_range",
              "gender"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "household_id",
          "type" : "count"
        },
        {
          "name" : "duration",
          "type" : "longSum",
          "fieldName" : "duration"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 0
      "rowFlushBoundary": 0
    }
  }
}

The error I am getting is:

Warning: Couldn't read data from file "abc.json", this makes an empty

Warning: POST.

Error 500

HTTP ERROR: 500

Problem accessing /druid/indexer/v1/task. Reason:

    java.lang.NullPointerException: task

Powered by Jetty://

I am stuck at this and would really appreciate some help.

Thanks.

Hi Vikas,

Something is wrong in your HTTP request to the overlord. If you are using curl, your '-d @{fileName}' path is likely incorrect.

Hi David,

Thanks for the reply.

I am using curl to POST. My curl command is:

curl -X 'POST' -H 'Content-Type:application/json' -d @abc.json ...:8090/druid/indexer/v1/task

Is there something wrong with this command?

Thanks,

Vikas

That command looks good to me. Are you running it from the same directory where abc.json lives? Another reason you may be unable to read the file is insufficient permissions.
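
If in doubt, check the file from the shell and, if needed, point curl at an absolute path. A quick sketch (the path and overlord host below are just placeholders):

  ls -l /home/centos/abc.json
  curl -X POST -H 'Content-Type: application/json' -d @/home/centos/abc.json http://OVERLORD_HOST:8090/druid/indexer/v1/task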

No, I wasn't. When I ran the command from the same directory, I got this error message:

{"error":"Instantiation of [simple type, class io.druid.indexing.common.task.IndexTask] value failed: null"}

Okay, that's better. That error means that your JSON couldn't be deserialized into an IndexTask, in this case because it's not proper JSON. You're missing a bunch of commas in there:

After "/home/centos/" in:

  "firehose" : {
    "type" : "local",
    "baseDir" : "/home/centos/"
    "filter" : "abc.csv"
  }

After "]" in:

       "age_range",
       "income_range",
       "gender"
              ]
      "dimensionsSpec" : {
        "dimensions" : [

After '"targetPartitionSize" : 0' in:

"tuningConfig" : {
  "type" : "index",
  "targetPartitionSize" : 0
  "rowFlushBoundary": 0
  }
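
As a side note, a quick way to catch punctuation errors like these before POSTing is to run the file through a JSON checker, for example (assuming Python is available on the box):

  python -m json.tool abc.json

It will print the offending line and column if the file isn't valid JSON, or pretty-print it if it is.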

Also, your tuningConfig settings don't look valid to me. Take a look at the documentation here: http://druid.io/docs/latest/ingestion/tasks.html for ideas about what might be more reasonable values.
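
For example, something along these lines would be closer to typical values (just a sketch; check the docs above for the actual defaults in your version, but 0 for both fields is unlikely to be what you want):

  "tuningConfig" : {
    "type" : "index",
    "targetPartitionSize" : 5000000,
    "rowFlushBoundary" : 75000
  }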

Thanks a lot, man. That solved the issue.

Now I am getting "com.metamx.common.parsers.ParseException: Unparseable timestamp found!"

I have tagged all rows with a fixed timestamp as stated on the tutorial page, since my data does not come with timestamps of its own.

A line in my data looks like this:

2000-01-01T00:00:00.000Z ,123,123,123,123,aa,aa,aa,aa,aa,123,aa,123+,123-456,aa

The logs are as follows:

n] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
2016-06-29T18:37:19,025 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2016-06-29T18:37:19,028 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2016-06-29T18:37:19,109 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/druid-0.9.0/abc_1.csv]
2016-06-29T18:37:19,117 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_abc_2016-06-29T18:37:14.940Z, type=index, dataSource=abc}]
com.metamx.common.parsers.ParseException: Unparseable timestamp found!
	at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:72) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.StringInputRowParser.parseMap(StringInputRowParser.java:136) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.indexing.common.task.IndexTask.getDataIntervals(IndexTask.java:244) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:200) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0.jar:0.9.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: java.lang.NullPointerException: Null timestamp in input: {timestamp=2000-01-01T00:00:00.000Z , abc_id=2114451, xyz_id=482, quarter_hour_of_th...
	at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:63) ~[druid-api-0.3.16.jar:0.3.16]
	... 11 more
2016-06-29T18:37:19,128 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_abc_2016-06-29T18:37:14.940Z",
  "status" : "FAILED",
  "duration" : 421

Hold on, I found the issue: the column name and the timestampSpec were different. Fixed it. Now the task is showing a "running" status on the coordinator console.
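
For reference, the fix was just pointing the timestampSpec at the column name that actually appears in the columns list, roughly:

  "timestampSpec" : {
    "format" : "auto",
    "column" : "timestamp"
  }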

Will update once I get a result.

OK, so now the task has been running for around 30 minutes.

The data I am ingesting is a 10 GB file.

What is the average time taken by Druid to ingest this amount of data?

Hi Vikas, it appears you are running the local index task, which is not recommended for any file larger than about 1 GB, as performance can be slow.

You can use a remote Hadoop cluster such as EMR to do the ingestion. That should significantly improve ingestion times.
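
For reference, a Hadoop-based batch ingestion spec reuses the same dataSchema but changes the task type and the ioConfig. A rough sketch (the dataSchema is elided and the input path is a placeholder; see the batch ingestion docs for the full set of fields):

  {
    "type" : "index_hadoop",
    "spec" : {
      "dataSchema" : { ... },
      "ioConfig" : {
        "type" : "hadoop",
        "inputSpec" : {
          "type" : "static",
          "paths" : "/path/to/abc.csv"
        }
      },
      "tuningConfig" : {
        "type" : "hadoop"
      }
    }
  }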