Issues with importing local TSV files

Hi,

I am evaluating Druid because it fits our use case well (a low-latency, scale-out DB). I was able to follow the tutorial and import batch files, including the "wikiticker" example.

I am now trying to import our own files, which are plain TSV files.

Following is my ingestion spec:

{
  "type" : "index",
  "spec": {
    "ioConfig" : {
      "type": "index",
      "inputSpec" : {
        "type": "local",
        "paths": "/home/praveen/druid/data/test"
      }
    },
    "dataSchema": {
      "dataSource" : "local",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "hour",
        "queryGranularity" : "none",
        "intervals" : ["2016-07-01/2016-07-28"]
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "tsv",
          "timestampSpec": {
            "format" : "auto",
            "column": "created_at"
          },
          "dimensionsSpec": {
            "dimensions": [
              "id",
              "customer_id",
              "email_list",
              "phone_list",
              "deviceid_list",
              "pardot_id",
              "acquisition_channel",
              "acquisition_timestamp",
              "acquisition_source",
              "created_at",
              "acquisition_campaign"
            ]
          }
        }
      },
      "metricsSpec": [
      ]
    }
  }
}

Following is the exception I see during the import.

[TaskLocation{host='dev', port=8100}].
2017-06-23T20:59:04,978 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_local_2017-06-23T20:59:00.476Z] status changed to [RUNNING].
2017-06-23T20:59:04,978 INFO [main] org.eclipse.jetty.server.Server - jetty-9.3.16.v20170120
2017-06-23T20:59:04,980 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_local_2017-06-23T20:59:00.476Z, type=index, dataSource=local}]
java.lang.NullPointerException: delegate cannot be null
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
	at io.druid.segment.realtime.firehose.ReplayableFirehoseFactory.<init>(ReplayableFirehoseFactory.java:92) ~[druid-server-0.10.0.jar:0.10.0]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:178) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2017-06-23T20:59:04,992 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_local_2017-06-23T20:59:00.476Z] status changed to [FAILED].
2017-06-23T20:59:05,000 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_local_2017-06-23T20:59:00.476Z",
  "status" : "FAILED",
  "duration" : 23
}
2017-06-23T20:59:05,220 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2017-06-23T20:59:05,221 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering

I wish there were better examples, especially for common tasks such as importing TSV and CSV files.

Thanks,
Praveen.

Hi Praveen,

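The task spec does not contain a "firehose" in the task's ioConfig, which is why ReplayableFirehoseFactory fails with "delegate cannot be null". The index task reads its input through a firehose, not through an "inputSpec".
Refer to the sample index task here for how to specify the firehose: http://druid.io/docs/latest/ingestion/tasks.html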
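
As a rough sketch of what that could look like for your files: the ioConfig inside "spec" would use a "local" firehose in place of the "inputSpec" block. I am assuming here that /home/praveen/druid/data/test is a directory and that every file in it should be read; if it is a single file, point "baseDir" at its parent directory and put the file name in "filter" instead.

```json
{
  "ioConfig": {
    "type": "index",
    "firehose": {
      "type": "local",
      "baseDir": "/home/praveen/druid/data/test",
      "filter": "*"
    }
  }
}
```

The rest of the spec (dataSchema, granularitySpec and so on) can stay where it is. Once the firehose is in place you may hit a second problem: as far as I know the "tsv" parseSpec also expects a "columns" array listing every column of the file in order, so it is probably worth adding that next to the timestampSpec.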