Error loading data into druid locally

Hi all,

I am trying to load data into Druid locally and I see the error in the stack trace below.

2018-08-09T15:55:34,847 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_test-wikipedia_2018-08-09T15:55:28.789Z] status changed to [RUNNING].
2018-08-09T15:55:34,858 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.IndexTask - Determining intervals and shardSpecs
2018-08-09T15:55:34,882 INFO [main] org.eclipse.jetty.server.Server - jetty-9.3.19.v20170502
2018-08-09T15:55:34,921 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_test-wikipedia_2018-08-09T15:55:28.789Z, type=index, dataSource=test-wikipedia}]
java.lang.NullPointerException
	at io.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:268) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:187) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.1.jar:0.10.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
2018-08-09T15:55:34,955 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_test-wikipedia_2018-08-09T15:55:28.789Z] status changed to [FAILED].
2018-08-09T15:55:34,958 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_test-wikipedia_2018-08-09T15:55:28.789Z",
  "status" : "FAILED",
  "duration" : 110
}

I don't understand what I am missing in the ingestion spec. Please find the ingestion spec attached.

Thanks.

index.json (1.7 KB)

Hello,

It looks like the ioConfig section of your ingestion spec is causing the problem: for index tasks, it expects a firehose rather than an inputSpec. Within the ioConfig section, could you change the inputSpec property to firehose and see if it works?

You can find more information about ioConfig here: http://druid.io/docs/0.10.1/ingestion/tasks.html#ioconfig
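For reference, an ioConfig for a local index task would look roughly like the sketch below (the baseDir and filter values here are just placeholders, not your actual paths):

```json
{
  "ioConfig": {
    "type": "index",
    "firehose": {
      "type": "local",
      "baseDir": "/path/to/data",
      "filter": "wikipedia.json"
    }
  }
}
```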

Thanks,

Atul

Hi,

I have updated the spec to use a firehose but I still see the same error. I am actually trying to ingest data from local files. Is there any property I need to change for that? Please find the updated spec below.

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "tpch_lineitem_small",
      "parser": {
        "parseSpec": {
          "timestampSpec": {
            "column": "l_shipdate",
            "format": "yyyy-MM-dd"
          },
          "dataSpec": {
            "format": "tsv",
            "delimiter": "|",
            "columns": [
              "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
              "l_quantity", "l_extendedprice", "l_discount", "l_tax",
              "l_returnflag", "l_linestatus", "l_shipdate", "l_commitdate",
              "l_receiptdate", "l_shipinstruct", "l_shipmode", "l_comment"
            ],
            "dimensions": [
              "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
              "l_returnflag", "l_linestatus", "l_shipdate", "l_commitdate",
              "l_receiptdate", "l_shipinstruct", "l_shipmode", "l_comment"
            ]
          },
          "granularitySpec": {
            "type": "arbitrary",
            "intervals": ["1980/2020"]
          },
          "ioConfig": {
            "type": "index",
            "firehose": {
              "type": "local",
              "paths": "/indexfiles/lineitem.tbl.gz"
            }
          },
          "rollupSpec": {
            "aggs": [
              { "type": "count", "name": "count" },
              { "type": "longSum", "fieldName": "L_QUANTITY", "name": "L_QUANTITY" },
              { "type": "doubleSum", "fieldName": "L_EXTENDEDPRICE", "name": "L_EXTENDEDPRICE" },
              { "type": "doubleSum", "fieldName": "L_DISCOUNT", "name": "L_DISCOUNT" },
              { "type": "doubleSum", "fieldName": "L_TAX", "name": "L_TAX" }
            ],
            "rollupGranularity": "day"
          }
        }
      }
    }
  }
}

Thanks for responding though.

Thanks !

Hi,

your task spec needs the following fixes:

  • 'ioConfig' should be at the same level as 'dataSchema'.

  • 'rollupSpec' should be inside 'dataSchema', and its name should be 'metricsSpec'.
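In other words, the top level of the spec should be laid out roughly like this (section bodies elided, so this is a structural sketch rather than a complete spec):

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "...",
      "parser": { "...": "..." },
      "granularitySpec": { "...": "..." },
      "metricsSpec": []
    },
    "ioConfig": {
      "type": "index",
      "firehose": { "...": "..." }
    }
  }
}
```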

Best,

Jihoon

Got it. Thank you !

Below is the sample spec that worked for me, in case someone needs it:

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "tpch_lineitem_small",
      "parser": {
        "parseSpec": {
          "format": "tsv",
          "delimiter": "|",
          "columns": [
            "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
            "l_quantity", "l_extendedprice", "l_discount", "l_tax",
            "l_returnflag", "l_linestatus", "l_shipdate", "l_commitdate",
            "l_receiptdate", "l_shipinstruct", "l_shipmode", "l_comment"
          ],
          "timestampSpec": {
            "column": "l_shipdate",
            "format": "yyyy-MM-dd"
          },
          "dimensionsSpec": {
            "dimensions": [
              "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
              "l_returnflag", "l_linestatus", "l_shipdate", "l_commitdate",
              "l_receiptdate", "l_shipinstruct", "l_shipmode", "l_comment"
            ]
          }
        }
      },
      "granularitySpec": {
        "type": "arbitrary",
        "queryGranularity": "DAY",
        "intervals": ["1980/2020"]
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "longSum", "fieldName": "L_QUANTITY", "name": "L_QUANTITY" },
        { "type": "doubleSum", "fieldName": "L_EXTENDEDPRICE", "name": "L_EXTENDEDPRICE" },
        { "type": "doubleSum", "fieldName": "L_DISCOUNT", "name": "L_DISCOUNT" },
        { "type": "doubleSum", "fieldName": "L_TAX", "name": "L_TAX" },
        { "type": "hyperUnique", "fieldName": "L_SHIPMODE", "name": "L_SHIPMODE" }
      ]
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "filter": "lineitem.tbl.gz",
        "baseDir": "/druid/current/indexfiles"
      }
    }
  }
}