Problem with batch ingestion - indexing service


I am trying the batch ingestion example.

I have the cluster set up. I started the indexing service with:

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/overlord:lib/*:/opt/mapr/conf io.druid.cli.Main server overlord

Submitted the index task like:

curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_task.json localhost:8090/druid/indexer/v1/task

I am seeing the following error in the console for the indexing service:

java.lang.NullPointerException: task

at ~[guava-16.0.1.jar:?]

at io.druid.indexing.overlord.TaskQueue.add( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.http.OverlordResource$1.apply( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.http.OverlordResource$1.apply( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.http.OverlordResource.asLeaderWith( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.http.OverlordResource.taskPost( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

I am guessing it’s a dependency issue?

Can someone tell me what the issue is?

This error is saying the task is null. Which directory are you running the ingest command from, and what does your task look like?

I am running it from the Druid home directory, and I am using the sample Wikipedia indexing task JSON file provided.
What about this line - at ~[guava-16.0.1.jar:?]

Does it mean it was not able to download the guava jar file? I am behind a firewall, and the application won’t be able to reach the Maven repo.

Hi Amol,

I’m fairly certain this has nothing to do with dependency or firewall issues. The exception you’re seeing happens when you submit a POST indexing request with an empty body. Can you double-check the path to the JSON file in your curl command and make sure you’re at the right relative directory and that wikipedia_index_task.json exists?
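The check suggested above can be scripted so that an empty request is never sent. This is a minimal sketch, assuming a POSIX shell; `check_task_file` is a hypothetical helper name, not part of Druid or curl. When the file given to `-d @...` cannot be read, curl normally prints a warning but still sends an empty POST, which would match the null task seen by the overlord.

```shell
#!/bin/sh
# Hypothetical helper (not part of Druid): succeed only if the task file
# exists and is non-empty, so an empty POST body is never submitted.
check_task_file() {
    task_file="$1"
    if [ ! -s "$task_file" ]; then
        echo "task file missing or empty: $task_file" >&2
        return 1
    fi
    return 0
}

# Usage, with the path and endpoint from the original post:
#   check_task_file examples/indexing/wikipedia_index_task.json &&
#     curl -X POST -H 'Content-Type: application/json' \
#       -d @examples/indexing/wikipedia_index_task.json \
#       localhost:8090/druid/indexer/v1/task
```

Running the check from the same directory as the curl command matters, since the `@` path is resolved relative to the current working directory.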

Hi David/Fangjin,

You were right. It could not find the task file. I resolved that. However, there’s another error now.

My data file contains the following:

{"timestamp": "2015-12-09T19:42:33Z", "page": "Gypsy Danger", "language" : "en", "user" : "nuclear", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}

{"timestamp": "2015-12-09T19:42:45Z", "page": "Striker Eureka", "language" : "en", "user" : "speed", "unpatrolled" : "false", "newPage" : "true", "robot": "true", "anonymous": "false", "namespace":"wikipedia", "continent":"Australia", "country":"Australia", "region":"Cantebury", "city":"Syndey", "added": 459, "deleted": 129, "delta": 330}

{"timestamp": "2015-12-09T19:41:21Z", "page": "Cherno Alpha", "language" : "ru", "user" : "masterYi", "unpatrolled" : "false", "newPage" : "true", "robot": "true", "anonymous": "false", "namespace":"article", "continent":"Asia", "country":"Russia", "region":"Oblast", "city":"Moscow", "added": 123, "deleted": 12, "delta": 111}

{"timestamp": "2015-12-09T19:48:39Z", "page": "Crimson Typhoon", "language" : "zh", "user" : "triplets", "unpatrolled" : "true", "newPage" : "false", "robot": "true", "anonymous": "false", "namespace":"wikipedia", "continent":"Asia", "country":"China", "region":"Shanxi", "city":"Taiyuan", "added": 905, "deleted": 5, "delta": 900}

{"timestamp": "2015-12-09T19:41:27Z", "page": "Coyote Tango", "language" : "ja", "user" : "cancer", "unpatrolled" : "true", "newPage" : "false", "robot": "true", "anonymous": "false", "namespace":"wikipedia", "continent":"Asia", "country":"Japan", "region":"Kanto", "city":"Tokyo", "added": 1, "deleted": 10, "delta": -9}

I have attached my task file.

Here is the error I am getting in the task logs:

2015-12-09T19:51:57,599 DEBUG [HttpClient-Netty-Worker-1] com.metamx.http.client.NettyHttpClient - [POST http://******:8090/druid/indexer/v1/action] Got response: 200 OK

2015-12-09T19:51:57,599 DEBUG [HttpClient-Netty-Worker-1] com.metamx.http.client.NettyHttpClient - [POST http://******:8090/druid/indexer/v1/action] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@5ebeda8

2015-12-09T19:51:57,600 DEBUG [HttpClient-Netty-Worker-1] com.metamx.http.client.NettyHttpClient - [POST http://******:8090/druid/indexer/v1/action] Got chunk: 184B, last=false

2015-12-09T19:51:57,600 DEBUG [HttpClient-Netty-Worker-1] com.metamx.http.client.NettyHttpClient - [POST http://******:8090/druid/indexer/v1/action] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@2b83853b

2015-12-09T19:51:57,600 DEBUG [HttpClient-Netty-Worker-1] com.metamx.http.client.NettyHttpClient - [POST http://******:8090/druid/indexer/v1/action] Got chunk: 0B, last=true

2015-12-09T19:51:57,603 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [wikipedia_data.json] in and beneath [/idn/home/apuro3]

2015-12-09T19:51:57,633 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/idn/home/apuro3/wikipedia_data.json]

2015-12-09T19:51:57,651 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_pedia_2015-12-09T19:51:49.068Z, type=index, dataSource=pedia}]

com.metamx.common.parsers.ParseException: Unable to parse row

at com.metamx.common.parsers.JSONParser.parse( ~[java-util-0.27.0.jar:?]

at ~[druid-api-0.3.9.jar:0.3.9]

at ~[druid-api-0.3.9.jar:0.3.9]

at ~[druid-api-0.3.9.jar:0.3.9]

at io.druid.indexing.common.task.IndexTask.getDataIntervals( ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at ~[druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ [druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ [druid-indexing-service-0.8.1-iap2.jar:0.8.1-iap2]

at java.util.concurrent.FutureTask$Sync.innerRun( [?:1.7.0_05]

at [?:1.7.0_05]

at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.7.0_05]

at java.util.concurrent.ThreadPoolExecutor$ [?:1.7.0_05]

at [?:1.7.0_05]

Caused by: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input

at [Source: ; line: 1, column: 1]

at com.fasterxml.jackson.databind.JsonMappingException.from( ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper._initForReading( ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose( ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper.readTree( ~[jackson-databind-2.4.4.jar:2.4.4]

at com.metamx.common.parsers.JSONParser.parse( ~[java-util-0.27.0.jar:?]

... 12 more

2015-12-09T19:51:57,662 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_pedia_2015-12-09T19:51:49.068Z",

"status" : "FAILED",

"duration" : 73


Can you tell me what the problem could be? The same messages seem to parse fine when ingested via Kafka.

wikipedia_index_task.json (1.83 KB)

You most likely have blank lines in your data file. Remove them and try again.

Yes. That was it. I had an empty line at the end of the file. Wish it had better error reporting.
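For anyone hitting the same "Unable to parse row ... No content to map due to end-of-input" error, blank lines in a line-delimited JSON file can be located and removed before ingestion. This is a minimal sketch, assuming a POSIX shell with awk; `find_blank_lines` and `strip_blank_lines` are hypothetical helper names, and the output file name in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Druid) for line-delimited JSON data files.

# Print the line number of every empty or whitespace-only line in file $1.
# awk sets NF to 0 for a line that contains no fields.
find_blank_lines() {
    awk 'NF == 0 { print NR }' "$1"
}

# Write a copy of file $1 to file $2 with empty and whitespace-only lines removed.
strip_blank_lines() {
    awk 'NF > 0' "$1" > "$2"
}

# Usage, with the data file name from the task log above:
#   find_blank_lines wikipedia_data.json
#   strip_blank_lines wikipedia_data.json wikipedia_data.clean.json
```

Writing to a separate file rather than editing in place keeps the original data available for comparison.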