Null timestamp in input (even though the timestamp is present)

Hi,

I get the error below during Hadoop-based batch ingestion:

at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:93) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:283) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: io.druid.java.util.common.parsers.ParseException: Unparseable timestamp found!
	at io.druid.data.input.impl.MapInputRowParser.parseBatch(MapInputRowParser.java:75) ~[druid-api-0.12.2.jar:0.12.2]
	at io.druid.data.input.avro.AvroParsers.parseGenericRecord(AvroParsers.java:61) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.data.input.AvroHadoopInputRowParser.parseBatch(AvroHadoopInputRowParser.java:51) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.data.input.AvroHadoopInputRowParser.parseBatch(AvroHadoopInputRowParser.java:31) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.segment.transform.TransformingInputRowParser.parseBatch(TransformingInputRowParser.java:50) ~[druid-processing-0.12.2.jar:0.12.2]
	at io.druid.indexer.HadoopDruidIndexerMapper.parseInputRow(HadoopDruidIndexerMapper.java:110) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:68) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:283) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: java.lang.NullPointerException: Null timestamp in input: {sub_category=subcat1, date=1539216000, item_name=item1, region_name=region1, region_nbr=1, sales_am...
	at io.druid.data.input.impl.MapInputRowParser.parseBatch(MapInputRowParser.java:67) ~[druid-api-0.12.2.jar:0.12.2]
	at io.druid.data.input.avro.AvroParsers.parseGenericRecord(AvroParsers.java:61) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.data.input.AvroHadoopInputRowParser.parseBatch(AvroHadoopInputRowParser.java:51) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.data.input.AvroHadoopInputRowParser.parseBatch(AvroHadoopInputRowParser.java:31) ~[druid-avro-extensions-0.12.2.jar:0.12.2]
	at io.druid.segment.transform.TransformingInputRowParser.parseBatch(TransformingInputRowParser.java:50) ~[druid-processing-0.12.2.jar:0.12.2]
	at io.druid.indexer.HadoopDruidIndexerMapper.parseInputRow(HadoopDruidIndexerMapper.java:110) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:68) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:283) ~[druid-indexing-hadoop-0.12.2.jar:0.12.2]
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]

As can be seen, the timestamp (date=1539216000) is present in the row. What could be causing this error?

Could this be caused by wrong JVM configs? I've kind of reached a dead end on this after a couple of hours, so I'd appreciate any help.

Note: all nodes are in the UTC timezone, and in the indexing task the timestamp format is "posix" and the column is "date".
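
For what it's worth, the value in the failing row does look like a valid posix timestamp. A quick sanity check with plain java.time (nothing Druid-specific, just to rule out a bad value):

    import java.time.Instant;

    public class TimestampCheck {
        public static void main(String[] args) {
            // Treat the value from the failing row as seconds since the Unix epoch
            Instant ts = Instant.ofEpochSecond(1539216000L);
            System.out.println(ts); // prints 2018-10-11T00:00:00Z
        }
    }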

Cheers,
Ankit

Hey Ankit,

All I can guess is that your timestampSpec is not getting picked up properly; maybe it's specified at the wrong level of the spec JSON. You can verify that by looking at the task log: it prints the spec back to you as it was parsed, so you can check whether the timestampSpec it prints is the one you expect. If not, adjust the spec to fix it.
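
For reference, here's roughly how the nesting usually looks with the Avro Hadoop parser: the timestampSpec lives inside the parseSpec, which lives inside the parser. The dimension names below are just taken from your error message, so treat this as a sketch rather than a drop-in spec:

    "parser" : {
      "type" : "avro_hadoop",
      "parseSpec" : {
        "format" : "avro",
        "timestampSpec" : {
          "format" : "posix",
          "column" : "date"
        },
        "dimensionsSpec" : {
          "dimensions" : ["sub_category", "item_name", "region_name", "region_nbr"]
        }
      }
    }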

Hey Gian,

Thanks for the response. I will check that.

However, is there any way I could get this error because of something wrong in the JVM config on any of the nodes? Just want to make sure I have that part covered!

I don’t think so. That might make your timestamps get interpreted in the wrong timezone, but it shouldn’t make the timestamp be treated as null.

I checked my indexing log and it seems timestampSpec is being picked up correctly:

      "timestampSpec" : {
        "format" : "posix",
        "column" : "date"
      }

Could there be any issue with using a UNIX timestamp here? I checked, and all my nodes are running in the UTC timezone.