S3 Parquet File Indexing Failure - Timestamp Issue

Druid Version: 0.12.1

Hi,

I'm trying to load batch data into Druid from an S3 Parquet file using Hadoop indexing.

This is the task config: https://gist.github.com/shaharck/7885e8f8066e38f1d0963618fe2e2207#file-druidtask-json

This is the failure I see:

java.lang.Exception: io.druid.java.util.common.RE: Failure on row[{"type": "EVENT"}]
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.7.3.jar:?]
Caused by: io.druid.java.util.common.RE: Failure on row[{"type": "EVENT"}]
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:93) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:283) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: java.lang.NullPointerException
at io.druid.data.input.MapBasedRow.getTimestampFromEpoch(MapBasedRow.java:60) ~[druid-api-0.12.1.jar:0.12.1]
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:86) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:283) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]

Looking at previous related posts, I saw some timezone-related answers, but all of my Java configs already include -Duser.timezone=UTC -Dfile.encoding=UTF-8.
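
For reference, this is roughly how those same flags would be passed through to the Hadoop task JVMs via jobProperties in the tuningConfig (a minimal sketch, not copied from my actual spec):

"jobProperties": {
  "mapreduce.map.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.reduce.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
}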

Any suggestions?

Thanks,

Shahar

Worth mentioning that I've checked the data itself is formatted the same way.

Update:
Seems like I should have used the following timestamp format:

yyyy-MM-dd'T'HH:mm:ss.SSS
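
In the ingestion spec that means a timestampSpec along these lines (a sketch only; the column name "timestamp" is an assumption and should match the actual field in the Parquet data):

"timestampSpec": {
  "column": "timestamp",
  "format": "yyyy-MM-dd'T'HH:mm:ss.SSS"
}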

Unfortunately, I see that the Hive timestamp type (INT96) is not supported when converted to Avro.