Druid 0.12 - Problem with Hadoop batch ingestion

Hello,

I am having a problem with Hadoop batch ingestion.

Here’s my ingestion spec:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "wecpg_order_data_mart_all",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "order_date",
            "format": "auto"
          },
          "dimesionsSpec": {
            "dimensions": [
              {
                "type": "float",
                "name": "amount"
              },
              {
                "type": "string",
                "name": "customer_id"
              },
              "customer_name",
              "name",
              {
                "type": "string",
                "name": "order_id"
              },
              "source",
              "noncv",
              "channel"
            ],
            "dimensionExlusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "rollup": false,
        "intervals": []
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/user/spark/wecpg_team/order_data_mart_all_df/"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "dimension",
        "targetPartitionSize": 5000000
      }
    }
  }
}

Here’s the error log:

2020-08-05T05:08:32,117 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wecpg_order_data_mart_all_2020-08-05T05:08:16.930Z, type=index_hadoop, dataSource=wecpg_order_data_mart_all}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:241) ~[guava-28.0-jre.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:445) [druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:417) [druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [guava-28.0-jre.jar:?]
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) [guava-28.0-jre.jar:?]
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [guava-28.0-jre.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
... 9 more
Caused by: java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
at com.google.common.base.Absent.get(Absent.java:43) ~[guava-28.0-jre.jar:?]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:63) ~[druid-indexing-hadoop-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:325) ~[druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.1.4.0-315.jar:0.12.1.3.1.4.0-315]
... 9 more
2020-08-05T05:08:32,124 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wecpg_order_data_mart_all_2020-08-05T05:08:16.930Z] status changed to [FAILED].
2020-08-05T05:08:32,127 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wecpg_order_data_mart_all_2020-08-05T05:08:16.930Z",
  "status" : "FAILED",
  "duration" : 5147
}
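For what it's worth, the trace dies in HadoopDruidDetermineConfigurationJob before any MapReduce job launches, and the absent Optional appears to be the segment intervals. If explicit intervals are what the determine-configuration step wants, I assume the granularitySpec would look roughly like this (the interval value below is illustrative only, not from my data):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "NONE",
  "rollup": false,
  "intervals": ["2019-01-01/2020-08-01"]
}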

Any help is greatly appreciated.

Have you been able to ingest via Hadoop at all? If not, you may want to check the Hadoop logs as well. What version of Druid are you running, and have you made sure the library versions match between Druid and Hadoop?
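If the versions don't line up, a commonly suggested mitigation is to isolate the MapReduce job classloader through jobProperties in the tuningConfig. A minimal sketch, assuming your cluster honors the standard mapreduce.job.classloader property:

"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.job.classloader": "true"
  }
}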

Thank you, Rachel. This Druid 0.12 is the one bundled with HDP 3.1.4, so I assume the libraries are supposed to work well together.

After numerous revisions, here’s the working ingestion spec:
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/user/spark/wecpg_team/order_data_mart_all_df/"
      }
    },
    "dataSchema": {
      "dataSource": "wecpg_order_data_mart_all",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "NONE"
      },
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "dimensionsSpec": {
            "dimensions": [
              "source",
              {
                "type": "string",
                "name": "order_id"
              },
              {
                "type": "string",
                "name": "customer_id"
              },
              "customer_name",
              {
                "type": "float",
                "name": "amount"
              },
              "name",
              "noncv",
              "channel"
            ]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "order_date"
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}

Besides fixing the dimesionsSpec typo, I changed segmentGranularity from DAY to MONTH, switched the partitionsSpec type from dimension to hashed, and removed the empty intervals array.
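For anyone skimming, the decisive fragments side by side. Failing spec:

"dimesionsSpec": { ... },
"intervals": [],
"partitionsSpec": { "type": "dimension", "targetPartitionSize": 5000000 }

Working spec:

"dimensionsSpec": { ... },
"partitionsSpec": { "type": "hashed", "targetPartitionSize": 5000000 }

(no empty intervals array, and the misspelled dimensionExlusions and the spatialDimensions blocks were dropped entirely)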

Thank you everyone for your time.