Hadoop Indexing Not Working with Azure Deep Storage

Hi,
We are migrating our Druid cluster from AWS to Azure. As part of that, we ran our Hadoop indexing job to ingest data into Druid on Azure, with Azure Blob Storage configured as deep storage.
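For reference, our common runtime properties for Azure deep storage look roughly like this (account, key, and container names are placeholders):

```properties
# Load the Azure deep-storage extension
druid.extensions.loadList=["druid-azure-extensions"]

# Use Azure Blob Storage as deep storage
druid.storage.type=azure
druid.azure.account=<storage-account-name>
druid.azure.key=<storage-account-key>
druid.azure.container=<container-name>
```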

But it looks like this is still not supported, even in Druid 0.12.

Could anyone please confirm this (and, more generally, what is restricted when migrating a Druid cluster from AWS to Azure)?

Also, could someone give a quick explanation of why this is not supported?

And what are the ways to make this work? We do want to use the Hadoop indexing job to ingest data.
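For context, the task we submit is a standard Hadoop index task along these lines (datasource and paths are illustrative; parser, metrics, and tuning details are omitted). Note that we do not set any segment output path ourselves; the task is expected to derive it from the configured deep storage:

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "wikiticker",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": ["2015-09-12/2015-09-13"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/data/wikiticker-2015-09-12-sampled.json"
      }
    },
    "tuningConfig": {
      "type": "hadoop"
    }
  }
}
```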


2018-04-05 14:05:02.081+0000 *INFO* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner map task executor complete.
2018-04-05 14:05:02.082+0000 *WARN* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner job_local1436102658_0001
java.lang.Exception: java.lang.NullPointerException: segmentOutputPath
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.7.3.jar:?]
Caused by: java.lang.NullPointerException: segmentOutputPath
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
    at io.druid.indexer.HadoopDruidIndexerConfig.verify(HadoopDruidIndexerConfig.java:589) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.HadoopDruidIndexerConfig.fromConfiguration(HadoopDruidIndexerConfig.java:211) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:51) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:225) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:280) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]

2018-04-05 14:05:02.762+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 running in uber mode : false
2018-04-05 14:05:02.763+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job  map 0% reduce 0%
2018-04-05 14:05:02.765+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 failed with state FAILED due to: NA
2018-04-05 14:05:02.773+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Counters: 0
2018-04-05 14:05:02.773+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob Job failed: job_local1436102658_0001
2018-04-05 14:05:02.774+0000 *INFO* CAMP [task-runner-0-priority-0] io.druid.indexer.JobHelper Deleting path[/tmp/druid-indexing/wikiticker/2018-04-05T140453.615Z_c8d08b4bb74141a2ad94d2956b41defc]
2018-04-05 14:05:02.793+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2018-04-05T14:04:53.617Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.0.jar:0.12.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.lang.reflect.InvocationTargetException

Thanks,

Pravesh Gupta

Hi Pravesh,

I think you’ll need this patch, which was merged after 0.12.0:

https://github.com/druid-io/druid/pull/5221
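If you need this before the next release, one option is to build 0.12.0 from source with the PR applied. An untested sketch (branch name is illustrative, and you may need to resolve conflicts or pick additional commits if the PR contains more than one):

```shell
git clone https://github.com/druid-io/druid.git
cd druid
git checkout druid-0.12.0 -b azure-fix   # start from the 0.12.0 release tag
git fetch origin pull/5221/head          # fetch the PR head from GitHub
git cherry-pick FETCH_HEAD               # apply the fix on top of 0.12.0
mvn clean package -DskipTests            # the distribution tarball is produced under distribution/target
```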

Thanks,

Jon

Thanks Jonathan.

Any idea when we can expect the next Druid release?

We are dependent on this PR.

Also, is there any RC Druid tarball available with this PR merged? If not, how can we get one?

Thanks,

Pravesh Gupta