Druid batch index job failed; the directory name is confusing

I am learning Druid. I tested some data as described in the 'Quickstart' guide, and it ran successfully.
Then I tried to set up a cluster, using 'hdfs' instead of 'local' for deep storage, and I ran into a problem.
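For context, this is roughly how the switch looks in common.runtime.properties (a minimal sketch, not my exact file; the HDFS paths are just examples):

    # Load the HDFS deep-storage extension
    druid.extensions.loadList=["druid-hdfs-storage"]

    # Deep storage on HDFS instead of the Quickstart's 'local'
    druid.storage.type=hdfs
    druid.storage.storageDirectory=/druid/segments

    # Task logs on HDFS as well
    druid.indexer.logs.type=hdfs
    druid.indexer.logs.directory=/druid/indexing-logs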

After the task was posted and the MapReduce job had been running for a while, it reported an error.

Here is part of my task log:

2017-06-13T09:49:49,974 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T000000.000Z_20170609T010000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,975 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T010000.000Z_20170609T020000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,977 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T020000.000Z_20170609T030000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,978 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T030000.000Z_20170609T040000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,979 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T040000.000Z_20170609T050000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,981 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T050000.000Z_20170609T060000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,982 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T060000.000Z_20170609T070000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,983 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T070000.000Z_20170609T080000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:49,983 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T080000.000Z_20170609T090000.000Z/partitions.json] didn't exist!?
2017-06-13T09:49:50,362 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop index generator job...
2017-06-13T09:49:50,410 INFO [task-runner-0-priority-0] io.druid.indexer.path.StaticPathSpec - Adding paths[hdfs://ci-hdfs/tmp/druid/cidata/cidata-sampled.json]
2017-06-13T09:49:50,428 INFO [task-runner-0-priority-0] io.druid.indexer.HadoopDruidIndexerJob - No metadataStorageUpdaterJob set in the config. This is cool if you are running a hadoop index task, otherwise nothing will be uploaded to database.
2017-06-13T09:49:50,464 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_cidata2_2017-06-13T09:42:13.880Z, type=index_hadoop, dataSource=cidata2}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:211) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:223) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_102]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_102]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	... 7 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: No buckets?? seems there is no data to index.
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:215) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:276) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_102]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	... 7 more
Caused by: java.lang.RuntimeException: No buckets?? seems there is no data to index.
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:176) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:276) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_102]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]
	... 7 more
2017-06-13T09:49:50,477 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_cidata2_2017-06-13T09:42:13.880Z] status changed to [FAILED].
2017-06-13T09:49:50,483 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_cidata2_2017-06-13T09:42:13.880Z",
  "status" : "FAILED",
  "duration" : 448091 

}

It says it can't find the file /tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/20170609T000000.000Z_20170609T010000.000Z/partitions.json. But during the map stage I actually saw a lot of partitions.json files in my HDFS directory; the only difference is the path prefix, which was /tmp/druid-indexing/cidata2/2017-06-13T100521.806Z_343e6f14699442bfaa8a7bcf383e3486. That's strange.
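For anyone reproducing this, the mismatch is easy to confirm on HDFS by listing both working directories (the paths here are the ones from my logs above):

    # Files that actually exist on HDFS (note the 2017-06-13T100521.806Z_... prefix)
    hdfs dfs -ls -R /tmp/druid-indexing/cidata2/

    # The path the task tried to read (note the 2017-06-13T094213.880Z_... prefix)
    hdfs dfs -ls /tmp/druid-indexing/cidata2/2017-06-13T094213.880Z_147593496b30438b8dd9c56cd15bb879/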

The Hadoop cluster was set up with CDH 5.10.0 (Hadoop 2.6.0), so the hadoop-client version used with my task is org.apache.hadoop:hadoop-client:2.6.0-mr1-cdh5.10.0.
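For completeness: that coordinate is fetched with the pull-deps tool and then referenced from the task JSON (a sketch, assuming a stock Druid 0.10.0 install layout):

    # Pull the CDH client into hadoop-dependencies/ (run from the Druid install dir)
    java -classpath "lib/*" io.druid.cli.Main tools pull-deps \
        -h org.apache.hadoop:hadoop-client:2.6.0-mr1-cdh5.10.0

and then in the index_hadoop task spec:

    "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.6.0-mr1-cdh5.10.0"]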

Why are the directory name that was created and the one that was read not the same? It seems the hash part changed...

log.txt (314 KB)

cidata-index.json (1.62 KB)

I solved the problem; it was caused by a wrong timezone. Druid expects to run in UTC, and the working-directory prefix embeds the task timestamp, which is presumably why the paths that were written and read ended up different.

https://groups.google.com/forum/#!topic/druid-development/tJ_PiwGdE10
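For anyone who lands here later, the usual fix (a sketch based on the Druid docs, not my exact files) is to force UTC on every JVM involved. Add these to each Druid service's jvm.config:

    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8

and pass the same flags to the Hadoop map/reduce JVMs via jobProperties in the task's tuningConfig:

    "jobProperties" : {
      "mapreduce.map.java.opts" : "-server -Duser.timezone=UTC -Dfile.encoding=UTF-8",
      "mapreduce.reduce.java.opts" : "-server -Duser.timezone=UTC -Dfile.encoding=UTF-8"
    }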

On Tuesday, June 13, 2017 at 6:51:06 PM UTC+8, canghaiz wrote: