Batch Data Ingestion Failing in Cluster Mode

Initially, I did a single-instance setup using the quickstart guide, and batch data loads work fine there.

Later, I created a Druid cluster by following http://druid.io/docs/latest/tutorials/cluster.html

All server logs look good, and the Coordinator console opens as expected.

However, when I run a batch ingestion, the task fails with the exception below.

```
2018-08-09T10:51:03,740 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_trip_2018-08-09T10:50:53.428Z, type=index_hadoop, dataSource=trip}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.jar:0.12.1]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:238) ~[druid-indexing-service-0.12.1.jar:0.12.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.jar:0.12.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.jar:0.12.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
    ... 7 more
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:391) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
    at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
    at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.1.jar:0.12.1]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
    ... 7 more
2018-08-09T10:51:03,746 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_trip_2018-08-09T10:50:53.428Z] status changed to [FAILED].
2018-08-09T10:51:03,748 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_trip_2018-08-09T10:50:53.428Z",
  "status" : "FAILED",
  "duration" : 6184
}
```

I have also attached the full log file. I have tried both a relative and an absolute input path, but no luck.

log10 (179 KB)
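For context, the path being varied is the one in the Hadoop task's `ioConfig` inputSpec. A hypothetical fragment of the spec (the filename here is an assumption, not taken from the actual task):

```json
{
  "ioConfig": {
    "type": "hadoop",
    "inputSpec": {
      "type": "static",
      "paths": "quickstart/trip_data.json"
    }
  }
}
```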

Hello Onkar,

It looks like the reduce task is failing while attempting to create a temporary file. Could you verify that the temporary directory path defined by the `java.io.tmpdir` property is a valid directory with the required read/write permissions?
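As a quick sanity check, something like the following could be run from the Druid working directory (assuming the peon JVM is started with `-Djava.io.tmpdir=var/tmp`; adjust the path to match your actual JVM flags):

```shell
#!/bin/sh
# Hypothetical check: verify the directory behind -Djava.io.tmpdir
# exists and is writable by the user running the Druid processes.
TMPDIR_PATH=var/tmp   # assumed value from the peon JVM args; change as needed

mkdir -p "$TMPDIR_PATH"
if [ -d "$TMPDIR_PATH" ] && [ -w "$TMPDIR_PATH" ]; then
  echo "tmpdir OK"
else
  echo "tmpdir missing or not writable"
fi
```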

Thanks,

Atul

Hello Atul,

The temporary directory is defined as `-Djava.io.tmpdir=var/tmp` and it has the proper read/write permissions.

Thanks,

Onkar

I set up a cluster using imply.io. The indexer job now succeeds, but I am getting the exception below on the historical node.

```
2018-08-09T17:27:59,790 INFO [ZkCoordinator] io.druid.server.coordination.ZkCoordinator - New request[LOAD: wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z] with zNode[/druid/loadQueue/172.31.23.178:8083/wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z].
2018-08-09T17:27:59,790 INFO [ZkCoordinator] io.druid.server.coordination.SegmentLoadDropHandler - Loading segment wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z
2018-08-09T17:27:59,790 INFO [ZkCoordinator] io.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://ft-testanalytics/druid/segments/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z/2018-08-09T17:21:13.174Z/0/index.zip] to outDir[var/druid/segment-cache/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z/2018-08-09T17:21:13.174Z/0]
2018-08-09T17:27:59,843 WARN [ZkCoordinator] io.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z]
2018-08-09T17:27:59,843 INFO [ZkCoordinator] io.druid.server.SegmentManager - Told to delete a queryable for a dataSource[wikipedia] that doesn't exist.
2018-08-09T17:27:59,843 INFO [ZkCoordinator] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z/2018-08-09T17:21:13.174Z/0]
2018-08-09T17:27:59,844 INFO [ZkCoordinator] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z/2018-08-09T17:21:13.174Z]
2018-08-09T17:27:59,844 INFO [ZkCoordinator] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z]
2018-08-09T17:27:59,845 INFO [ZkCoordinator] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/wikipedia]
2018-08-09T17:27:59,845 WARN [ZkCoordinator] io.druid.server.coordination.SegmentLoadDropHandler - Unable to delete segmentInfoCacheFile[var/druid/segment-cache/info_dir/wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z]
2018-08-09T17:27:59,845 ERROR [ZkCoordinator] io.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=io.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z], segment=DataSegment{size=5778190, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=, dimensions=[isRobot, diffUrl, added, flags, delta, channel, isUnpatrolled, isNew, deltaBucket, isMinor, isAnonymous, deleted, namespace, comment, commentLength, page, user, countryIsoCode, regionName, cityName, countryName, regionIsoCode, metroCode], version='2018-08-09T17:21:13.174Z', loadSpec={type=>s3_zip, bucket=>ft-testanalytics, key=>druid/segments/wikipedia/2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z/2018-08-09T17:21:13.174Z/0/index.zip, S3Schema=>s3n}, interval=2016-06-27T00:00:00.000Z/2016-06-28T00:00:00.000Z, dataSource='wikipedia', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z]
    at io.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:265) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:307) [druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:47) [druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:118) [druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:520) [curator-recipes-4.0.0.jar:4.0.0]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:514) [curator-recipes-4.0.0.jar:4.0.0]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.0.0.jar:4.0.0]
    at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:296) [curator-client-4.0.0.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-4.0.0.jar:4.0.0]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:512) [curator-recipes-4.0.0.jar:4.0.0]
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-4.0.0.jar:4.0.0]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:771) [curator-recipes-4.0.0.jar:4.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.RuntimeException: org.jets3t.service.ServiceException: Request Error. -- ResponseCode: 400, ResponseStatus: Bad Request, RequestId: 94A74046632C9859, HostId: Hqu70LchpTt8gpF/R/LGjKhV7I0w5WMuPwM+Wz1JHQbRQJlCnNKLjf1j/CTMODI+gZKrV8QL8oY=
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:327) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:176) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:60) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:205) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:193) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:151) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:133) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:108) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.getAdapter(SegmentManager.java:196) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.loadSegment(SegmentManager.java:157) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:261) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    ... 18 more
Caused by: org.jets3t.service.ServiceException: Request Error.
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:426) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:575) ~[jets3t-0.9.4.jar:0.9.4]
    at io.druid.storage.s3.S3Utils.isObjectInBucket(S3Utils.java:96) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:318) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:314) ~[?:?]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:89) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:312) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:176) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:60) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:205) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:193) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:151) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:133) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:108) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.getAdapter(SegmentManager.java:196) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.loadSegment(SegmentManager.java:157) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:261) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    ... 18 more
Caused by: org.jets3t.service.impl.rest.HttpException: 400 Bad Request
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:575) ~[jets3t-0.9.4.jar:0.9.4]
    at io.druid.storage.s3.S3Utils.isObjectInBucket(S3Utils.java:96) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:318) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller$4.call(S3DataSegmentPuller.java:314) ~[?:?]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:89) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller.isObjectInBucket(S3DataSegmentPuller.java:312) ~[?:?]
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:176) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:60) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:205) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:193) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:151) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:133) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:108) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.getAdapter(SegmentManager.java:196) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.SegmentManager.loadSegment(SegmentManager.java:157) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    at io.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:261) ~[druid-server-0.12.1-iap10.jar:0.12.1-iap10]
    ... 18 more
2018-08-09T17:27:59,855 INFO [ZkCoordinator] io.druid.server.coordination.ZkCoordinator - Completed request [LOAD: wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z]
2018-08-09T17:27:59,855 INFO [ZkCoordinator] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/172.31.23.178:8083/wikipedia_2016-06-27T00:00:00.000Z_2016-06-28T00:00:00.000Z_2018-08-09T17:21:13.174Z] was removed
```

It looks like the historical node is unable to pull the segment from S3. One thing you could verify is the S3 endpoint: try explicitly setting the endpoint for your AWS region in the jets3t.properties file. A 400 Bad Request from S3 is often a sign that the bucket is in a region that only accepts Signature Version 4 requests, which jets3t does not use by default.
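As a sketch, a jets3t.properties on the historical's classpath (e.g. in the common configuration directory) might look like the following; the region here is an assumption, so substitute your bucket's actual region and endpoint:

```properties
# jets3t.properties - assumed example, adjust region/endpoint to match your bucket
# Region-specific S3 endpoint (assumed region: ap-south-1)
s3service.s3-endpoint=s3.ap-south-1.amazonaws.com
# Force Signature Version 4, required by newer AWS regions
storage-service.request-signature-version=AWS4-HMAC-SHA256
```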