Indexing task (Hadoop job) running very slowly

Hi,
I am using a multi-node Druid cluster together with a remote Hadoop cluster on EMR (1 m4.xl + 2 r3.xl).
For testing purposes, I am trying to load a very small file: 125 KB, containing 1,000 lines of data.
The job takes forever to finish.

I am facing two problems here:

  1. The data ingestion task takes more than an hour for this very small dataset. (The ingestion spec I am submitting is sketched after this list.)

  2. After the Hadoop index task completes, the index.zip file is created in S3, but it is not downloaded to the local segment cache. (I copied it into the cache manually once and the data became queryable, so the segment itself is fine.)
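
For reference, the ingestion spec I am submitting looks roughly like the sketch below. Treat it as a sketch rather than the exact spec: the S3 input path is anonymized, the timestamp column and input format are placeholders, and the aggregator types are shown as longSum purely for illustration. The dataSource, dimensions, metric names, and interval match the segment metadata in the logs further down.

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "minitable",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": {
            "dimensions": ["article_type", "brand", "gender", "brand_type",
                           "master_category", "supply_type", "business_unit",
                           "testdim", "week", "month", "year", "style_id"]
          }
        }
      },
      "metricsSpec": [
        { "type": "longSum", "name": "live_styles", "fieldName": "live_styles" },
        { "type": "longSum", "name": "non_live_styles", "fieldName": "non_live_styles" },
        { "type": "longSum", "name": "broken_style", "fieldName": "broken_style" },
        { "type": "longSum", "name": "new_season_styles", "fieldName": "new_season_styles" },
        { "type": "longSum", "name": "live_styles_qty", "fieldName": "live_styles_qty" },
        { "type": "longSum", "name": "broken_style_qty", "fieldName": "broken_style_qty" },
        { "type": "longSum", "name": "new_season_styles_qty", "fieldName": "new_season_styles_qty" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2016-01-01/2016-01-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "s3n://xxxxxxxxx/input/minitable.json" }
    },
    "tuningConfig": {
      "type": "hadoop"
    }
  }
}

The tuningConfig is left at its defaults in this sketch; I have not set any custom jobProperties or partitioning overrides.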

Error logs:

2017-01-11T22:09:27,132 INFO [ZkCoordinator-0] io.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://xxxxxxxxx/druid/segments/minitable/2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z/2017-01-11T20:44:21.425Z/0/index.zip] to outDir[var/druid/segment-cache/minitable/2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z/2017-01-11T20:44:21.425Z/0]

2017-01-11T22:09:27,508 INFO [ZkCoordinator-0] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Asked to cleanup something[DataSegment{size=77528, shardSpec=NoneShardSpec, metrics=[live_styles, non_live_styles, broken_style, new_season_styles, live_styles_qty, broken_style_qty, new_season_styles_qty], dimensions=[article_type, brand, gender, brand_type, master_category, supply_type, business_unit, testdim, week, month, year, style_id], version='2017-01-11T20:44:21.425Z', loadSpec={type=s3_zip, bucket=xxxxxxxx, key=druid/segments/minitable/2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z/2017-01-11T20:44:21.425Z/0/index.zip}, interval=2016-01-01T00:00:00.000Z/2016-01-02T00:00:00.000Z, dataSource='minitable', binaryVersion='9'}] that didn't exist. Skipping.

2017-01-11T22:09:27,509 WARN [ZkCoordinator-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[minitable_2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z_2017-01-11T20:44:21.425Z]

2017-01-11T22:09:27,509 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completely removing [minitable_2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z_2017-01-11T20:44:21.425Z] in [30,000] millis

2017-01-11T22:09:27,515 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [LOAD: minitable_2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z_2017-01-11T20:44:21.425Z]

2017-01-11T22:09:27,518 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[minitable_2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z_2017-01-11T20:44:21.425Z], segment=DataSegment{size=77528, shardSpec=NoneShardSpec, metrics=[live_styles, non_live_styles, broken_style, new_season_styles, live_styles_qty, broken_style_qty, new_season_styles_qty], dimensions=[article_type, brand, gender, brand_type, master_category, supply_type, business_unit, testdim, week, month, year, style_id], version='2017-01-11T20:44:21.425Z', loadSpec={type=s3_zip, bucket=xxxxxxxxx, key=druid/segments/minitable/2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z/2017-01-11T20:44:21.425Z/0/index.zip}, interval=2016-01-01T00:00:00.000Z/2016-01-02T00:00:00.000Z, dataSource='minitable', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[minitable_2016-01-01T00:00:00.000Z_2016-01-02T00:00:00.000Z_2017-01-11T20:44:21.425Z]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:310) ~[druid-server-0.9.2.jar:0.9.2]
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:351) [druid-server-0.9.2.jar:0.9.2]
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.2.jar:0.9.2]
    at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:153) [druid-server-0.9.2.jar:0.9.2]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.11.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.11.0.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.11.0.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.11.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:513) [curator-recipes-2.11.0.jar:?]
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.11.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.11.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_121]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_121]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_121]
Caused by: io.druid.segment.loading.SegmentLoadingException: No such file or directory
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:238) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.2.jar:0.9.2]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.2.jar:0.9.2]

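One detail that may be relevant to problem 2: in the first log line above, the puller's outDir is a relative path (var/druid/segment-cache/...), so where the segment actually lands depends on the historical process's working directory. The segment-cache settings on my historical look roughly like the sketch below (the maxSize values are placeholders, not my exact numbers):

# Historical runtime.properties (sketch; maxSize values are placeholders)
# The relative path below matches the outDir seen in the log above. An
# absolute path such as /var/druid/segment-cache would remove the
# dependence on the working directory.
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
druid.server.maxSize=130000000000

Could this relative path be the cause of the "No such file or directory" coming out of S3DataSegmentPuller?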