middle manager cannot hand off realtime data to historical node

Hi,

I am seeing many exceptions in the historical node log, like the ones below:

The historical nodes also seem to be full: in the coordinator view each historical node sits at 50% and does not grow any further. Originally each historical node had 10GB. I changed the config "druid.server.maxSize" from 10000000000 to 20000000000 in the historical node runtime properties, but the change doesn't seem to make any difference: each historical node still fills up to 10GB and will not take any more.

Currently, the middle manager cannot hand off realtime data to the historical nodes, while the tranquility app keeps sending messages to the middle manager.

Could you help me figure this out?

2015-09-15 06:06:14,814 ERROR i.d.s.c.ZkCoordinator [ZkCoordinator-0] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[MiddletiersBillingProd_2015-07-31T21:00:00.000Z_2015-07-31T22:00:00.000Z_2015-09-14T12:10:57.917Z], segment=DataSegment{size=9047397, shardSpec=NoneShardSpec, metrics=[count, fee, expose, click, startDownload], dimensions=[platform, adid, incomeLevel, age, province, gender, education, mediaType, dsp, city], version=‘2015-09-14T12:10:57.917Z’, loadSpec={type=hdfs, path=hdfs://lgprc-xiaomi/user/h_miui_ad/druid/storage/chenxuehui/MiddletiersBillingProd/20150731T210000.000Z_20150731T220000.000Z/2015-09-14T12_10_57.917Z/0/index.zip}, interval=2015-07-31T21:00:00.000Z/2015-07-31T22:00:00.000Z, dataSource=‘MiddletiersBillingProd’, binaryVersion=‘9’}}

2015-09-15 06:06:14,814 INFO c.m.e.c.LoggingEmitter [ZkCoordinator-0] Event [{“feed”:“alerts”,“timestamp”:“2015-09-15T06:06:14.814Z”,“service”:“druid/prod/historical”,“host”:“hh-miui-ad-preview01.bj:8751”,“severity”:“component-failure”,“description”:“Failed to load segment for dataSource”,“data”:{“class”:“io.druid.server.coordination.ZkCoordinator”,“exceptionType”:“io.druid.segment.loading.SegmentLoadingException”,“exceptionMessage”:“Exception loading segment[MiddletiersBillingProd_2015-07-31T21:00:00.000Z_2015-07-31T22:00:00.000Z_2015-09-14T12:10:57.917Z]”,“exceptionStackTrace”:“io.druid.segment.loading.SegmentLoadingException: Exception loading segment[MiddletiersBillingProd_2015-07-31T21:00:00.000Z_2015-07-31T22:00:00.000Z_2015-09-14T12:10:57.917Z]\n\tat io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146)\n\tat io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171)\n\tat io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42)\n\tat io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:121)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510)\n\tat org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)\n\tat com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)\n\tat org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:507)\n\tat org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759)\n\tat 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)\n\tat java.lang.Thread.run(Thread.java:722)\nCaused by: com.metamx.common.ISE: Segment[MiddletiersBillingProd_2015-07-31T21:00:00.000Z_2015-07-31T22:00:00.000Z_2015-09-14T12:10:57.917Z:9,047,397] too large for storage[/home/work/app/druid/logs/persistent:112,864].\n\tat io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:116)\n\tat io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93)\n\tat io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151)\n\tat io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142)\n\t… 20 more\n”,“segment”:{“dataSource”:“MiddletiersBillingProd”,“interval”:“2015-07-31T21:00:00.000Z/2015-07-31T22:00:00.000Z”,“version”:“2015-09-14T12:10:57.917Z”,“loadSpec”:{“type”:“hdfs”,“path”:“hdfs://lgprc-xiaomi/user/h_miui_ad/druid/storage/chenxuehui/MiddletiersBillingProd/20150731T210000.000Z_20150731T220000.000Z/2015-09-14T12_10_57.917Z/0/index.zip”},“dimensions”:“platform,adid,incomeLevel,age,province,gender,education,mediaType,dsp,city”,“metrics”:“count,fee,expose,click,startDownload”,“shardSpec”:{“type”:“none”},“binaryVersion”:9,“size”:9047397,“identifier”:“MiddletiersBillingProd_2015-07-31T21:00:00.000Z_2015-07-31T22:00:00.000Z_2015-09-14T12:10:57.917Z”}}}]

2015-09-15 06:06:14,818 ERROR i.d.s.c.ZkCoordinator [ZkCoordinator-0] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[MiddletiersBillingProd_2015-07-31T20:00:00.000Z_2015-07-31T21:00:00.000Z_2015-09-14T11:00:40.982Z], segment=DataSegment{size=5986177, shardSpec=NoneShardSpec, metrics=[count, fee, expose, click, startDownload], dimensions=[adid, age, province, gender, mediaType], version=‘2015-09-14T11:00:40.982Z’, loadSpec={type=hdfs, path=hdfs://lgprc-xiaomi/user/h_miui_ad/druid/storage/chenxuehui/MiddletiersBillingProd/20150731T200000.000Z_20150731T210000.000Z/2015-09-14T11_00_40.982Z/0/index.zip}, interval=2015-07-31T20:00:00.000Z/2015-07-31T21:00:00.000Z, dataSource=‘MiddletiersBillingProd’, binaryVersion=‘9’}}

2015-09-15 06:06:14,818 INFO c.m.e.c.LoggingEmitter [ZkCoordinator-0] Event [{“feed”:“alerts”,“timestamp”:“2015-09-15T06:06:14.818Z”,“service”:“druid/prod/historical”,“host”:“hh-miui-ad-preview01.bj:8751”,“severity”:“component-failure”,“description”:“Failed to load segment for dataSource”,“data”:{“class”:“io.druid.server.coordination.ZkCoordinator”,“exceptionType”:“io.druid.segment.loading.SegmentLoadingException”,“exceptionMessage”:“Exception loading segment[MiddletiersBillingProd_2015-07-31T20:00:00.000Z_2015-07-31T21:00:00.000Z_2015-09-14T11:00:40.982Z]”,“exceptionStackTrace”:“io.druid.segment.loading.SegmentLoadingException: Exception loading segment[MiddletiersBillingProd_2015-07-31T20:00:00.000Z_2015-07-31T21:00:00.000Z_2015-09-14T11:00:40.982Z]\n\tat io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146)\n\tat io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171)\n\tat io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42)\n\tat io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:121)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510)\n\tat org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)\n\tat com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)\n\tat org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:507)\n\tat org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)\n\tat org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759)\n\tat 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)\n\tat java.lang.Thread.run(Thread.java:722)\nCaused by: com.metamx.common.ISE: Segment[MiddletiersBillingProd_2015-07-31T20:00:00.000Z_2015-07-31T21:00:00.000Z_2015-09-14T11:00:40.982Z:5,986,177] too large for storage[/home/work/app/druid/logs/persistent:112,864].\n\tat io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:116)\n\tat io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93)\n\tat io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151)\n\tat io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142)\n\t… 20 more\n”,“segment”:{“dataSource”:“MiddletiersBillingProd”,“interval”:“2015-07-31T20:00:00.000Z/2015-07-31T21:00:00.000Z”,“version”:“2015-09-14T11:00:40.982Z”,“loadSpec”:{“type”:“hdfs”,“path”:“hdfs://lgprc-xiaomi/user/h_miui_ad/druid/storage/chenxuehui/MiddletiersBillingProd/20150731T200000.000Z_20150731T210000.000Z/2015-09-14T11_00_40.982Z/0/index.zip”},“dimensions”:“adid,age,province,gender,mediaType”,“metrics”:“count,fee,expose,click,startDownload”,“shardSpec”:{“type”:“none”},“binaryVersion”:9,“size”:5986177,“identifier”:“MiddletiersBillingProd_2015-07-31T20:00:00.000Z_2015-07-31T21:00:00.000Z_2015-09-14T11:00:40.982Z”}}}]

2015-09-15 06:06:14,823 ERROR i.d.s.c.ZkCoordinator [ZkCoordinator-0] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[MiddletiersBillingProd_2015-07-31T18:00:00.000Z_2015-07-31T19:00:00.000Z_2015-09-14T11:00:40.982Z], segment=DataSegment{size=7879904, shardSpec=NoneShardSpec, metrics=[count, fee, expose, click, startDownload], dimensions=[adid, age, province, gender, mediaType], version=‘2015-09-14T11:00:40.982Z’, loadSpec={type=hdfs, path=hdfs://lgprc-xiaomi/user/h_miui_ad/druid/storage/chenxuehui/MiddletiersBillingProd/20150731T180000.000Z_20150731T190000.000Z/2015-09-14T11_00_40.982Z/0/index.zip}, interval=2015-07-31T18:00:00.000Z/2015-07-31T19:00:00.000Z, dataSource=‘MiddletiersBillingProd’, binaryVersion=‘9’}}

On Tuesday, September 15, 2015 at 2:46:32 PM UTC+8, Xuehui Chen wrote:

Hi Xuehui, can you share your historical configuration?

Particularly this line:

druid.segmentCache.locations

Also, how does the value of that configuration compare to
druid.server.maxSize?

If you have just one disk/directory to store segments on historicals, the values of those two configs should be the same.
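
For illustration, here is a rough sketch of what matching values might look like in the historical runtime properties. The path is taken from the "too large for storage[...]" message in your log, and the 20000000000 figure is the maxSize you mentioned; both are assumptions about your setup, so adjust them to the disk space you really have. The 112,864 in that log message looks like the number of bytes still available in the cache location, which would fit a location that is still capped at roughly 10GB.

```properties
# Total bytes of segments this historical advertises to the coordinator
# (value assumed from the original post).
druid.server.maxSize=20000000000

# Local segment cache. With a single directory, its maxSize should equal
# druid.server.maxSize (path assumed from the log message).
druid.segmentCache.locations=[{"path": "/home/work/app/druid/logs/persistent", "maxSize": 20000000000}]
```

The historical reads these properties at startup, so it would need a restart after the change.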