Historical nodes are not able to load segments

Hi,

Historical nodes are not able to load the segments. What could be the problem?

We are running the Historical node with the following command:

java -server -Xms8g -Xmx8g -XX:MaxDirectMemorySize=8g -XX:+ExitOnOutOfMemoryError -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/data/druid/historical/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djavax.net.ssl.trustStore=/etc/pki/java/cacerts -Djavax.net.ssl.trustStorePassword= -cp /config//_common:/config//historical:lib/* org.apache.druid.cli.Main server historical

Dhiman

2019-05-01T19:39:03,168 ERROR [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[flowlogs_unstable_2019-04-25T17:00:00.000Z_2019-04-25T18:00:00.000Z_2019-04-29T18:09:23.807Z], segment=DataSegment{size=19531, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[packets, bytes, flows], dimensions=[as_src, as_dst, peer_as_dst, iface_in, iface_out, ip_src, ip_dst, port_src, port_dst, net_src, net_dst, mask_src, mask_dst, net_mask_dst, net_mask_src, ip_proto, tcp_flags, tos, peer_ip_src, country_ip_src, country_ip_dst, sampling_rate], version='2019-04-29T18:09:23.807Z', loadSpec={type=>s3_zip, bucket=>flow_monitoring_druid_segments, key=>druid/segments/flowlogs_unstable/2019-04-25T17:00:00.000Z_2019-04-25T18:00:00.000Z/2019-04-29T18:09:23.807Z/0/5b091276-48bc-4b65-b3b9-723f474139bb/index.zip, S3Schema=>s3n}, interval=2019-04-25T17:00:00.000Z/2019-04-25T18:00:00.000Z, dataSource='flowlogs_unstable', binaryVersion='9'}}

org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[flowlogs_unstable_2019-04-25T17:00:00.000Z_2019-04-25T18:00:00.000Z_2019-04-29T18:09:23.807Z]

    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:268) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:312) [druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:47) [druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:118) [druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538) [curator-recipes-4.1.0.jar:4.1.0]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:532) [curator-recipes-4.1.0.jar:4.1.0]

    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.1.0.jar:4.1.0]

    at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) [curator-client-4.1.0.jar:?]

    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-4.1.0.jar:4.1.0]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:530) [curator-recipes-4.1.0.jar:4.1.0]

    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-4.1.0.jar:4.1.0]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808) [curator-recipes-4.1.0.jar:4.1.0]

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_162]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_162]

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_162]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_162]

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]

    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]

Caused by: org.apache.druid.segment.loading.SegmentLoadingException: Failed to load segment flowlogs_unstable_2019-04-25T17:00:00.000Z_2019-04-25T18:00:00.000Z_2019-04-29T18:09:23.807Z in all locations.

    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:205) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:166) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:133) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:196) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:157) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:264) ~[druid-server-0.14.0-incubating.jar:0.14.0-incubating]

    ... 18 more

It is possible that you don’t have enough space in your segment-cache location.

Rommel Garcia
Director, Field Engineering
rommel.garcia@imply.io
404.502.9672

The segment-cache is on /data, which has plenty of free space:

bash-4.2# df -h /data/druid/segment-cache/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb3       446G  1.2G  445G   1% /data

Can you share the full Historical logs for more detail?
The complete exception trace would give some clue as to why segment loading is failing.
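If the full log is large, a small script can pull out just the "Caused by:" lines so the deepest cause is easy to spot. This is only a sketch: the function name `cause_chain` and the regular expression are my own, and it assumes the log uses the standard Java stack trace layout shown in the trace above.

```python
import re

# Matches lines such as "Caused by: some.pkg.SomeException: message text".
# The message part is optional because some exceptions carry no message.
CAUSED_BY = re.compile(r"^\s*Caused by: (?P<cls>[\w.$]+)(?:: (?P<msg>.*))?$")


def cause_chain(log_text):
    """Return (exception class, message) pairs for each 'Caused by:' line, in order."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            chain.append((match.group("cls"), match.group("msg") or ""))
    return chain


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python cause_chain.py historical.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for cls, msg in cause_chain(handle.read()):
                print(cls, "->", msg)
```

The last pair printed for a given error is usually the most specific one. In the excerpt posted so far the only "Caused by:" entry is the generic "Failed to load segment ... in all locations." message, which is why the rest of the log is needed.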

Failed to load segment flowlogs_unstable_2019-04-25T17:00:00.000Z_2019-04-25T18:00:00.000Z_2019-04-29T18:09:23.807Z in all locations.

Can you check whether Druid has write access to the segment-cache location?
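One way to test this is to try creating a file in the directory while logged in as the same OS user that runs the Historical process. A minimal sketch follows; the function name `check_segment_cache` is made up for illustration, and the path is taken from the df command earlier in this thread, so adjust it to whatever your segment cache configuration actually points at.

```python
import os
import tempfile


def check_segment_cache(path):
    """Report whether `path` exists and whether the current user can write to it."""
    result = {"path": path, "exists": os.path.isdir(path), "writable": False, "error": None}
    if not result["exists"]:
        result["error"] = "directory does not exist"
        return result
    try:
        # Create and immediately remove a real file. This exercises the same
        # permissions a segment download would need, unlike a plain mode check.
        with tempfile.NamedTemporaryFile(dir=path, prefix="druid-write-test-"):
            pass
        result["writable"] = True
    except OSError as exc:
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    # Path assumed from the df output above; run this as the Druid user.
    print(check_segment_cache("/data/druid/segment-cache"))
```

Running it as root will not tell you much, since root can usually write anywhere. What matters is the result for the account the Historical process runs under.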