Historical node dies while loading 65000 segments


We have around 65000 segments and counting. This is the second time we have observed the historical node die when the segment count reaches around 65000. It also fails to start up again.

Here is how I start it up:

export JAVA_OPTS="-XX:MaxDirectMemorySize=15g"

java -server -Xmx15g -Xms15g -XX:MaxDirectMemorySize=15g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -Xloggc:/persistent/logs/gc/h-gc.log -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/cep -classpath config/_common:lib/*:config/historical io.druid.cli.Main server historical

Any ideas on how to recover from this situation?




Since the number is suspiciously close to 2^16, my guess is that your Druid process runs out of its file handle quota and fails. Please check the maximum number of open files a single process can have on your system. (A historical node holds one file handle per loaded segment, plus some for temporary files from time to time.)
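On Linux you can check this directly; this is a minimal sketch that inspects the current shell's own limits via /proc (substitute the historical node's PID for `$$` to check the Druid process instead):

```shell
# Soft limit on open files for the current shell
ulimit -n

# Effective limits of a running process are visible under /proc;
# here we look at this shell's own limits ($$). For Druid, replace $$
# with the historical node's PID.
grep "open files" /proc/$$/limits

# Count how many file descriptors the process currently holds
ls /proc/$$/fd | wc -l
```

If the descriptor count is close to the limit when the node dies, the quota is the likely culprit.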

– Himanshu

Hey Dave,

You can set your file descriptor limit higher by running “ulimit -n [a number]” before running Druid.
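For example, a sketch of raising the limit for the current session (the `druid` user name in the comments is an assumption about your setup):

```shell
# Raise the soft open-file limit up to the hard limit before launching Druid.
# A specific value such as `ulimit -n 100000` also works, as long as it does
# not exceed the hard limit reported by `ulimit -Hn`.
ulimit -n "$(ulimit -Hn)"

# To persist the change across logins on most Linux distros, add entries to
# /etc/security/limits.conf, e.g.:
#   druid  soft  nofile  100000
#   druid  hard  nofile  100000
```

Note that `ulimit` only affects the shell it runs in and its children, so run it in the same shell (or startup script) that launches the historical node.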

But, also, 65000 segments per node sounds like a lot. What’s their average size? If they’re quite small (<100MB) then you might benefit from using larger segments. You could do that by reindexing your data with a higher segmentGranularity or by using fewer partitions in realtime.

If the segments are small and not sharded, you can also configure the coordinator to merge small adjacent segments. See druid.coordinator.merge.on at
http://druid.io/docs/latest/configuration/coordinator.html
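As a sketch, enabling merging is a single property in the coordinator's runtime.properties:

```properties
# coordinator runtime.properties: periodically merge small, unsharded
# adjacent segments into larger ones
druid.coordinator.merge.on=true
```

How aggressively segments are merged (target byte size and segment count per merge) is controlled separately through the coordinator's dynamic configuration; check the coordinator docs linked above for the exact property names in your Druid version.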