Error while submitting a reindexing task on the MiddleManager

Hi,

Below is the error log I received when a re-indexing task was submitted on the MiddleManager.

2019-01-10T12:46:42,731 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[COUNTER_GROUP_ID] conversions with cardinality[74] in 4 millis.

2019-01-10T12:46:42,733 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[HSS_P_1] conversions with cardinality[3] in 0 millis.

2019-01-10T12:46:42,738 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[HSS_P_2] conversions with cardinality[5] in 5 millis.

2019-01-10T12:46:42,739 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[HSS_P_3] conversions with cardinality[59] in 1 millis.

2019-01-10T12:46:42,739 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[HSS_P_4] conversions with cardinality[53] in 0 millis.

2019-01-10T12:46:42,740 INFO [HSS_Hourly-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[HSS_P_5] conversions with cardinality[2] in 0 millis.

2019-01-10T12:46:42,741 INFO [HSS_Hourly-incremental-persist] io.druid.segment.IndexMergerV9 - Completed dim conversions in 14 millis.

2019-01-10T12:46:43,149 ERROR [HSS_Hourly-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - dataSource[HSS_Hourly] -- incremental persist failed: {class=io.druid.segment.realtime.appenderator.AppenderatorImpl, segment=HSS_Hourly_2019-01-09T10:00:00.000Z_2019-01-09T11:00:00.000Z_2019-01-10T12:40:00.382Z, count=0}

2019-01-10T12:46:43,150 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down...

2019-01-10T12:46:43,152 INFO [appenderator_persist_0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Removing sink for segment[HSS_Hourly_2019-01-09T10:00:00.000Z_2019-01-09T11:00:00.000Z_2019-01-10T12:40:00.382Z].

2019-01-10T12:46:43,156 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_HSS_Hourly_2019-01-10T12:40:00.374Z, type=index, dataSource=HSS_Hourly}]

java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Too many open files

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.segment.realtime.appenderator.FiniteAppenderatorDriver.persist(FiniteAppenderatorDriver.java:240) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:453) ~[druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:207) ~[druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_152]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_152]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_152]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]

Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Too many open files

at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_152]

at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_152]

at io.druid.segment.realtime.appenderator.FiniteAppenderatorDriver.persist(FiniteAppenderatorDriver.java:232) ~[druid-server-0.10.0.jar:0.10.0]

… 8 more

Caused by: java.lang.RuntimeException: java.io.IOException: Too many open files

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1022) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:95) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:389) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) ~[druid-common-0.10.0.jar:0.10.0]

… 4 more

Caused by: java.io.IOException: Too many open files

at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.8.0_152]

at java.io.File.createTempFile(File.java:2024) ~[?:1.8.0_152]

at java.io.File.createTempFile(File.java:2070) ~[?:1.8.0_152]

at io.druid.segment.data.TmpFileIOPeon.makeOutputStream(TmpFileIOPeon.java:55) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.data.GenericIndexedWriter.open(GenericIndexedWriter.java:142) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.data.BlockLayoutLongSupplierSerializer.open(BlockLayoutLongSupplierSerializer.java:82) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.LongColumnSerializer.open(LongColumnSerializer.java:76) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMergerV9.setupMetricsWriters(IndexMergerV9.java:537) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:202) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.merge(IndexMerger.java:437) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.persist(IndexMerger.java:185) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.persist(IndexMerger.java:151) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1001) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:95) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:389) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) ~[druid-common-0.10.0.jar:0.10.0]

… 4 more

Suppressed: java.io.IOException: Failed to list contents of var/druid/task/index_HSS_Hourly_2019-01-10T12:40:00.374Z/work/persist/HSS_Hourly_2019-01-09T10:00:00.000Z_2019-01-09T11:00:00.000Z_2019-01-10T12:40:00.382Z/0/v9-tmp

at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1647) ~[commons-io-2.4.jar:2.4]

at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) ~[commons-io-2.4.jar:2.4]

at io.druid.segment.IndexMergerV9$3.close(IndexMergerV9.java:148) ~[druid-processing-0.10.0.jar:0.10.0]

at com.google.common.io.Closer.close(Closer.java:214) ~[guava-16.0.1.jar:?]

at io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:245) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.merge(IndexMerger.java:437) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.persist(IndexMerger.java:185) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.IndexMerger.persist(IndexMerger.java:151) ~[druid-processing-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1001) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:95) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:389) ~[druid-server-0.10.0.jar:0.10.0]

at io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) ~[druid-common-0.10.0.jar:0.10.0]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_152]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_152]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_152]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]

2019-01-10T12:46:43,161 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_HSS_Hourly_2019-01-10T12:40:00.374Z] status changed to [FAILED].

2019-01-10T12:46:43,163 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_HSS_Hourly_2019-01-10T12:40:00.374Z",

"status" : "FAILED",

"duration" : 390870

}

2019-01-10T12:46:43,166 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@71b97eeb].

2019-01-10T12:46:43,166 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@22e2266d]

2019-01-10T12:46:43,166 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/10.54.148.235:8102]

Kindly suggest a solution. Thanks in advance.

Amit,

There's a limit to how many files can be open at one time. Check the following configurations on your Linux servers.

/etc/sysctl.conf

fs.file-max -> this dictates the system-wide maximum number of open files. Try setting it to 20000.

/etc/security/limits.conf (see sample values below)

  •   soft		nofile		20000

  •   hard		nofile		30000


This will require a reboot of the servers since it's at the system level. If you are still having the same issue, you can raise the values up to 100000, but make sure your cluster can support that many open files.

Rommel Garcia
Director, Field Engineering
rommel.garcia@imply.io
404.502.9672
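For reference, a minimal sketch of the two files described above, using the values Rommel suggests, plus commands to verify them. The wildcard domain and the use of sysctl -p (which on most distributions reloads /etc/sysctl.conf without a full reboot) are assumptions about a typical Linux setup:

# /etc/sysctl.conf -- system-wide cap on open file handles
fs.file-max = 20000

# /etc/security/limits.conf -- per-user caps (domain  type  item  value)
*       soft    nofile    20000
*       hard    nofile    30000

# Reload the sysctl settings and confirm the system-wide value
sudo sysctl -p
cat /proc/sys/fs/file-max

# After logging back in (limits.conf applies at the next login), confirm the per-process limits
ulimit -Sn    # soft open-file limit
ulimit -Hn    # hard open-file limit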

Check to see if there are orphaned processes that need to be killed.

Rommel Garcia

Director, Field Engineering

Hi Rommel Garcia,

After I updated the hard and soft limits, it worked for some time, but after a day the server became corrupted and we were no longer able to log in.

Kindly suggest.

Hi,

My re-indexing job contains around 1250 metrics. Is the issue on the MiddleManager related to this?

Please suggest.

Please check the output of the command "less /proc/<pid>/limits" to see the real ulimit applied to the process. If it's lower than expected, check the ulimit of the parent PID with the same command, to see if that is the cap.

Ming
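A sketch of that check, assuming the MiddleManager's command line contains the string "middleManager" (the pgrep pattern is an assumption; substitute the actual PID if you already know it):

# Effective open-file limit of the MiddleManager process
MM_PID=$(pgrep -f middleManager | head -n 1)
grep "Max open files" /proc/$MM_PID/limits

# Its parent PID, checked the same way
PARENT_PID=$(ps -o ppid= -p $MM_PID | tr -d ' ')
grep "Max open files" /proc/$PARENT_PID/limits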

Thanks, Ming, for the reply.

I checked the limits file for the PID of the MiddleManager; its open-files limit was 4096, which is allocated by default to all running processes.

How do I find the parent PID of the MiddleManager?

How do I know the maximum number of open files a process requires?

Last time I increased the hard limit in /etc/security/limits.conf; it worked for a day, after which the server crashed and we had to rebuild it.

Could you please suggest the proper steps to increase the limits, if that is required in my case?

Thanks in advance.

Hi Amit:

Modifying the configs in /etc/security/limits.conf is the correct way to increase the ulimit. 4k is a bit too low; I'd recommend at least 32k.

Hope this helps.

Ming
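As an illustration, the 32k recommendation would look like the following in /etc/security/limits.conf, assuming Druid runs as a user named druid (that user name is an assumption; adjust the domain to your setup). It takes effect on that user's next login session:

druid    soft    nofile    32768
druid    hard    nofile    32768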

Thanks, Ming.
One more question: do we need to reboot the server after the changes, or would just logging out and back in be enough?

Amit,

After this, just restart the daemon:

systemctl daemon-reload

Then log in and check the limits using ulimit or /proc.

Thanks & Rgds

Venkat
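One caveat worth noting: /etc/security/limits.conf is applied through PAM at login, so a MiddleManager launched as a systemd service may not pick it up. If that is how the process is managed (an assumption; the unit name druid-middlemanager below is illustrative), the limit can instead be raised with a drop-in override, which is where the daemon-reload mentioned above comes in:

# /etc/systemd/system/druid-middlemanager.service.d/override.conf
[Service]
LimitNOFILE=32768

# Reload unit definitions, restart the service, then verify
sudo systemctl daemon-reload
sudo systemctl restart druid-middlemanager
grep "Max open files" /proc/$(pgrep -f middleManager | head -n 1)/limits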