Facing OOM exception in Druid overlord

Hi,
We are running Druid on CentOS 6 with 64-bit OpenJDK 8, with around 16 GB of RAM and 2 GB of swap space for the Druid overlord node.

However, we keep hitting OOM exceptions, after which we have to restart the overlord node. The stack trace of the exception is below.

Is this really an OOM error, or is it caused by the Java process reaching its maximum thread count?

Any help would be appreciated.

runtime.properties for the Druid overlord (<%= %> denotes an embedded Ruby template):

druid.host=<%= %>

druid.port=<%= %>

druid.service=druid/overlord

druid.indexer.runner.type=remote

druid.indexer.storage.type=local

druid.indexer.queue.startDelay=PT30S

druid.db.connector.connectURI=<%= %>

druid.db.connector.user=<%= %>

druid.db.connector.password=<%= %>

#druid.selectors.indexing.serviceName=druid/overlord

druid.indexer.runner.javaOpts="-server -Xmx256m"

druid.indexer.runner.startPort=8088

druid.indexer.fork.property.druid.processing.numThreads=1

druid.indexer.fork.property.druid.computation.buffer.size=100000000

Java stack trace for the OOM exception:

2016-10-17T10:42:24,259 ERROR [Curator-LeaderSelector-0] io.druid.indexing.overlord.TaskMaster - Failed to lead: {class=io.druid.indexing.overlord.TaskMaster, exceptionType=class java.lang.reflect.InvocationTargetException, exceptionMessage=null}

java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_79]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_79]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_79]

at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_79]

at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) ~[java-util-0.27.9.jar:?]

at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) ~[java-util-0.27.9.jar:?]

at io.druid.indexing.overlord.TaskMaster$1.takeLeadership(TaskMaster.java:141) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:534) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.10.0.jar:?]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_79]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_79]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_79]

at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]

Caused by: java.lang.OutOfMemoryError: unable to create new native thread

at java.lang.Thread.start0(Native Method) ~[?:1.7.0_79]

at java.lang.Thread.start(Thread.java:714) ~[?:1.7.0_79]

at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) ~[?:1.7.0_79]

at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360) ~[?:1.7.0_79]

at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:628) ~[?:1.7.0_79]

at org.apache.curator.utils.CloseableExecutorService.submit(CloseableExecutorService.java:191) ~[curator-client-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.submitToExecutor(PathChildrenCache.java:812) ~[curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.offerOperation(PathChildrenCache.java:763) ~[curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:310) ~[curator-recipes-2.10.0.jar:?]

at io.druid.indexing.overlord.RemoteTaskRunner.start(RemoteTaskRunner.java:304) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

… 19 more

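"unable to create new native thread" means the OS refused to create another thread, not that the Java heap is exhausted. You may be hitting the ulimit on “max user processes” (ulimit -u). Can you try raising that limit?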
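
To check whether that limit is the likely culprit, it can help to compare the overlord's thread count with the "Max processes" limit the kernel applies to it. The following is only a rough sketch, assuming a Linux host with /proc mounted and permission to read the target process's entries. The function names (max_processes, thread_count, report) are made up for this example and are not part of Druid. Note that the limit counts every thread and process owned by the same user, so the overlord's own thread count is only a lower bound on what counts against it.

```python
import os
import sys


def max_processes(pid):
    """Return (soft, hard) "Max processes" limits for pid as strings.

    Read from /proc/<pid>/limits, so the values are the limits the running
    process actually has, which may differ from `ulimit -u` in your shell.
    A value can be the literal string "unlimited".
    """
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max processes"):
                fields = line.split()
                return fields[2], fields[3]
    raise RuntimeError("no 'Max processes' line for pid %d" % pid)


def thread_count(pid):
    """Return the number of threads currently in process pid."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no 'Threads:' line for pid %d" % pid)


def report(pid):
    """Build a one-line summary of thread usage versus the process limit."""
    soft, hard = max_processes(pid)
    return "pid %d: %d threads, max user processes soft=%s hard=%s" % (
        pid, thread_count(pid), soft, hard)


if __name__ == "__main__":
    # Pass the overlord's PID as the first argument; with no numeric
    # argument the script inspects itself, which is only useful as a smoke test.
    numeric_args = [arg for arg in sys.argv[1:] if arg.isdigit()]
    print(report(int(numeric_args[0]) if numeric_args else os.getpid()))
```

If the thread count is close to the soft limit when the error appears, raising the limit for the user that runs the overlord is the next thing to try. On CentOS 6 the default soft limit for non-root users is typically set to 1024 in /etc/security/limits.d/90-nproc.conf, so that file is worth checking.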