Can't restart historical node

Hi,
I can’t restart my historical node.

It always fails with exceptions like the following:

OpenJDK 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007f7eb0000, 196608, 0) failed; error='Cannot allocate memory' (errno=12)

2015-10-28 07:19:39,653 ERROR o.a.c.f.l.ListenerContainer [ZkCoordinator-0] Listener (io.druid.server.coordination.BaseZkCoordinator$1@6932a575) threw an exception
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.7.0_79]
    at java.lang.Thread.start(Thread.java:714) ~[?:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) ~[?:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1590) ~[?:1.7.0_79]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:333) ~[?:1.7.0_79]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) ~[?:1.7.0_79]
    at io.druid.server.coordination.ZkCoordinator.removeSegment(ZkCoordinator.java:266) ~[druid-server-0.8.1-rc2.jar:0.8.1-rc2]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:145) ~[druid-server-0.8.1-rc2.jar:0.8.1-rc2]
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) ~[druid-server-0.8.1-rc2.jar:0.8.1-rc2]
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) ~[druid-server-0.8.1-rc2.jar:0.8.1-rc2]
    at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) ~[druid-server-0.8.1-rc2.jar:0.8.1-rc2]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) ~[curator-recipes-2.8.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) ~[curator-recipes-2.8.0.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_79]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_79]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_79]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]

2015-10-28 07:19:39,691 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] zNode[/druid/loadQueue/10.57.66.35:8080/fill_rate_data_2015-10-08T00:00:00.000Z_2015-10-09T00:00:00.000Z_2015-10-08T17:24:38.468Z] was removed

On Wednesday, October 28, 2015 at 3:43:35 PM UTC+8, luo…@conew.com wrote:

Hi,
I believe you need to increase the ulimit for max user processes.

Try running "ulimit -a" to see your current limits, and try increasing that value.
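
As a concrete follow-up to that suggestion, here is a minimal sketch of checking and raising the limit. The account name "druid" and the value 65536 are assumptions; substitute whichever user runs the historical node and a value that fits your host.

ulimit -u                 # current "max user processes" soft limit (on Linux this counts threads)
ulimit -u 65536           # raise it for the current shell session only

# To make it permanent, add entries to /etc/security/limits.conf and log in again:
#   druid  soft  nproc  65536
#   druid  hard  nproc  65536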

Hi, Nishant:
Here is the output of the command “ulimit -a”:

[luotao@yw-0-0 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514942
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4194304
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 10000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I think the number of “max user processes” is large enough.

I cleaned the segments from the historical node and then restarted it. It reloaded the segments from deep storage and is working well now.

I'm still confused about what caused this exception, though.
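
For context, one way to tell whether the "max user processes" limit is really the bottleneck is to compare it with the number of threads the JVM actually holds while loading segments. A minimal sketch, assuming the historical node can be found by its main class (the pgrep pattern below is an assumption):

PID=$(pgrep -f 'io.druid.cli.Main server historical')
grep Threads /proc/$PID/status   # current thread count of the process
ps -o nlwp= -p $PID              # same number, reported by ps
ulimit -u                        # the per-user limit that nproc enforces (counts threads on Linux)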

On Wednesday, October 28, 2015 at 6:38:42 PM UTC+8, Nishant Bangarwa wrote:

Hi,

Did it start failing suddenly (after working for a while), or is this a fresh setup? Or did you do any OS updates that might have triggered this?

– Himanshu

Hi, Himanshu:

It started failing suddenly while loading the locally stored segments. I didn't do any OS updates.

On Friday, October 30, 2015 at 12:10:20 PM UTC+8, Himanshu Gupta wrote:

Hi,

Since it is failing with an OOM, I would first check whether there is enough free memory available on the system and whether the process is getting appropriate -Xmx and -XX:MaxDirectMemorySize settings.

Also, see if http://javaeesupportpatterns.blogspot.com/2012/09/outofmemoryerror-unable-to-create-new.html is of any help.

– Himanshu
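
To make those checks concrete, here is a rough sketch for a 0.8.x historical node started from the command line. The heap, stack, and direct-memory sizes, the -Xss setting, and the config paths are illustrative assumptions, not values taken from this thread; tune them to the host and to your druid.processing settings.

free -m        # is there enough free memory left on the box?

# Heap, per-thread stack, and direct-memory caps set explicitly at launch.
# "unable to create new native thread" is about native memory for thread stacks,
# so a smaller -Xss or a smaller heap can leave more room for threads.
java -server -Xms8g -Xmx8g -Xss512k \
  -XX:MaxDirectMemorySize=16g \
  -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "config/_common:config/historical:lib/*" \
  io.druid.cli.Main server historical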