Curator-ServiceCache thread leak while using tranquility/flink

Hello druid users,

We are using tranquility (0.9.0.0) with Flink (1.4.2) to ingest events into Druid in real time. I have noticed that the processes crash (OOM) after running for a couple of days. I suspect a thread leak; a few observations are below. Are there any known thread leaks in this area? Any suggestions?

I noticed many instances of the following thread:

"Curator-ServiceCache-0" #3535 daemon prio=5 os_prio=0 tid=0x00007f7262c56000 nid=0x723d waiting on condition [0x00007f713fbc8000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000004a41619b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Locked ownable synchronizers:
        - None
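
For context, the stack above is that of an idle ThreadPoolExecutor worker parked in LinkedBlockingQueue.take(). One general way such threads pile up is when executors are created repeatedly and never shut down. Whether that is what happens inside tranquility or Curator here is not established. The following is only a minimal Java sketch of that general pattern, using plain java.util.concurrent (not Curator's API) and a hypothetical thread name prefix, Demo-ServiceCache-:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class ExecutorLeakSketch {
    // Hypothetical prefix, chosen so it cannot be confused with real Curator threads.
    static final String PREFIX = "Demo-ServiceCache-";

    // Counts live threads whose name starts with the given prefix.
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    // Starts n single-thread executors, then shuts them down.
    // Returns {live workers before shutdown, live workers after shutdown}.
    static long[] runDemo(int n) {
        List<Thread> workers = new ArrayList<>();
        List<ExecutorService> executors = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                final int id = i;
                ThreadFactory factory = r -> {
                    Thread t = new Thread(r, PREFIX + id);
                    t.setDaemon(true);
                    workers.add(t);
                    return t;
                };
                ExecutorService executor = Executors.newSingleThreadExecutor(factory);
                // Running one task starts the worker; afterwards it parks in
                // LinkedBlockingQueue.take() waiting for more work.
                executor.submit(() -> { }).get();
                executors.add(executor);
            }
            long before = countThreads(PREFIX);

            // Without this shutdown step the idle workers would stay parked.
            for (ExecutorService executor : executors) {
                executor.shutdown();
            }
            // Wait for the worker threads to actually exit before counting again.
            for (Thread t : workers) {
                t.join(5000);
            }
            long after = countThreads(PREFIX);
            return new long[] {before, after};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = runDemo(5);
        System.out.println("idle workers before shutdown: " + counts[0]);
        System.out.println("idle workers after shutdown:  " + counts[1]);
    }
}
```

If something similar is going on here, the place to look would be code that builds a new service cache (or anything else owning an executor) per task or per reconnect without closing the previous one.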

The number of these threads keeps going up:

amalakar:~$ sudo jstack -l $(pgrep -f TaskManager) | grep Curator-ServiceCache | wc -l
2055

8 hours later

amalakar:~$ sudo jstack -l $(pgrep -f TaskManager) | grep Curator-ServiceCache | wc -l
6439

jcmd native memory summary:

amalakar:~$ sudo jcmd $(pgrep -f TaskManager) VM.native_memory detail.diff
70803:
Native Memory Tracking:
Total: reserved=28447967KB +5033971KB, committed=27270923KB +5038263KB

-                    Thread (reserved=8268032KB +4620359KB, committed=8268032KB +4620359KB)
                            (thread #8009 +4475)
                            (stack: reserved=8232224KB +4600300KB, committed=8232224KB +4600300KB)
                            (malloc=26327KB +14719KB #40058 +22375)
                            (arena=9481KB +5340 #16017 +8950)
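
As a rough cross-check of the numbers above: the stack reservation grew by 4600300KB while 4475 threads were added, which works out to 1028KB per new thread, close to the usual 1MB default thread stack size on 64-bit Linux JVMs. The two jstack counts (2055, then 6439 eight hours later) give the growth rate. A small Java sketch of that arithmetic follows; the class and method names are made up for illustration, and the rate is an average that may not hold steady over time:

```java
public class ThreadStackMath {
    // Average stack reservation per thread, in KB (integer division).
    static long stackKbPerThread(long stackKb, long threads) {
        return stackKb / threads;
    }

    // Thread growth per hour between two jstack counts taken `hours` apart.
    static double threadsPerHour(long firstCount, long secondCount, double hours) {
        return (secondCount - firstCount) / hours;
    }

    public static void main(String[] args) {
        // From the NMT diff: stack reserved grew by 4600300KB while 4475 threads were added.
        System.out.println(stackKbPerThread(4600300L, 4475L)); // prints 1028
        // From the two jstack counts: 2055 threads, then 6439 eight hours later.
        System.out.println(threadsPerHour(2055L, 6439L, 8.0)); // prints 548.0
    }
}
```

At roughly 548 new threads per hour and about 1MB of stack each, that is on the order of 550MB of additional thread stack per hour, which would fit with an OOM after a couple of days.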
    

thread tracking:

Hey Arup,

I am not aware of any issues like that with tranquility-core (I have generally seen it run stably for long periods of time), but I am less sure about the tranquility-flink module.