Hi all, the more we play with Druid, the more concerns about stability and the more questions I have. (Druid 0.15.0)
Often, when using the UI to check task status, the UI (tasks pane) appears unresponsive or takes a really long time to come back…
(Note: calling the API to get task status can also be slow…)
What is happening here, and what does this mean?
As a test, I killed all the data nodes.
The UI immediately came back, showing 1 task running (odd, since I had just killed all the data nodes).
I hit reload in the UI, and it spun again for a while, eventually coming back with the same result. (Note: when we sent in a get-status API call, it was also ‘hung’ until the exact same time the UI responded.)
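In case it helps to reproduce the hang, here is a minimal sketch of timing the task status call with a client-side timeout, so a slow response shows up as a number rather than a spinner. It assumes the Overlord's task status endpoint (`/druid/indexer/v1/task/{taskId}/status`); the base URL, the timeout value, and the helper names `task_status_url` and `timed_get` are made up for illustration.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def task_status_url(base_url, task_id):
    """Build the task-status URL; base_url (host/port) is an assumption."""
    # Task ids can contain characters such as ':' so percent-encode them.
    quoted = urllib.parse.quote(task_id, safe="")
    return f"{base_url.rstrip('/')}/druid/indexer/v1/task/{quoted}/status"


def timed_get(url, timeout_s=10.0):
    """GET a URL and return (elapsed_seconds, parsed_json_or_None, error_or_None)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        return time.monotonic() - start, body, None
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Timeouts, refused connections and non-JSON bodies all end up here.
        return time.monotonic() - start, None, exc
```

Something like `timed_get(task_status_url("http://localhost:8090", "<task id>"), timeout_s=30)` would then return how long the call took and either the parsed status or the error; the host and port there are placeholders for wherever the Overlord is listening.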
Now if I query the meta-db, I do see 2 ‘active’ tasks…
If I go to the ‘load-data’ pane while in this state, it takes me to the ‘connect’ step.
The ‘connect’ pane shows:
Error: Failed to sample data: java.net.SocketException: Too many open files
(Admittedly, we have the default of 1042.) I guess we can increase this; are there recommendations on open file limits?
This is essentially a test cluster at the moment, running very few ingest tasks, usually just 1 at a time, and essentially no queries of interest running yet.
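For anyone wanting to check the same thing on their nodes, a rough sketch of reading the open-file limit and the current descriptor count for a process. It assumes Linux (it reads `/proc`), and the helper names are hypothetical. Note that `nofile_limits()` reports the limit of the Python process itself, which may differ from what a Druid JVM was started with, so `process_nofile_limits(pid)` reads the limit for a specific pid instead.

```python
import os
import resource


def nofile_limits():
    """Return (soft, hard) open-file limits of the *current* process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def process_nofile_limits(pid="self"):
    """Return (soft, hard) as strings from /proc/<pid>/limits, or None (Linux only)."""
    try:
        with open(f"/proc/{pid}/limits") as fh:
            for line in fh:
                if line.startswith("Max open files"):
                    # Columns: "Max open files", soft limit, hard limit, units
                    soft, hard = line.split()[3:5]
                    return soft, hard
    except OSError:
        pass
    return None


def open_fd_count(pid="self"):
    """Count entries in /proc/<pid>/fd; None if it cannot be read (Linux only)."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None
```

Comparing `open_fd_count(pid)` against `process_nofile_limits(pid)` for each Druid process over time should show whether the count keeps climbing toward the limit (which would suggest a leak) or just sits near a limit that is too low. Reading another user's `/proc/<pid>/fd` usually needs the same user or root.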
Anyway, a few questions about the UI/system. Our setup:
3 Master Nodes
3 Query Nodes
5 Data Nodes.
Any recommendations on key configs we should look at that could be causing this behavior? Is there a known open-files leak?