I have a couple of questions about peon configuration(s). I am using the Imply distribution with 2 data nodes, each with the following configuration -
- 8 vCPUs
- 61 GB RAM
- 160 GB SSD storage
Because I have a Historical node running on the same server as the MiddleManager, I set druid.worker.capacity = 6 (number of available processors - 1) on each server. All my datasources have a segmentGranularity of 15 minutes and a windowPeriod of 10 minutes. So essentially, each worker holding a segment will wait 25 minutes before it finalizes/persists the segment.
So at any given time, I cannot have more than 12 segments in Running mode.
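The arithmetic above can be sketched as follows (a minimal check, assuming the 15-minute segmentGranularity, 10-minute windowPeriod, and worker capacity of 6 per node described in this post):

```python
# Each task holds a segment open for the segment interval plus the windowPeriod.
segment_granularity_min = 15
window_period_min = 10
hold_time_min = segment_granularity_min + window_period_min  # 25 minutes

# With druid.worker.capacity = 6 on each of the 2 data nodes:
worker_capacity_per_node = 6
num_nodes = 2
max_running_tasks = worker_capacity_per_node * num_nodes  # 12 concurrent tasks

print(hold_time_min, max_running_tasks)  # → 25 12
```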
Here is my MiddleManager configuration -

```
# Number of tasks per middleManager
druid.worker.capacity=6

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

# HTTP server threads

# Processing threads and buffers

# Store task logs in deep storage
```
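For reference, here is a rough back-of-envelope on what these settings commit per node (a sketch under the assumption that each peon uses only its -Xmx3g heap; direct memory for processing buffers, which is not shown above, would come on top):

```python
# Heap committed to peons per data node with the settings above.
peon_heap_gb = 3       # from -Xmx3g in druid.indexer.runner.javaOpts
worker_capacity = 6    # druid.worker.capacity
node_ram_gb = 61

peon_heap_total_gb = peon_heap_gb * worker_capacity  # 18 GB of peon heap
remaining_gb = node_ram_gb - peon_heap_total_gb      # left for Historical, MM, OS, page cache

print(peon_heap_total_gb, remaining_gb)  # → 18 43
```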
Here are my questions -
- Is there a way to use less memory for certain workers, so that I can increase the number of workers based on how much data I am going to ingest into each segment? Some of my datasources generate only about 6k rows per day, while others generate 100k rows per second into 15-minute segment intervals.
- What happens if I increase the worker capacity without considering the number of processors?
- How much memory does each peon task require? If a segment contains a lot of data, does having less memory affect how the data in the segment is aggregated?
- Do peons use just JVM memory, or do they also use disk space to aggregate and finalize segments?
- What strategy can I use to isolate workers based on their load?
Any advice? Let me know. Thanks!