I’m fairly new to Druid. I’ve been setting up a pipeline where Druid ingests data from Kafka using Tranquility.
I’m using the imply-1.3.0 package for this, since it was the easiest way to get everything up and running.
In my setup, I’ve set “task.partitions” to “8” in the Tranquility config file.
I have 2 MiddleManager instances (on separate VMs). For each MiddleManager, “druid.worker.capacity” is also set to "8".
Now, when I push data to Kafka, I can see Druid ingesting the data via the tasks spawned by Tranquility.
However, I noticed that for each segment interval, all 8 tasks for that interval run on the same MiddleManager instance, and that instance becomes CPU-bound while the other one sits idle.
Is there a way to tell Druid to run 4 of the tasks on one MiddleManager node and the other 4 on the other? That way, the resources of the system would be better utilized.
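
To make the question concrete, here is a minimal sketch in plain Python of the placement I seem to be getting versus the one I’d like. This is not Druid code: the function names `fill_first` and `spread_evenly` are made up for illustration, and I’m only assuming that the observed behavior amounts to filling one worker before using the next.

```python
def fill_first(num_tasks, capacities):
    """Fill each worker up to its capacity before using the next one.

    Hypothetical model of the behavior I observe; not Druid's actual code.
    """
    assigned = [0] * len(capacities)
    for _ in range(num_tasks):
        for i, cap in enumerate(capacities):
            if assigned[i] < cap:
                assigned[i] += 1
                break
    return assigned


def spread_evenly(num_tasks, capacities):
    """Give each task to the worker with the most free slots.

    Hypothetical model of the behavior I would like to have.
    """
    assigned = [0] * len(capacities)
    for _ in range(num_tasks):
        free = [cap - used for cap, used in zip(capacities, assigned)]
        i = free.index(max(free))
        if free[i] == 0:
            break  # every worker is full; remaining tasks cannot be placed
        assigned[i] += 1
    return assigned


# Two MiddleManagers with druid.worker.capacity = 8, and task.partitions = 8.
capacities = [8, 8]
print(fill_first(8, capacities))     # [8, 0]
print(spread_evenly(8, capacities))  # [4, 4]
```

With my settings, the first function gives the 8/0 split I’m seeing, and the second gives the 4/4 split I’m after.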