Strategy for running tasks

Hi all again!

In our setup, we are using the Kafka Indexing Service for real-time ingestion, and we are planning to use Hadoop/re-indexing tasks to reduce the number of segments. For now, we have only 2 MiddleManager instances (workers), each with a capacity of 10. Is there any way to specify where a running task should be spawned? In the current setup, all tasks are assigned to one worker, and only once its capacity is exhausted do further tasks get assigned to the second worker.

Moreover, is it possible to group tasks based on some criteria so that they run on a particular machine? For example, let's say we want to run the ingestion tasks for one client on the 1st instance, all other ingestion tasks on the 2nd instance, and keep a 3rd worker for batch jobs.

Thanks for your help


Hi Dan,
Tasks are assigned to MiddleManagers based on a configurable worker selection strategy.

For your case, look for the section “Fill Capacity with Affinity” here -

You can explicitly specify an affinity so that tasks for a specific datasource are assigned to one worker, while all other tasks go to the other workers.
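A minimal sketch of what such an affinity setup might look like, assuming the Overlord's dynamic worker config endpoint (`/druid/indexer/v1/worker`) and the `fillCapacityWithAffinity` select strategy with an `affinityConfig.affinity` map from datasource name to a list of MiddleManager `host:port` strings. The Overlord URL, the datasource `client_a_events` and the host `mm1.example.com:8091` are hypothetical placeholders, and the field names should be checked against the docs for your Druid version. Posting this config likely replaces the whole worker config (including any autoscaler settings), so merge it with your existing config first. Note also that the affinity map is keyed by datasource, so on its own it cannot split real-time and batch tasks of the same datasource.

```python
import json
import urllib.request

# Hypothetical Overlord address; replace with your own.
OVERLORD_URL = "http://overlord.example.com:8090"


def affinity_worker_config(affinity):
    """Build a worker config using the fillCapacityWithAffinity strategy.

    `affinity` maps a datasource name to the list of MiddleManager
    "host:port" strings that should run that datasource's tasks.
    """
    return {
        "selectStrategy": {
            "type": "fillCapacityWithAffinity",
            "affinityConfig": {"affinity": affinity},
        }
    }


def post_worker_config(config, overlord_url=OVERLORD_URL):
    """POST the config to the Overlord's worker config endpoint; return the HTTP status."""
    req = urllib.request.Request(
        overlord_url + "/druid/indexer/v1/worker",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical: pin one client's datasource to the first MiddleManager.
    config = affinity_worker_config({"client_a_events": ["mm1.example.com:8091"]})
    print(json.dumps(config, indent=2))
    # post_worker_config(config)  # uncomment to apply against a live Overlord
```

Running the script only prints the JSON payload. The actual POST is left commented out so the config can be reviewed before it is applied.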

Thanks Nishant for your response!

We still need to find a solution for separating real-time ingestion from batch ingestion for the same datasource.