Prioritizing data source ingestion tasks


Could someone please point me to how to prioritize data source ingestion tasks, or how to allocate certain slots on the middle managers to a particular data source's ingestion? I have file-based ingestion and Kafka streaming ingestion running in parallel. When a large number of file-based ingestion tasks are running, the Kafka streaming ingestion tasks are kept waiting until slots become available.

Is it possible to always keep the Kafka streaming tasks running on certain middle manager slots, while file-based ingestion tasks run on the remaining slots (something like a priority per data source, or something else)?


Vinay Patil

Hi Vinay,

Is task prioritization the solution you are looking for?


Hi Ming,

This is the task-level priority. I am looking for something that lets me set priority for ingestion tasks across data sources running on the middle managers. For example, ingestion tasks for data source A would have higher priority than ingestion tasks for data source B, or a few slots/peons on the middle managers would be reserved for ingestion tasks from data source A (for example, at any given point in time only 10 ingestion tasks for a data source can run, and the remaining tasks for that data source stay in the pending state).
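
To make the request concrete, here is a minimal Python sketch of the policy I have in mind. It is a toy model, not Druid code: assign_tasks, per_source_cap, and the data source names are all hypothetical, and it only shows the idea of capping how many tasks one data source may run so that the remaining slots stay free for the others.

```python
from collections import Counter


def assign_tasks(queue, total_slots, per_source_cap):
    """Split a queue of (task_id, data_source) pairs into running and pending.

    A task runs only if a slot is free and its data source is below its cap.
    Data sources without an entry in per_source_cap are limited only by
    total_slots. This is a hypothetical model, not a Druid API.
    """
    running, pending = [], []
    in_use = Counter()  # running task count per data source
    for task_id, source in queue:
        cap = per_source_cap.get(source, total_slots)
        if len(running) < total_slots and in_use[source] < cap:
            running.append(task_id)
            in_use[source] += 1
        else:
            pending.append(task_id)
    return running, pending


# Hypothetical example: 4 slots, file-based data source "B" capped at 2 tasks.
queue = [("b1", "B"), ("b2", "B"), ("b3", "B"), ("a1", "A"), ("a2", "A")]
running, pending = assign_tasks(queue, total_slots=4, per_source_cap={"B": 2})
print(running)  # ['b1', 'b2', 'a1', 'a2']
print(pending)  # ['b3']
```

With the cap in place, the third task for B waits even though a slot is free, which leaves room for the tasks of data source A.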


Vinay Patil


According to the doc Ming suggested, Kafka ingestion tasks run at the highest priority level by default, so I need some additional information. How many middle managers do you have, and what value is set for 'druid.worker.capacity'? My guess is that you have reached druid.worker.capacity on each of your middle managers, which causes some ingestion tasks to remain in the pending state until additional worker slots become available.
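
As a rough illustration of that reasoning, here is a small Python sketch. It assumes every middle manager has the same druid.worker.capacity, so the total number of task slots is the number of middle managers multiplied by that value, and any tasks beyond it wait in the pending state. The function name and the numbers are made up for the example and are not taken from your cluster.

```python
def pending_count(middle_managers, worker_capacity, submitted_tasks):
    """Return how many tasks must wait, given identical middle managers.

    Hypothetical helper: total slots = middle_managers * worker_capacity.
    """
    total_slots = middle_managers * worker_capacity
    return max(0, submitted_tasks - total_slots)


# Example: 3 middle managers with capacity 4 give 12 slots,
# so 15 submitted tasks leave 3 pending.
print(pending_count(3, 4, 15))  # 3
```

If the number you get for your own cluster is greater than zero during the file-based ingestion bursts, the Kafka tasks are most likely just waiting for a free slot.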