Could someone please point me to how to prioritize data source ingestion tasks, or how to allocate certain slots on a middle manager to a particular data source's ingestion? I have a file-based ingestion data source and a Kafka streaming ingestion running in parallel. When a large number of file-based ingestion tasks are running, the Kafka streaming ingestion tasks are kept waiting until slots become available.
Is it possible to always keep the Kafka streaming tasks running on certain middle manager slots, and let the file-based ingestion tasks run on the remaining slots (something like a priority for data sources, or something else)?
Is task prioritization the solution you are looking for? https://druid.apache.org/docs/latest/ingestion/locking-and-priority.html
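For readers following the link: that page describes overriding a task's priority through the `priority` key in the task `context`, and it discusses priority in terms of task locking. A minimal sketch of setting that key is below. The helper name `with_priority`, the data source name `datasource_b`, and the trimmed task spec are hypothetical and only for illustration; a real spec needs its full ioConfig, tuningConfig, and so on.

```python
import copy
import json


def with_priority(task_spec: dict, priority: int) -> dict:
    """Return a copy of task_spec with context.priority set (hypothetical helper)."""
    spec = copy.deepcopy(task_spec)
    # Create the context object if the spec does not have one yet.
    spec.setdefault("context", {})["priority"] = priority
    return spec


# Hypothetical, heavily trimmed batch task spec; not a complete Druid spec.
file_task = {
    "type": "index_parallel",
    "spec": {"dataSchema": {"dataSource": "datasource_b"}},
}

# Assumption: 25 is just an example value chosen for the file-based tasks.
low_priority_task = with_priority(file_task, 25)
print(json.dumps(low_priority_task, indent=2))
```

Note that this only sets a per-task value; it does not by itself reserve middle manager slots for a data source.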
This is the task-level priority. I am looking for something that allows setting a priority for ingestion tasks across data sources running on middle managers. For example: ingestion tasks for data source A have a higher priority than ingestion tasks for data source B, or something like reserving a few slots/peons on the middle managers for ingestion tasks from data source A (for example, at any given point in time only 10 ingestion tasks for a data source can run, and the remaining tasks for that data source stay in the pending state).
According to the doc Ming suggested, Kafka ingestion tasks run at the highest priority level by default. So I need some additional information: how many middle managers do you have, and what value is set for ‘druid.worker.capacity’? My thought is that you have reached druid.worker.capacity on each of your middle managers, which causes some ingestion tasks to remain in the pending state until additional worker slots become available.
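The capacity reasoning above can be sketched as simple arithmetic. This is a rough illustration, assuming every middle manager has the same druid.worker.capacity and each task occupies exactly one slot; the function name `slot_usage` and the numbers (3 middle managers, capacity 4, 14 tasks) are hypothetical, not values from this thread.

```python
def slot_usage(middle_managers: int, worker_capacity: int, requested_tasks: int):
    """Return (total_slots, running, pending) for a cluster.

    Assumes identical druid.worker.capacity on every middle manager
    and one slot per task.
    """
    total_slots = middle_managers * worker_capacity
    running = min(requested_tasks, total_slots)
    pending = requested_tasks - running
    return total_slots, running, pending


# Hypothetical example: 10 file-based tasks plus 4 Kafka tasks submitted at once.
total, running, pending = slot_usage(
    middle_managers=3, worker_capacity=4, requested_tasks=10 + 4
)
print(f"slots={total} running={running} pending={pending}")
```

If the pending count is above zero whenever the file-based tasks burst, the cluster is slot-bound, which matches the symptom described in the question.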