I currently have 6 ingestion task slots across 2 MiddleManagers (3 each).
All the tasks I receive are Hadoop ingestion tasks that run on an EMR cluster.
My production cluster receives ingestion tasks every 2 minutes (across all datasources), and most of them take no more than 1 min 30 s.
Perfect world for now.
But sometimes it receives heavier tasks (between 3 and 15 at the same time) that take up to 1 hour each.
My problem is that the lighter tasks can’t run while all the slots are taken by heavy tasks, so our users see delays.
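To make the problem concrete, here is a rough sketch of the situation as a toy simulation. It is not Druid code: the `simulate` helper, the arrival pattern (6 heavy tasks at t=0, then light tasks at t=2, 4 and 6) and the 4/2 slot split are all assumptions chosen for illustration. It compares one shared FIFO pool of 6 slots with two separate pools.

```python
import heapq

def simulate(tasks, slots):
    """Run tasks in arrival order (FIFO) on a fixed number of slots.

    tasks: list of (arrival_min, duration_min, label), sorted by arrival.
    Returns a list of (label, arrival_min, start_min).
    """
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    result = []
    for arrival, duration, label in tasks:
        slot_free = heapq.heappop(free_at)   # earliest available slot
        start = max(arrival, slot_free)
        heapq.heappush(free_at, start + duration)
        result.append((label, arrival, start))
    return result

def max_wait(result, label):
    """Longest time (in minutes) a task with this label waited for a slot."""
    return max(start - arrival for lbl, arrival, start in result if lbl == label)

# Illustrative workload (assumed): 6 heavy tasks of 60 min arrive together,
# then light tasks of 1.5 min arrive every 2 minutes.
heavy = [(0, 60, "heavy")] * 6
light = [(2, 1.5, "light"), (4, 1.5, "light"), (6, 1.5, "light")]

# Case 1: one shared pool of 6 slots, as in my current setup.
shared = simulate(sorted(heavy + light, key=lambda t: t[0]), slots=6)

# Case 2: hypothetical split, 4 slots for heavy tasks and 2 for light ones.
split_heavy = simulate(heavy, slots=4)
split_light = simulate(light, slots=2)

print("shared pool, max light wait:", max_wait(shared, "light"))
print("split pools, max light wait:", max_wait(split_light, "light"))
print("split pools, max heavy wait:", max_wait(split_heavy, "heavy"))
```

With these assumed numbers, the first light task waits 58 minutes in the shared pool and not at all with the split, at the cost of two heavy tasks waiting for a slot.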
So I’d like to know whether, in the same way that tiers help with handling data and queries, there is any way to:
- categorize ingestion tasks and assign them to specific slots (like YARN queues)
- set priorities for tasks so that lighter tasks could run before heavy ones even if they arrived later (not my preferred solution, but it could help)
If not, is this something in the Druid backlog?
NB: I know I could just add more slots, but I don’t think that is a good long-term answer.