Is Druid capable of handling multiple ingestion queues?

Hi all,

For now, I have 6 slots for ingestion tasks on 2 MiddleManagers (3 each).

All the tasks I receive are Hadoop ingestion tasks that run on an EMR cluster.

My production cluster receives ingestion tasks every 2 minutes (across all datasources), and most of them take no more than 1 min 30 s.

Perfect world for now.

But sometimes, it receives heavier tasks (between 3 and 15 tasks at the same time) that take up to 1h each.

My problem is that the lighter tasks can’t run because all slots are taken by heavy tasks, so our users see delays in their data.

So I’d like to know whether, just as tiers help segregate data and queries, there is any way to:

  • categorize ingestion tasks and assign them to specific slots (like a YARN queue)

Or

  • set a priority on tasks so that lighter tasks are ingested before heavy ones even if they arrived later (not my preferred solution, but it could help)

If not, is this something in the Druid backlog?

NB: I know I could just add some slots, but I don’t think that is a good long-term answer :)

Thanks

Hi Guillaume,

If those heavy tasks belong to specific datasources (let’s say you have 10 datasources, 7 with light tasks and 3 with heavy tasks), I think you can try putting the 3 datasources with heavy tasks in a different tier instead of _default_tier.

Thank you.

–siva

Hi Siva,

Thanks for your reply.

I thought tiers were only for segregating storage. Am I wrong? Do they also segregate task processing?

Unfortunately, no, the heavy tasks don’t belong to specific datasources (they just don’t handle the same segments).

On Wed, Oct 23, 2019 at 00:36, Siva Mannem siva.mannem@imply.io wrote:

You can also set task priorities via https://druid.apache.org/docs/latest/ingestion/tasks.html#context-parameters

Note that you cannot add the context via the console right now (this is being fixed in https://github.com/apache/incubator-druid/pull/8725).
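
Since the console can't set the context yet, one option is to add it to the task JSON yourself and submit the task to the Overlord API directly. Below is a minimal Python sketch of that idea, not a tested recipe. The helper names (`with_priority`, `submit_task`), the Overlord address `http://localhost:8090`, the priority value 75, and the empty `spec` placeholder are all assumptions for illustration; a real `index_hadoop` task needs its full spec. Also note that the docs describe priority mainly in terms of task locking, so whether it changes the order in which pending tasks get a slot is something to verify on your cluster.

```python
import copy
import json
import urllib.request


def with_priority(task_spec, priority):
    """Return a copy of task_spec with context.priority set (input is not modified)."""
    spec = copy.deepcopy(task_spec)
    spec.setdefault("context", {})["priority"] = priority
    return spec


def submit_task(task_spec, overlord_url="http://localhost:8090"):
    """POST a task to the Overlord task endpoint and return the parsed JSON reply.

    overlord_url is an assumed address; replace it with your Overlord's host and port.
    """
    request = urllib.request.Request(
        overlord_url + "/druid/indexer/v1/task",
        data=json.dumps(task_spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical light task: "spec" is left empty here and must be filled in
# with the real Hadoop ingestion spec before submitting.
light_task = {"type": "index_hadoop", "spec": {}}

# 75 is an arbitrary illustrative value, not a recommended setting.
prioritized = with_priority(light_task, 75)
print(json.dumps(prioritized, indent=2))
```

Calling `submit_task(prioritized)` would then send the task to the Overlord. Keeping the priority change in a separate helper means the same original spec can be reused for light and heavy variants without being modified in place.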