Scheduling Hadoop indexing tasks in a specific YARN queue

Hi there,
I’m trying to set up the indexing service to schedule indexing tasks in a specific YARN queue.

I’m running Druid 0.8.0 with an overlord and forking peons, plus Hadoop 2.4.0 with YARN. Jobs are scheduled fine in the default queue, but not in the one I try to specify.

I tried setting the following overlord properties to achieve that:

druid.indexer.fork.property.mapred.job.queue.name=druid-indexing

druid.indexer.fork.property.mapreduce.job.queuename=druid-indexing

I’m not sure which one is the right one.

I then restarted the overlord and ran an indexing task.

I see that overlord indeed passes the property to peon:

2015-09-24T14:52:39,036 INFO [pool-7-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Running command: java […] -Dmapreduce.job.queuename=druid-indexing -Dmapred.job.queue.name=druid-indexing

But the peon still schedules the job in the default YARN queue. Logs from the resource manager:

2015-09-24 17:03:34,303 INFO capacity.LeafQueue (LeafQueue.java:assignContainer(1354)) - assignedContainer application= […] queue=default: […]

Am I doing something wrong? Is this the right way to pass Hadoop configuration to Druid indexing tasks?

Thanks for any help!

Krzysiek

Hey Krzysiek,

I think this should work if you do “druid.indexer.fork.property.hadoop.mapreduce.job.queuename” (note the extra “hadoop.”).

Potentially easier (since you don’t have to bounce the server) would be to add it to the task JSON. In the “tuningConfig” you can add,

"jobProperties": {"mapreduce.job.queuename": "druid-indexing"}
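For context, here is a rough sketch of where that fragment would sit in a Hadoop index task. The surrounding layout ("type": "index_hadoop", a "spec" wrapper, and "type": "hadoop" on the tuningConfig) is an assumption based on the usual task format, so check it against the docs for your Druid version. The "dataSchema" and "ioConfig" sections are left out on purpose and would need to be filled in from your existing task.

```json
{
  "type": "index_hadoop",
  "spec": {
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "mapreduce.job.queuename": "druid-indexing"
      }
    }
  }
}
```

Since this travels with the task itself, it should only affect tasks submitted with it, and the overlord does not need a restart.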

Great, that worked! Ah, I missed the extra “hadoop.” in variable name…

Thanks Gian.