Could we run some Druid processes on YARN?

Hi guys,
Does anyone have experience running some of the Druid nodes on YARN? I think running on YARN could improve Druid’s scalability.

I think the ‘overlord’ and ‘peon’ processes are suitable for running on YARN, because they do not use a lot of disk and seem to have a reasonable footprint.

Historical nodes seem less suitable for YARN: in ‘top’ I can see that they use far more virtual memory than physical memory. YARN limits the virtual memory a task may use to its physical memory allocation multiplied by a ratio, which is either the default or a configured value.
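
To make the concern concrete, here is a minimal Python sketch of that virtual-memory check. It assumes the ratio comes from `yarn.nodemanager.vmem-pmem-ratio` (default 2.1 in stock Hadoop) and that the check is active (`yarn.nodemanager.vmem-check-enabled`). The function names and the memory figures are made up for illustration; they are not measurements from a real historical node.

```python
# Sketch of YARN's virtual-memory limit for one container.
# Assumption: the ratio is yarn.nodemanager.vmem-pmem-ratio, default 2.1.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_mb(container_pmem_mb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """Virtual memory (MB) allowed for a container with the given physical allocation."""
    return container_pmem_mb * ratio


def exceeds_vmem_limit(process_vmem_mb, container_pmem_mb,
                       ratio=DEFAULT_VMEM_PMEM_RATIO):
    """True if the process's virtual size is above the container's limit."""
    return process_vmem_mb > vmem_limit_mb(container_pmem_mb, ratio)


if __name__ == "__main__":
    # Hypothetical numbers: an 8 GB container and a process whose
    # VIRT column in top shows about 60 GB.
    container_pmem_mb = 8192
    historical_virt_mb = 60000
    print("limit (MB):", vmem_limit_mb(container_pmem_mb))
    print("over limit:", exceeds_vmem_limit(historical_virt_mb, container_pmem_mb))
```

With numbers like these the process would be over the limit, so running a historical this way would presumably require raising the ratio or disabling the virtual-memory check for those NodeManagers.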

My questions are:

  1. Do you recommend running Druid nodes on YARN?

  2. Do you have any suggestions if I want to run historical nodes on YARN?

I don’t have much experience with this, but the biggest problem we’ve seen with YARN/Mesos/Kubernetes is saving state, particularly in ZooKeeper. Even the historicals need to store some local state, in the form of a file that lists the segments the historical is serving.

Do you mean that too much data will be saved to ZooKeeper if too many machines have served as a historical node?
Our Hadoop cluster has only tens of nodes. Will this be a problem?

On Tuesday, July 26, 2016 at 9:19:42 AM UTC+8, Fangjin Yang wrote:

I mean in the sense that ZK stores a lot of state about quorums it has seen and the general state of things. If a container or a task shuffles or dies, some of this information can be lost and lead to unexpected behavior in the cluster.