How to effectively use multiple nodes with Druid?


I’ve built a cluster with 2 data nodes, with the exact same configuration on both.

On each data node I have a historical node and a middleManager node.

The segments are stored locally on each machine.

I’m indexing data using a Hadoop cluster.

The submitted task always goes to the same node, and the segments are also always saved on the same node.

How can I leverage the fact that I have 2 nodes?

When I query the data, of course only 1 node is working.

I’m using AWS machines. My deep storage is on S3, and index tasks are handled by Hadoop EMR.

Could you help me understand?


Hey Richard,

When you index using a remote Hadoop cluster, one of the middleManagers will submit the job to the remote Hadoop cluster, and then basically just sit there waiting for the job to finish. The real indexing work is done on the Hadoop cluster. So it’s okay that only one of your Druid nodes is involved there.

When you actually query the data, that should be hitting both historical nodes. If it’s not, try checking out your coordinator to make sure they’re both registered. That should be at http://COORDINATOR_HOST:8081/#/
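
If you'd rather check from a script than from the console, here is a rough Python sketch. It assumes the coordinator exposes its server list at /druid/coordinator/v1/servers?simple and that each entry carries "host", "type" and "currSize" fields; those field names, and the helper names fetch_servers and historical_usage, are my assumptions for illustration, so verify them against your Druid version.

```python
import json
from urllib.request import urlopen


def fetch_servers(coordinator_url):
    """Fetch the server list from the coordinator.

    Assumes the endpoint /druid/coordinator/v1/servers?simple is available
    and returns a JSON array of server objects.
    """
    url = f"{coordinator_url}/druid/coordinator/v1/servers?simple"
    with urlopen(url) as resp:
        return json.load(resp)


def historical_usage(servers):
    """Map each historical node's host to the bytes of segments it has loaded.

    Field names ("host", "type", "currSize") are assumed; entries without
    "currSize" are reported as 0.
    """
    return {
        s["host"]: s.get("currSize", 0)
        for s in servers
        if s.get("type") == "historical"
    }
```

Calling historical_usage(fetch_servers("http://COORDINATOR_HOST:8081")) should list both historicals. If one is missing, it isn't registered; if one shows 0 bytes, it is registered but hasn't loaded any segments, which would explain why only one node answers queries.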