Multiple Druid clusters: online and offline flows

Hi,

We have one Druid cluster on the latest Imply version, running the Kafka indexing service and storing its data in S3.

Our Druid cluster has:

  • 5 historicals (each on its own machine).

  • 1 master node (coordinator and overlord).

  • 1 broker node.

  • 2 middlemanagers.

We have a requirement to support offline reports alongside our existing Druid cluster, which serves our web app in production.

We are thinking of launching a second, separate Druid cluster (with the required components) that uses a different namespace in ZooKeeper and queries the same S3 and real-time data.

My main question is how to do that. We know both clusters should point to the same S3.

Questions:

  1. Should they point to the same RDS?

  2. How will the new Druid cluster query the same real-time data? What is the connection between the new (offline) Druid cluster and the existing middle managers of the online Druid cluster?

Thank you!!!

I’d suggest having a look at tiering. This allows you to segregate a single cluster into different “tiers”, which can be configured differently to support different workloads. I’m not sure if there’s a single reference point for tiering, but the router docs give a decent example of a use case - http://druid.io/docs/latest/development/router.html
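
As a rough illustration of the tiering idea (a sketch only, not a tested setup): historicals are assigned to a tier through `druid.server.tier` in their runtime.properties, and the coordinator's load rules decide how many replicas of each segment every tier holds. The tier name `offline` and the replica counts below are placeholders made up for this example.

```
# runtime.properties on the historicals reserved for offline reports
# "offline" is a hypothetical tier name; unconfigured historicals stay in _default_tier
druid.server.tier=offline
```

A load rule along these lines would then keep copies of the segments on both tiers:

```
{
  "type": "loadForever",
  "tieredReplicants": {
    "_default_tier": 2,
    "offline": 1
  }
}
```

The router docs linked above describe how queries can then be sent to a broker dedicated to a given tier, so that the heavy offline queries only hit the historicals in that tier.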

Cheers,

Dylan

Hi Dylan,
Thank you for your time, I appreciate it.
In our case we want to split into two Druid clusters because a very heavy process runs offline and we don’t want it
to interfere with our main Druid cluster. The main idea is that it will read the same segments and return the same results without putting load on our main Druid cluster.

Cheers,
Alon