How to control and contain raw data flow to specific peons and historicals within a cluster?

We are trying to build a Druid architecture that can handle sites in two separate geographical locations, one in the east and one in the west. We would like to query all the data through a single point, so we figured building a cluster would help.

We will start by putting a single server at each location and building a cluster out of them, but the challenge is to avoid sending raw data across the wide area network. Each server will receive data on a Kafka broker installed on the local system. So how do we make sure that data originating in the west is ingested by a MiddleManager and kept on Historical nodes that are assigned to the server in the west? The east will follow the same logic.

Both servers will have a Druid Broker for HA.

Is there a way to force the cluster to use the MiddleManager and Historical resources in the west to process west data, and to do the same for the east? Or is it possible to build two completely separate standalone systems, with a Broker on a third system that taps into both clusters simultaneously?

Thanks,

If the deep storage is common to both Druid clusters, won't the Historicals have data from both regions? You can then query those Historicals through the Broker, which will return data from both regions, if I'm not wrong. I would also like to know why you want Historical servers in each region.

You can manage that if the data source names are different. Using rule configuration, you could define rules so that the west Historical nodes load data-source_west and drop data-source_east, and vice versa for the east. This assumes deep storage is the same.
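
As a rough sketch of the rule idea above: assuming the west and east Historicals are placed in tiers named `west` and `east` (set with `druid.server.tier` in each Historical's runtime properties), a load rule per data source can restrict its segments to one tier. The tier names and data source names here are placeholders, not something confirmed in this thread. The script only builds and prints the rule payloads you would POST to the Coordinator at `/druid/coordinator/v1/rules/{dataSource}`; it does not contact a cluster.

```python
import json

# Hypothetical mapping of Historical tier -> data source (assumed names).
REGIONS = {
    "west": "data-source_west",
    "east": "data-source_east",
}


def load_rules_for(tier, replicants=1):
    """Return retention rules that load a data source's segments on one tier only."""
    return [
        {"type": "loadForever", "tieredReplicants": {tier: replicants}},
    ]


def rules_by_datasource(regions):
    """Build the rule list for every data source, keyed by data source name."""
    return {datasource: load_rules_for(tier) for tier, datasource in regions.items()}


if __name__ == "__main__":
    for datasource, rules in rules_by_datasource(REGIONS).items():
        print(f"POST /druid/coordinator/v1/rules/{datasource}")
        print(json.dumps(rules, indent=2))
```

Because each rule lists only one tier, Historicals in the other tier never load that data source's segments, which has the same effect as an explicit drop rule for them.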

Thanks,

Harsh

Deep storage will be the same, but I was hoping that I could stick to a single data store in Druid and avoid sending the west's data to the east and the east's data to the west during ingestion.

Is that possible? Can you steer MiddleManager/Peon resources in a cluster based on location or Kafka broker IP address, and control which resource is used to consume which Kafka topic?

Thanks

Pushing data to separate data sources (east and west) should solve your problem.
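
To make the separate-data-source approach more concrete, here is a minimal sketch that builds one Kafka supervisor spec per region, each reading from its local Kafka broker and writing to its own data source. It also builds an Overlord worker config with an affinity mapping, which is one possible way to pin each data source's ingestion tasks to the MiddleManager in the same region; that part is my assumption and is not confirmed anywhere in this thread. All host names, ports, topic names, and the bare-bones schema are hypothetical, and a real spec needs your own timestamp and dimension settings.

```python
import json

# Hypothetical hosts, ports, and topics, for illustration only.
SITES = {
    "west": {"kafka": "kafka-west:9092", "topic": "events", "worker": "mm-west:8091"},
    "east": {"kafka": "kafka-east:9092", "topic": "events", "worker": "mm-east:8091"},
}


def datasource_name(region):
    return f"data-source_{region}"


def supervisor_spec(region, site):
    """Kafka supervisor spec that reads the region's local broker only."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource_name(region),
                # Placeholder schema: adjust to the real event format.
                "timestampSpec": {"column": "timestamp", "format": "auto"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                },
            },
            "ioConfig": {
                "topic": site["topic"],
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": site["kafka"]},
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def worker_affinity(sites):
    """Overlord worker config that ties each data source to its region's worker."""
    affinity = {datasource_name(r): [s["worker"]] for r, s in sites.items()}
    return {
        "type": "default",
        "selectStrategy": {
            "type": "equalDistribution",
            # strong=True: tasks run only on the listed workers.
            "affinityConfig": {"affinity": affinity, "strong": True},
        },
    }


if __name__ == "__main__":
    for region, site in SITES.items():
        print("POST /druid/indexer/v1/supervisor")
        print(json.dumps(supervisor_spec(region, site), indent=2))
    print("POST /druid/indexer/v1/worker")
    print(json.dumps(worker_affinity(SITES), indent=2))
```

The script only prints the payloads. Whether affinity by data source fits your Druid version and deployment is something to verify against the Overlord dynamic configuration documentation before relying on it.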