We are trying to build a Druid architecture that spans sites in two separate geographical locations, one in the east and one in the west. We would like to query all the data from a single point, so we figured building a cluster would help.
We will start by putting a single server at each location and building a cluster out of those, but the challenge is to avoid sending raw data across the wide area network. Each server will receive data from a Kafka broker installed on the local system. So how do we make sure that data originating in the west is ingested by the MiddleManager and kept on the Historical nodes assigned to the server in the west? East will follow the same logic.
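For the ingestion side, the mechanism we have been looking at is the Overlord's dynamic worker configuration, which (as we understand it) can pin a datasource's indexing tasks to specific MiddleManager hosts via an affinity config. A sketch of what we would POST to the Overlord at `/druid/indexer/v1/worker`, assuming hypothetical datasource names `west-events` / `east-events` and MiddleManager hosts `druid-west:8091` / `druid-east:8091` (all labels are our own, not defaults):

```json
{
  "selectStrategy": {
    "type": "fillCapacityWithAffinity",
    "affinityConfig": {
      "affinity": {
        "west-events": ["druid-west:8091"],
        "east-events": ["druid-east:8091"]
      },
      "strong": true
    }
  }
}
```

With `strong` set to `true`, tasks for an affinity-listed datasource should wait for their assigned worker rather than spill onto the other site's MiddleManager, which is what we want in order to keep raw data off the WAN.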
Both servers will run a Druid Broker for HA.
Is there a way to force the cluster to use the MiddleManager and Historical resources in the west to process west data, and do the same for the east? Or is it possible to build two completely separate standalone clusters and have a Broker on a third system tap into both of them simultaneously?
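For the storage side, one option we found is Historical tiering: tag each site's Historical with a tier name and use Coordinator retention rules to keep each datasource's segments on its home tier. A sketch, assuming tier labels `west` and `east` of our own choosing:

```properties
# runtime.properties on the west Historical (tier name is our own label)
druid.server.tier=west
```

Then, a per-datasource retention rule set through the Coordinator (here for the hypothetical `west-events` datasource), loading one replica onto the `west` tier only:

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": { "west": 1 }
  }
]
```

A mirror-image rule with `{ "east": 1 }` would apply to the east datasource. If this works the way we expect, a single Broker can still query both tiers, which might make the single-cluster design viable without shipping raw data across the WAN.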