Replicating the coordinator/metadata store

Hi, what is the best practice for replicating the coordinator and its backing metadata store? We’re using MySQL, and presumably we can spin up a second coordinator, but the secondary coordinator would need to connect to the main MySQL database rather than its own, secondary MySQL instance, right? Is there an easy way to handle the failure of the main MySQL instance (e.g., a hardware fault that destroys the disk)?
–T

You can spin up a second coordinator. Coordinators elect a leader, and only one of them is actually doing coordination at any given time. If the primary fails, the backup becomes the new leader. The backup coordinator should have exactly the same configs as the primary coordinator and connect to the same MySQL. MySQL itself can be made HA; there are some good guidelines here: https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html.
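To make that concrete, here is a minimal sketch of the metadata-store section of the common.runtime.properties that both coordinators would share. The host name and credentials below are placeholders, and the property names assume a reasonably recent Druid version with the MySQL metadata storage extension loaded:

# Shared by both coordinators; host and credentials are placeholders.
# Assumes mysql-metadata-storage is included in druid.extensions.loadList.
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://mysql.example.com:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

If you want to check which coordinator currently holds leadership, you can hit the coordinator's /druid/coordinator/v1/leader endpoint; it returns the address of the current leader. Let me know if that helps.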

Cool, thanks, that’s very helpful. As I investigate various means of making MySQL HA, can you give me a sense of what happens when MySQL goes offline, for whatever reason? Will realtime ingestion tasks be able to push their segments to historicals and exit? Thanks!
–T

Hey TJ,

When MySQL goes offline, no new segments can be published. Realtime ingestion tasks will keep retrying the publish until MySQL is back online, so they won’t be able to hand off their segments and exit until then.
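If you want to watch that behavior from the outside during a MySQL outage, one rough way (a sketch only; the Overlord address and task ID below are placeholders you’d replace with your own) is to poll the Overlord’s task status endpoint and confirm the task keeps reporting as running rather than failing:

import time
import urllib.request

# Placeholder Overlord address and task ID -- substitute your own values.
OVERLORD = "http://overlord.example.com:8090"
TASK_ID = "index_realtime_example_task"

# Poll GET /druid/indexer/v1/task/{id}/status every 30 seconds and print the
# raw JSON so you can see whether the task is still running while MySQL is down.
while True:
    url = "{}/druid/indexer/v1/task/{}/status".format(OVERLORD, TASK_ID)
    with urllib.request.urlopen(url) as resp:
        print(time.strftime("%H:%M:%S"), resp.read().decode())
    time.sleep(30)

The exact JSON fields in the response vary a bit between Druid versions, so this just prints the raw body.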

Is there a preferred MySQL HA method that is tested with Druid? We are hoping to pick a method that has been used often enough that we don’t run into new corner cases.

Thanks,

Arlo

Hey Arlo,

There are several sites out there using multi-AZ MySQL RDS on Amazon.

Thanks, Gian.

We are running on bare metal, so sadly the Amazon route is not available to us. Are we pretty much on our own?

Thanks,

Arlo

I know there are a few other folks running on bare metal; maybe one of them can chime in here.