Today, Druid can have an arbitrary number of master/query/data nodes, etc.
This is great: it scales up and down and makes the system highly available (HA).
My question/thought is: today the metadata store uses Postgres/MySQL, and both of these DBs feel like the limiting factor for HA, since they are typically one-node or two-node setups (master/slave or master/master). Has there been any consideration of, or effort toward, supporting something like Cassandra or another fault-tolerant HA system for the metadata store?
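For context, this is roughly what the dependency looks like today: the metadata store is wired in through `common.runtime.properties`. A typical PostgreSQL setup might look like the sketch below (the hostname, database name, and credentials are placeholders):

```properties
# Load the PostgreSQL metadata storage extension
druid.extensions.loadList=["postgresql-metadata-storage"]

# Point Druid at a single Postgres instance -- the HA bottleneck in question
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://metadata-db.example.com:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=changeme
```

Every node in the cluster shares this one connection target, which is why a single-instance RDBMS feels like the odd one out next to the horizontally scalable services.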
Not that I’m aware of, although one day I think it’d be cool to move the metadata store functionality into the master (probably the Coordinator process) and have it piggyback off its natural HA. It’d be a large-ish project, but it makes sense in theory.
By the way, the metadata store is not part of the query path, so if it’s down for a brief period (a few minutes) you might not even notice. That helps when thinking about fault tolerance. And if you’re running in a cloud environment, most providers have a managed MySQL or PostgreSQL offering (or both) that can make life easier.
To add to this, the consequences of a temporary metadata store outage are minimal: new tasks can’t be launched and new segments can’t be created, but in general the cluster chugs along with its current state pretty well. As such, durability is generally more important than uptime for the metadata store. I’ve seen successful migrations in pretty complex scenarios (from Amazon EC2-Classic to Amazon VPC, for example) without any issues.
Internally, we are encountering scenarios where parallel read access to the metadata store is becoming more important, because overloading the Coordinator is too dangerous (e.g., merging segments, restructuring segments, or async query processing).
Basically, your Coordinator, and maybe even ZK, will probably fail in nasty ways before your metadata store does.