Namespacing segments, or preventing unknown segments from wreaking havoc

We had a big outage in our Druid cluster today. We run our Druid servers in Kubernetes, and our historicals use machine-local SSDs for their segment caches. We made the unfortunate choice to have our production and staging historicals share the same pool of machines, and today we got bitten by this for the first time.

A production historical started up on a machine whose segment cache contained segments from our staging cluster. Our prod and staging clusters use the same names for data sources.

This meant that the staging segments overshadowed any production segments that happened to have lower versions. Worse, when DruidCoordinatorCleanupOvershadowed kicked in, all of the overshadowed production segments got used=false set and were quickly dropped from historicals. This ended up being the majority of our data. We eventually figured out what was going on and did a bunch of manual steps to clean up: turning off and clearing the cache of the two historicals that had staging segments on them, manually setting used=true for all entries in druid_segments, and waiting a long, long time for data to re-download. But figuring out what was going on was subtle. (I was very lucky that, just a few days ago, I had randomly decided to read a lot of the code about how the used column works and how versioned timelines are calculated!)
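For anyone who hasn't dug into this part of the code, here is a rough Python sketch of the overshadowing rule as I understand it. It is not Druid's actual implementation (the real logic is in Java and also handles partially overlapping intervals and partitions). It assumes both segments cover exactly the same interval and that versions are compared as plain strings; the Segment class and its origin field are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    data_source: str
    interval: str  # e.g. "2019-01-01/2019-01-02"
    version: str   # assumed to be compared as a plain string
    origin: str    # hypothetical label for this example, not a real Druid field


def find_overshadowed(segments):
    """Return segments that lose to a higher version of the same (data source, interval)."""
    newest = {}
    for seg in segments:
        key = (seg.data_source, seg.interval)
        if key not in newest or seg.version > newest[key].version:
            newest[key] = seg
    return [
        seg for seg in segments
        if seg.version < newest[(seg.data_source, seg.interval)].version
    ]


# Same data source name and interval in both clusters; staging was indexed later.
prod = Segment("events", "2019-01-01/2019-01-02", "2019-01-02T00:00:00.000Z", "prod")
staging = Segment("events", "2019-01-01/2019-01-02", "2019-01-05T00:00:00.000Z", "staging")

overshadowed = find_overshadowed([prod, staging])
print([seg.origin for seg in overshadowed])  # the prod segment is the one that loses
```

Nothing in this comparison looks at where a segment came from, which is why a stray staging segment with a newer version was enough to get the production segment marked unused.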

(We were also lucky that we turned off coordinator automatic killing literally today!)

I feel like Druid should have been able to protect me from this to some degree. (Yes, we are going to address the root cause by making it impossible for prod and staging to reuse each other’s disks.) Some thoughts on changes that could have helped:

  • Is it standard Druid practice to prepend the “cluster” name to the data source name, so that conflicts like this are never possible? We are certainly tempted to do this now, but nobody ever told us to.

  • Should clusters have an optional name/namespace, with each DataSegment recording the namespace it belongs to, and should clusters refuse to handle segments they find that are from a different namespace? This would be like the common database setup where a single server/cluster has a set of databases, each of which has a set of tables.

  • Should historicals refuse to announce segments that don’t exist in the druid_segments table, or should coordinators/brokers/etc. ignore announced segments that don’t exist in the druid_segments table? I’m going to guess this is difficult to do in the historical, because the historical probably doesn’t actually talk to the SQL DB at all?
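To make the second and third ideas a bit more concrete, here is a rough Python sketch of the kind of check I have in mind. Everything in it is hypothetical: Druid segments have no namespace field today, and CachedSegment, segments_safe_to_announce, and known_ids (standing in for the set of segment ids in druid_segments) are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedSegment:
    segment_id: str
    namespace: Optional[str]  # hypothetical field; not part of Druid today


def segments_safe_to_announce(cached, cluster_namespace, known_ids):
    """Keep only cached segments that belong to this cluster.

    cached            -- segments found in the local segment cache
    cluster_namespace -- hypothetical name configured for this cluster
    known_ids         -- ids assumed to come from the druid_segments table
    """
    safe = []
    for seg in cached:
        if seg.namespace != cluster_namespace:
            continue  # written by a different cluster: ignore it
        if seg.segment_id not in known_ids:
            continue  # the metadata store has never heard of it: ignore it
        safe.append(seg)
    return safe


cache = [
    CachedSegment("events_2019-01-01_v1", "prod"),
    CachedSegment("events_2019-01-01_v2", "staging"),  # left over on a shared disk
    CachedSegment("events_2019-01-02_v9", "prod"),     # not in the metadata store
]
announced = segments_safe_to_announce(cache, "prod", {"events_2019-01-01_v1"})
print([seg.segment_id for seg in announced])
```

Either check alone would have stopped the staging segments from being announced in our case. The namespace check has the advantage of not needing the metadata store at all, so it could run wherever the segment cache is read.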

–dave