Moving from one Druid/Hadoop cluster to another

Hi, what is the easiest way to replicate data from one Druid-Hadoop cluster into another Druid-Hadoop cluster? Assuming the deep storage for my first cluster is HDFS, could I simply distcp all of the segments to HDFS in my second cluster? I’d prefer not to have to unpack all of the segments and re-ingest them. Is there a simple way to essentially update the coordinator’s segment metadata? Thanks!

Running distcp and then using configs that point to the new Hadoop cluster should just work.


If your segment descriptors in your metadata db have paths that include the namenode (hdfs://nn/path instead of /path), then you also need to do a mass update in your metadata db. If they’re /path then it should be enough to just update the configs like Eric suggested.
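
For what it's worth, here is a rough Python sketch of what that mass update could look like. It assumes each segment row in the metadata db stores a JSON payload whose loadSpec has "type": "hdfs" and a "path" holding the full URI; the function name and the namenode names nn1/nn2 are made up for illustration, so check them against your own rows first. It only rewrites the JSON; reading the rows and writing them back is left to whatever client you use for your metadata db.

```python
import json


def rewrite_load_spec(payload_json, old_prefix, new_prefix):
    """Return the payload JSON with loadSpec.path moved to new_prefix.

    Assumes (not confirmed in this thread) that the payload looks like
    {"loadSpec": {"type": "hdfs", "path": "hdfs://nn1/..."}, ...}.
    Payloads that do not match are returned with their content unchanged.
    """
    payload = json.loads(payload_json)
    load_spec = payload.get("loadSpec", {})
    path = load_spec.get("path", "")
    if load_spec.get("type") == "hdfs" and path.startswith(old_prefix):
        # Swap only the leading namenode part; keep the rest of the path.
        load_spec["path"] = new_prefix + path[len(old_prefix):]
    return json.dumps(payload)


# Hypothetical example row; nn1 and nn2 are placeholder namenode names.
example = json.dumps({
    "dataSource": "wiki",
    "loadSpec": {"type": "hdfs", "path": "hdfs://nn1/druid/segments/wiki/index.zip"},
})
print(rewrite_load_spec(example, "hdfs://nn1", "hdfs://nn2"))
```

Take a dump of the metadata db before running anything like this against it, and try it on a handful of rows first.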

This sounds like if you back up the segments but not the metadata db, then you will have to re-ingest. Is that correct?

If you don’t have the metadata store and only have the segments, you’ll either have to re-ingest or write something to read segment metadata from deep storage and populate a new metadata store.
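
A very rough idea of the "write something" option, in Python. This assumes your deep storage has a descriptor.json next to each segment and that it carries "identifier", "dataSource", "interval" (as "start/end") and "version" fields; that layout, the function name, and the row keys below are assumptions rather than anything confirmed in this thread, so compare against a real descriptor and your metadata table schema before relying on it. The sketch only turns one descriptor into a row-like dict; finding the files and inserting the rows is up to you.

```python
import json


def row_from_descriptor(descriptor_json):
    """Build a metadata-row-like dict from one segment descriptor.

    Assumed descriptor fields: identifier, dataSource, interval, version.
    The returned keys are illustrative and may not match your table.
    """
    d = json.loads(descriptor_json)
    # The interval is assumed to be "start/end" in ISO-8601 form.
    start, end = d["interval"].split("/")
    return {
        "id": d["identifier"],
        "dataSource": d["dataSource"],
        "start": start,
        "end": end,
        "version": d["version"],
        "used": True,
        # Keep the full descriptor so the load spec travels with the row.
        "payload": descriptor_json,
    }
```

If the copied segments also moved to a different namenode, the load spec paths inside the payload would need the same rewrite mentioned earlier in the thread.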

What is the best way to back up or transfer the metadata store? A simple MySQL dump?

Different metadata stores have different ways of backing up data. For MySQL, the methods discussed here should work: