[druid-user] Migration of data from one deep storage (HDFS) to another (HDFS), including metadata

Hi,

We need to migrate the data in our Druid deep storage, which is HDFS, to another deep storage, which is also an HDFS cluster. Our metadata storage is MySQL, and we also want to migrate from the existing metadata DB to a new one. We are considering the following steps for the migration.

  1. Copy the data from the existing HDFS cluster to the new HDFS cluster using DistCp.
  2. Take an SQL dump of the config, dataSource, supervisors, and segments metadata tables into a file.
  3. In the dump file produced in the previous step, change the segment locations in the segments table to the new deep storage location.
  4. Import the SQL dump file into the new metadata DB.
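
Steps 1 and 2 above can be sketched as the commands below, built in Python so the arguments are easy to review before anything runs. This is only an illustration: the namenode addresses, database host, user, and database name are hypothetical, and the table names assume the default `druid` table prefix.

```python
from typing import List

# Hypothetical values; replace with your own clusters and database.
OLD_SEGMENTS = "hdfs://old-nn:8020/druid/segments"
NEW_SEGMENTS = "hdfs://new-nn:8020/druid/segments"

# Assumed table names, based on the default "druid" table prefix.
TABLES = ["druid_config", "druid_dataSource", "druid_supervisors", "druid_segments"]


def distcp_command(src: str, dst: str) -> List[str]:
    """Step 1: copy the segment directory tree between HDFS clusters."""
    return ["hadoop", "distcp", src, dst]


def mysqldump_command(host: str, user: str, database: str, tables: List[str]) -> List[str]:
    """Step 2: dump only the listed metadata tables (-p prompts for the password)."""
    return ["mysqldump", "-h", host, "-u", user, "-p", database, *tables]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    print(" ".join(distcp_command(OLD_SEGMENTS, NEW_SEGMENTS)))
    print(" ".join(mysqldump_command("old-db-host", "druid", "druid", TABLES)))
```

mysqldump writes to standard output, so the second command would normally be redirected to a file that is then edited (step 3) and imported (step 4).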

The new Druid cluster will be configured with the new deep storage and new metadata DB addresses. We currently run Druid 0.19.0 and plan to set up the latest version (24.0.0) in the new cluster, then follow the steps above for the migration.

We would appreciate help with the following questions:

  1. Is there any better way to migrate data from one HDFS to another HDFS specifically from a Druid point of view?

  2. Will there be segment compatibility issues between the two Druid versions (0.19.0 and 24.0.0)? This could happen if the way segments are stored changed between these versions.

  3. What other challenges might we encounter that we should address early?

  4. Any recommendations on what could be done better here?

Thanks,
Saurabh Pande.

Hi Saurabh Pande,

Regarding migration:

  1. Is there any better way to migrate data from one HDFS to another HDFS specifically from a Druid point of view?

This is a good place to start:

This older article is from Imply, my employer, but it’s also worth a read:

Regarding your other questions, the best advice I can offer is to read and follow the release notes for each of the versions between 0.19.0 and 24.0.0.

Best,

Mark

Hi Mark,

Thank you for your response.

Yes, I’ve already gone through the deep-storage-migration link you shared and also read about the export-metadata tool. The information there mainly covers migration from the local file system to HDFS / S3, not HDFS to HDFS, which is what we are looking for. The tool is also limited to exporting metadata from Derby, and our metadata storage is MySQL, which is why we chose mysqldump.

As for the other link you shared, Migrate existing Druid Cluster to new Imply cluster, that is the article we have mainly relied on. Our use case aligns with Scenario 4 in the article.

I’ll go through the release notes as you suggested.

Thanks,
Saurabh

Hi Saurabh,

Migrating Druid deep storage and migrating metadata are best thought of as two separate tasks.

Migrating metadata just requires moving the data (mysqldump and import is fine) and updating all the configurations in Druid that point to the metadata DB instance. As long as you maintain the structure of the exported data, you should just be able to update your top-level paths programmatically.
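
For reference, the configuration that points Druid at the metadata DB lives in common.runtime.properties. The helper below is a hypothetical sketch that renders those lines for a MySQL store; the host, port, database, and credentials in the usage are placeholders, and the mysql-metadata-storage extension still has to be loaded separately.

```python
def metadata_properties(host: str, port: int, database: str, user: str, password: str) -> str:
    """Render the metadata-store lines of common.runtime.properties for MySQL."""
    lines = [
        "druid.metadata.storage.type=mysql",
        f"druid.metadata.storage.connector.connectURI=jdbc:mysql://{host}:{port}/{database}",
        f"druid.metadata.storage.connector.user={user}",
        f"druid.metadata.storage.connector.password={password}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for the new metadata DB.
    print(metadata_properties("new-db-host", 3306, "druid", "druid", "changeme"))
```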

Migrating deep storage requires updating the records in the druid_segments table to have the new, correct path.
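
A minimal sketch of that path update, assuming each row's payload in druid_segments is a JSON document whose loadSpec has type hdfs and a path field. The function name and the prefixes in the usage are hypothetical, and the payload is treated as already decoded to a string. Try it on a copy of the table first.

```python
import json


def rewrite_payload(payload: str, old_prefix: str, new_prefix: str) -> str:
    """Return the payload with its HDFS loadSpec path moved to new_prefix.

    Payloads that are not HDFS, or whose path does not start with
    old_prefix, come back with their content unchanged.
    """
    segment = json.loads(payload)
    load_spec = segment.get("loadSpec")
    if isinstance(load_spec, dict) and load_spec.get("type") == "hdfs":
        path = load_spec.get("path")
        if isinstance(path, str) and path.startswith(old_prefix):
            load_spec["path"] = new_prefix + path[len(old_prefix):]
    # Compact separators keep the JSON free of added whitespace.
    return json.dumps(segment, separators=(",", ":"))


if __name__ == "__main__":
    sample = (
        '{"dataSource":"wiki","loadSpec":{"type":"hdfs",'
        '"path":"hdfs://old-nn:8020/druid/segments/wiki/0_index.zip"}}'
    )
    print(rewrite_payload(sample, "hdfs://old-nn:8020", "hdfs://new-nn:8020"))
```

Rewriting only the prefix works as long as DistCp preserved the directory structure below it, which is why keeping that structure intact matters.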

Thanks,

Max

Thanks Max, that’ll help a lot.