We need to migrate the data stored in our Druid deep storage, which is HDFS, to another HDFS cluster. Our metadata storage is MySQL, and we also want to migrate from the existing metadata DB to a new one. We are considering the following steps for the migration:
1. Copy the data from the existing HDFS cluster to the new one using DistCp.
2. Take an SQL dump of the config, dataSource, supervisors, and segments metadata tables into a file.
3. In the dump file produced above, change the segment locations in the segments table to point to the new deep storage location.
4. Import the SQL dump file into the new metadata DB.
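The path-rewrite step can be sketched as below. This is a minimal, hedged example: the NameNode hostnames, ports, paths, and the sample dump line are all made-up placeholders, not taken from any real cluster, and the real mysqldump output will contain full INSERT statements for every segment.

```shell
# Placeholder deep-storage prefixes -- substitute your actual NameNode URIs.
OLD_DS='hdfs://old-namenode:8020/druid/segments'
NEW_DS='hdfs://new-namenode:8020/druid/segments'

# Illustrative sample of what a druid_segments row looks like in the dump
# (the real dump has many such rows with a larger JSON payload).
cat > segments_dump.sql <<'EOF'
INSERT INTO druid_segments VALUES ('wiki_2022-01-01','{"loadSpec":{"type":"hdfs","path":"hdfs://old-namenode:8020/druid/segments/wiki/0_index.zip"}}');
EOF

# Rewrite every old deep-storage prefix to the new one, in place.
# Using '|' as the sed delimiter avoids clashing with the '/' in the URIs.
sed -i "s|${OLD_DS}|${NEW_DS}|g" segments_dump.sql

# Every rewritten row should now point at the new cluster.
grep -c "$NEW_DS" segments_dump.sql
```

The rewritten file can then be imported into the new metadata DB as in step 4.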
The new Druid cluster will be configured with the new deep storage and metadata DB addresses. Also, the Druid version we currently use is 0.19.0, and we are thinking of setting up the new cluster with the latest version (24.0.0) and then following the steps above for the migration.
We are seeking help with the following questions:
Is there a better way to migrate data from one HDFS cluster to another, specifically from a Druid point of view?
Will there be a segment compatibility issue between the two Druid versions (0.19.0 and 24.0.0)? This could happen if the segment storage format changed between these versions.
What other challenges might we encounter, and which ones should we plan for early?
Any other recommendations on what could be done better?
Yes, I’ve already gone through the deep-storage-migration link you shared and have also read about the export-metadata tool. The information there is specifically about migrating from the local file system to HDFS / S3, not HDFS to HDFS, which is what we are looking for. Also, a limitation of that tool is that it only supports exporting metadata from Derby, while our metadata storage type is MySQL; that’s why we chose the mysqldump route.
Regarding the other link you shared, Migrate existing Druid Cluster to new Imply cluster: we have mainly referred to that article, and our use case aligns with Scenario 4 described there.
Migrating Druid deep storage and metadata are best thought of as two separate tasks.
Migrating metadata just requires moving the data (mysqldump and import is fine) and updating all of the Druid configurations that point to the metadata DB instance. As long as you maintain the structure of the exported data, you should be able to update your top-level paths programmatically.
Migrating deep storage requires updating the records in the druid_segments table to have the new, correct path.
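To make that concrete: the path lives inside the JSON payload of each druid_segments row, under loadSpec.path. A minimal sketch of the rewrite is below; the payload shown is a made-up illustration, not a dump from a real cluster, and the hostnames are placeholders.

```shell
# Illustrative druid_segments payload for a segment on HDFS deep storage.
# The field that must change is loadSpec.path.
PAYLOAD='{"dataSource":"wiki","loadSpec":{"type":"hdfs","path":"hdfs://old-namenode:8020/druid/segments/wiki/0_index.zip"},"version":"2022-01-01"}'

# Rewriting only the NameNode prefix leaves the rest of the payload intact.
NEW_PAYLOAD=$(echo "$PAYLOAD" | sed 's|hdfs://old-namenode:8020|hdfs://new-namenode:8020|')
echo "$NEW_PAYLOAD"

# If you prefer to fix paths after importing the dump, the same rewrite can
# be done in MySQL directly (payload is a BLOB, so string REPLACE works):
#   UPDATE druid_segments
#   SET payload = REPLACE(payload, 'hdfs://old-namenode:8020', 'hdfs://new-namenode:8020');
```

Either way, the rest of the payload (dataSource, version, and so on) should be left untouched; only the deep-storage prefix changes.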