Hi All,
A few months back, we set up a single-node Druid cluster with Derby as the metadata store. The inflow has now increased drastically to around 40 GB/day and the node is almost full. We have set up a parallel multi-node cluster with a MySQL metastore. We want to move the complete 2 TB of data from the single-node cluster to the new multi-node cluster, with selected columns if possible, otherwise the complete dataset.
Your help will be appreciated. Thanks in advance.
Regards,
Jignesh
Hi Jignesh,
If your new cluster is brand new and you can afford to stop the ingestion jobs on your current cluster, then:
1 - copy the data from your current deep storage to the new deep storage.
2 - dump the druid_segments table from your current cluster's metadata store and import that file into your new cluster's MySQL metadata store.
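The copy in step 1 can be sketched like this with local directories standing in for deep storage (all paths below are hypothetical placeholders; real deep storage may be local disk, HDFS, or S3, so use the matching tool such as rsync, distcp, or aws s3 sync):

```shell
# Toy stand-in for the old cluster's deep storage (paths are hypothetical)
mkdir -p /tmp/current_cluster/data/datasource1
echo "segment bytes" > /tmp/current_cluster/data/datasource1/0_index.zip

# Step 1: copy everything under the old deep storage root to the new one
mkdir -p /tmp/new_cluster/data
cp -r /tmp/current_cluster/data/. /tmp/new_cluster/data/

# The segment files should now exist under the new root
ls /tmp/new_cluster/data/datasource1
```

Doing the copy before touching the metadata means the new cluster's segment records will point at files that already exist.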
One caveat: let's say your deep storage path in the current cluster is /current_cluster_name/data/…
and your deep storage path in the new cluster is /new_cluster_name/data/…
When you dump the druid_segments table from your current cluster, it will contain segment paths like /current_cluster_name/data/…,
so you cannot import that file as-is.
You need to replace current_cluster_name with new_cluster_name (for example with a Linux command like sed),
and only then import the modified file, with the corrected segment paths, into the new cluster's MySQL.
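The sed rewrite can be sketched on a toy dump file (the file name, cluster names, and the INSERT line are hypothetical stand-ins; a real dump holds the segments' loadSpec paths inside larger rows):

```shell
# Toy stand-in for the mysqldump/export of druid_segments
printf '%s\n' \
  'INSERT INTO druid_segments VALUES ("/current_cluster_name/data/wiki/seg.zip");' \
  > /tmp/druid_segments.sql

# Rewrite every old deep-storage prefix to the new one, in place.
# Using | as the sed delimiter avoids escaping the slashes in the paths.
sed -i 's|/current_cluster_name/|/new_cluster_name/|g' /tmp/druid_segments.sql

cat /tmp/druid_segments.sql
```

After the rewrite, the modified file can be imported into the new metastore (for example with `mysql druid < /tmp/druid_segments.sql`, credentials and database name depending on your setup).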
Then you can start the services on your new cluster, and your segments will start loading from deep storage onto the historicals.
Thank you.
–siva
Thanks, Siva.
It looks pretty straightforward. Is there any chance of data loss with this manual move?
It would be really nice if you could also recommend a blog/article.
Jignesh