Switching from local batch ingestion to Hadoop

I started using Druid by ingesting data from the local file system.

overlord/runtime.properties:

druid.storage.type=local

druid.storage.storageDirectory=/data/deep

Now I would like to switch to Hadoop, but I have two questions:

  1. Can I simply copy the existing segments into HDFS, and will Druid find them there?

  2. Can I start a second indexing service on a different port with the Hadoop configuration in order to test that setup? I have a production cluster and can’t afford to have it down for more than a few minutes.

Thanks for your help.

Best regards

Roman

Hi Roman,

Do you have a copy of your raw data lying around? If yes, you can just reindex that data against the new deep storage. If not, you can look at the ingestSegment firehose to reindex existing segments and move them to a new deep storage. BTW, when you say Hadoop, do you mean you’d like to use HDFS as your deep store? Or do you mean you want to switch to Hadoop-based batch ingestion?
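In case the ingestSegment route is useful, a rough sketch of what the ioConfig of an index task might look like when it reads from existing segments (the data source name and interval below are placeholders for your own setup; the dataSchema and tuningConfig are the same as for a normal index task and are omitted here):

"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "ingestSegment",
    "dataSource": "my_datasource",
    "interval": "2015-01-01/2015-02-01"
  }
}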

Hi, thanks for your answer and sorry for my late reply. I still have a copy of the raw data.

I want to use HDFS as deep storage. Currently it’s a local directory. My question is: how do I make the transition smooth? If I change the configuration in the indexing service to an HDFS file system, then it will not find the old segments, right? Can I have two paths and then reindex everything step by step, so that the deep storage is smoothly transferred from the local file system to HDFS?

If your segments are currently in local storage, there is no easy way to migrate them. The easiest thing to do is to reindex your raw data and regenerate the segments. Make sure to properly configure your indexing nodes to use HDFS as the deep store.
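For reference, a rough sketch of what the deep storage properties might look like after the switch. The NameNode host, port, and path below are placeholders for your own environment, and depending on your Druid version you also need to load the druid-hdfs-storage extension (for example via druid.extensions.loadList):

# load the HDFS deep storage extension (exact loading mechanism depends on the Druid version)
druid.extensions.loadList=["druid-hdfs-storage"]

# switch deep storage from the local file system to HDFS
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode-host:8020/druid/segments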