Historical data migration

Hi Druid Experts,

Currently, we are using Elasticsearch in our analytics stack and we are moving away from ES to Druid. What would be the best approach to migrate historical data from Elasticsearch to Druid? This is a production setup with a medium load of around 10 TB of data. I read in the Druid documentation that Hadoop batch ingestion is the suggested approach. Since we use a different deep storage, that solution is not feasible for us.

  1. Can we use native index tasks to migrate a data load of 10 TB?

  2. Our real-time data is ingested via a Kafka pipeline. Can we run multiple tasks, e.g. a Kafka ingestion task and an index task, against the same datasource at the same time, given that their intervals do not overlap?



Hi Shruti:

I saw that it is possible to dump Elasticsearch data into JSON format offline: https://github.com/taskrabbit/elasticsearch-dump . Have you checked whether that works for you? Druid can do native indexing on raw data in JSON format, so it's probably doable.
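For example, a single index could be exported to newline-delimited JSON with that tool along these lines. This is only a sketch: the host, index name, output path, and batch size are placeholders for your setup, and it assumes the `elasticdump` CLI from the linked repo is installed.

```shell
# Install the CLI (requires Node.js); the npm package is named elasticdump.
npm install -g elasticdump

# Export the documents (--type=data) of one index to a local NDJSON file.
# Host, index name, and output path below are placeholders.
elasticdump \
  --input=http://localhost:9200/analytics-events \
  --output=/tmp/analytics-events-raw.json \
  --type=data \
  --limit=10000

# elasticdump writes one hit object per line, with the document nested
# under "_source"; Druid will want the flat document, so one option is
# to strip the wrapper with jq before ingesting:
jq -c '._source' /tmp/analytics-events-raw.json > /tmp/analytics-events.json
```

At 10 TB you would likely want to split the export per index or per time range into multiple files rather than one giant dump.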

And yes, you can index into the same datasource as long as the time intervals do not overlap.
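As a rough sketch, the exported JSON files could then be loaded with a native parallel batch task submitted to the Overlord. The datasource name, file paths, timestamp column, and dimensions below are assumptions you would replace with your own schema:

```shell
# Write an illustrative native parallel batch ingestion spec.
# Datasource name, paths, and schema fields are placeholders only.
cat > /tmp/es-migration-task.json <<'EOF'
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "analytics-events",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user", "action"] },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/tmp",
        "filter": "analytics-events-*.json"
      },
      "inputFormat": { "type": "json" },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 4
    }
  }
}
EOF

# Submit the task to the Overlord's task endpoint.
curl -X POST -H 'Content-Type: application/json' \
  -d @/tmp/es-migration-task.json \
  http://OVERLORD_HOST:8090/druid/indexer/v1/task
```

For 10 TB you would tune `maxNumConcurrentSubTasks` to your cluster capacity, and you may prefer an `s3`/`hdfs` input source over `local` so the files don't have to sit on a single machine.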