Merging segments from the Kafka index task using Druid segments as the datasource


We currently have the following architecture: a bare-metal Hadoop cluster and a private cloud based on OpenStack.

The Druid components are installed in the private cloud, and the Hadoop cluster is used both as deep storage and for Hadoop batch indexing tasks.

We are using Druid's new Kafka indexing supervisor, so every night we submit a segment merge task that uses the Druid segments generated by the Kafka index task as the datasource for the re-indexing task (as recommended in the Druid docs -
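For illustration, a re-indexing task of this kind can be sketched roughly as follows. This is only a minimal sketch, not our exact spec: the datasource name `events_kafka`, the interval, and the granularities are placeholders, and the parser and metrics sections that a real `dataSchema` needs are left out. The helper `build_reindex_task` is a hypothetical name used only to assemble the JSON.

```python
import json


def build_reindex_task(datasource, interval):
    """Build a Hadoop index task that reads existing Druid segments as input.

    Sketch only: a real dataSchema also needs a parser and metricsSpec
    matching the datasource; they are omitted here.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",   # placeholder
                    "queryGranularity": "NONE",    # placeholder
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                # "dataSource" input spec: read segments of an existing
                # Druid datasource instead of raw files
                "inputSpec": {
                    "type": "dataSource",
                    "ingestionSpec": {
                        "dataSource": datasource,
                        "intervals": [interval],
                    },
                },
            },
            "tuningConfig": {"type": "hadoop"},
        },
    }


if __name__ == "__main__":
    # Placeholder datasource name and interval
    task = build_reindex_task("events_kafka", "2017-01-01/2017-01-02")
    print(json.dumps(task, indent=2))
```

The resulting JSON is what would be submitted to the overlord as the nightly task; the segments it reads are looked up by datasource and interval, so the task itself does not name the deep storage location.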

We are thinking of changing the deep storage from Hadoop to our cloud storage (OpenStack Swift), which has an Amazon S3-compatible API and hence can be used with Druid out of the box.

As far as I understand, even if OpenStack Swift were used as the deep storage instead of Hadoop, there should be no problem continuing to ingest data into Druid from Hadoop (using a MapReduce job).

My question is whether the update-existing-data flow, using Druid segments as the datasource for the index task, would still work in such a configuration. Would Druid execute a MapReduce job on the remote Hadoop cluster, using the cloud storage as the input for that job?