Is it possible to use dropped segments as a source?


I’ve never dropped any segments before, but now I’m forced to.

We use existing hourly segments as a source to build daily segments with a different schema. What I don’t know is whether this is still possible with segments that have been dropped, that is, whether the Hadoop indexer simply reads them from S3 deep storage, where they still reside even though they’re dropped.

Loading them back does not look easy; at least, I don’t understand the last paragraph well enough:

It is possible as long as you don’t “kill” the segments (which would permanently remove them from deep storage). If you want to load them back then you would do:

  1. Change your rules to extend the load period to cover the previously-dropped segments

  2. POST to the “enable datasource” API: /druid/coordinator/v1/datasources/{dataSourceName}
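As a rough sketch, step 2 could be issued with curl as shown here. The coordinator address (port 8081, as used elsewhere in this thread) and the datasource name `my_datasource` are assumptions for illustration; substitute your own. The script only prints the command unless `RUN=1` is set, so you can check the URL first.

```shell
#!/bin/sh
# Assumed values: replace with your own coordinator address and datasource.
COORDINATOR="${COORDINATOR:-http://localhost:8081}"   # assumption: coordinator on port 8081
DATASOURCE="${DATASOURCE:-my_datasource}"             # hypothetical datasource name

# The "enable datasource" endpoint quoted in step 2.
enable_url="${COORDINATOR}/druid/coordinator/v1/datasources/${DATASOURCE}"

# Dry run by default: print the command. Set RUN=1 to actually send the POST.
if [ "${RUN:-0}" = "1" ]; then
  curl -i -X POST "$enable_url"
else
  echo "curl -i -X POST $enable_url"
fi
```

Remember that step 1 (extending the load rules) has to come first; otherwise the coordinator will presumably drop the re-enabled segments again on its next run.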

Hi Jakub,

Do not delete only the indexed files from deep storage. That will make your historical nodes crash with a “Can’t find indexed file” error.

To fully drop segments, you first need to disable them, then delete the files.

Druid has RESTful APIs that allow you to do this.

First, you can list the segments in your data source:

curl -XGET http://<coordinator_host>:8081/druid/coordinator/v1/metadata/datasources/<data_source>/segments

Then disable the segments you want (be careful about the interval):

curl -I -XDELETE http://<coordinator_host>:8081/druid/coordinator/v1/datasources/<data_source>/segments/<data_source>_<interval_suffix>

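Then disable the interval (traffic_beta is the data source name in this example; the interval itself goes at the end of the URL):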

curl -I -XDELETE http://<coordinator_host>:8081/druid/coordinator/v1/datasources/traffic_beta/intervals/

Finally, you can delete the indexed files from deep storage.
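The list and disable-segment steps above could be wrapped in a small script along these lines. This is only a sketch: the coordinator address, the datasource name `my_datasource`, and the `SEGMENT_ID` placeholder are assumptions, and the interval step is left out because the exact interval is not given in this thread. Commands are printed rather than executed unless `RUN=1` is set.

```shell
#!/bin/sh
# Assumed values: replace with your own coordinator address and datasource.
COORDINATOR="${COORDINATOR:-http://localhost:8081}"   # assumption: coordinator on port 8081
DATASOURCE="${DATASOURCE:-my_datasource}"             # hypothetical datasource name
BASE="${COORDINATOR}/druid/coordinator/v1"

# URL that lists all segments of the datasource.
list_url="${BASE}/metadata/datasources/${DATASOURCE}/segments"

# Build the URL that disables one segment, given its full segment ID.
disable_segment_url() {
  printf '%s/datasources/%s/segments/%s\n' "$BASE" "$DATASOURCE" "$1"
}

# Dry run by default: print the command. Set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# List the segments so you can pick the IDs to disable.
run curl -X GET "$list_url"

# Disable one segment. SEGMENT_ID is a placeholder; use an ID from the list.
SEGMENT_ID="${SEGMENT_ID:-SEGMENT_ID_FROM_LIST}"
run curl -I -X DELETE "$(disable_segment_url "$SEGMENT_ID")"
```

Running it once in dry-run mode and reading the printed DELETE URL is a cheap way to make sure you are disabling the segment you intended before anything is changed.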

Hope it helps.