Update segment metadata via API

We would like to separate batch processing out of Druid, so that we can pre-process data and scale that work independently of Druid.
We looked at https://github.com/metamx/druid-spark-batch, which is very powerful and can help us generate smoosh files. Using that library we are able to pre-process data and generate smoosh files. But we cannot figure out how to tell Druid (the coordinator) that these segment files are available on a machine, so that it will upload them and update the segment metadata.

Hi Gaurav,

I am not too familiar with that library, but in general if you are creating segments ‘out of band’ then all you should have to do is insert the segment metadata into the druid_segments table in your metadata store.
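To make the "insert the segment metadata" step concrete, here is a minimal sketch. It uses SQLite as a stand-in for the metadata store (real Druid deployments use MySQL or PostgreSQL), the column layout follows the druid_segments schema, and every value (data source, interval, loadSpec path, sizes) is illustrative, not taken from the thread above:

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stand-in for the Druid metadata store; column names follow
# the druid_segments schema ("end" is quoted because it is a SQL keyword).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE druid_segments (
        id            VARCHAR(255) PRIMARY KEY,
        dataSource    VARCHAR(255) NOT NULL,
        created_date  VARCHAR(255) NOT NULL,
        start         VARCHAR(255) NOT NULL,
        "end"         VARCHAR(255) NOT NULL,
        partitioned   INTEGER      NOT NULL,
        version       VARCHAR(255) NOT NULL,
        used          INTEGER      NOT NULL,
        payload       BLOB         NOT NULL
    )
""")

start, end = "2024-01-01T00:00:00.000Z", "2024-01-02T00:00:00.000Z"
version = "2024-01-03T09:00:00.000Z"
segment_id = f"wikipedia_{start}_{end}_{version}"

# The payload column holds the JSON segment descriptor; the loadSpec
# points at the segment files already uploaded to deep storage
# (the HDFS path here is hypothetical).
descriptor = {
    "dataSource": "wikipedia",
    "interval": f"{start}/{end}",
    "version": version,
    "loadSpec": {"type": "hdfs", "path": "/druid/segments/wikipedia/index.zip"},
    "dimensions": "page,language",
    "metrics": "count",
    "shardSpec": {"type": "none"},
    "binaryVersion": 9,
    "size": 12345,
    "identifier": segment_id,
}

conn.execute(
    'INSERT INTO druid_segments '
    '(id, dataSource, created_date, start, "end", partitioned, version, used, payload) '
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (
        segment_id,
        "wikipedia",
        datetime.now(timezone.utc).isoformat(),
        start,
        end,
        0,
        version,
        1,  # used = 1 so the coordinator's metadata poll picks the segment up
        json.dumps(descriptor),
    ),
)
conn.commit()
```

The key detail is the `used` flag: the coordinator only considers rows marked used, so out-of-band inserts should set it to 1.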

Thank you Gian. We would like to update segments as well, so yes, we can delete rows from the metadata table, insert new ones, and upload the data to deep storage too.
But would this be read-safe? Also, how would a historical node become aware that the metadata has changed and that it needs to re-download from deep storage? I am assuming the coordinator today notifies historical nodes to update segments.

The coordinator periodically polls the metadata store and then informs the historicals of new segments they should load.
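For reference, the cadence of that poll is configurable on the coordinator. To my knowledge the relevant property is `druid.coordinator.period` in the coordinator's runtime.properties (PT60S is the documented default; verify against your Druid version):

```properties
# Coordinator runtime.properties: how often the coordinator polls the
# metadata store and re-evaluates segment assignments to historicals.
druid.coordinator.period=PT60S
```

So an out-of-band insert into druid_segments should be noticed within roughly one period, without any explicit notification from the writer.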

If you are creating segments out of band and updating the metadata store manually, you need to be careful not to re-use segment ids. Druid assumes they are immutable and you will generally not get correct behavior if you violate this assumption. So rather than update a segment record, you should create a new one with a new id. Btw, this is definitely advanced stuff, in the sense that Druid will handle it for you if you create segments using the normal Druid APIs.
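The "new id instead of an update" rule can be sketched as follows. Druid segment ids are conventionally built from the data source, interval, and version; generating a fresh version timestamp for each publish guarantees a new id for the same interval. This is an illustrative helper, not a Druid API:

```python
from datetime import datetime, timezone

def new_segment_id(data_source: str, interval: str) -> str:
    """Build a Druid-style segment id: dataSource_start_end_version.

    The version is a fresh timestamp, so re-publishing the same interval
    yields a new segment id rather than mutating an existing row.
    (Hypothetical helper for illustration only.)
    """
    start, end = interval.split("/")
    version = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return f"{data_source}_{start}_{end}_{version}"

segment_id = new_segment_id(
    "wikipedia", "2024-01-01T00:00:00.000Z/2024-01-02T00:00:00.000Z"
)
```

With a higher version for the same interval, the coordinator will load the new segment and the old one is eventually dropped, which is how Druid preserves immutability while still letting you "replace" data.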

Thanks Gian. We will give it a try and update in a couple of days.

Thanks Gian that did work!!