Is there a way to rename a Druid datasource?
The reason is that we are considering a “re-do” of an existing large datasource. Re-ingesting it will take considerable time, and we still want users to be able to access the existing old version in the meantime.
We are thinking of ingesting a new datasource, then doing a “table swap”, then dropping the old datasource.
Interesting use case. I don’t think there is a direct SQL command or API to achieve that.
Disclaimer, I have not tested this!!
I believe what you would need to do is:
- reingest into a new datasource, table2, and wait for it to finish
- during the name swap, the cluster will be in a dysfunctional state until you complete the swap, so you’ll want to prevent queries and ingestion. To do that, first:
  - shut down Druid except for the metadata database
- change the names of the datasource folders in deep storage
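- change the metadata directly in your metadata DB. Note that sys.segments is a read-only view in Druid SQL; in the metadata store itself the segments table is named druid_segments by default and the column is dataSource, so the statements would look something like:
  UPDATE druid_segments SET dataSource='table1_old' WHERE dataSource='table1';
  UPDATE druid_segments SET dataSource='table1' WHERE dataSource='table2';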
- start Druid
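To make the ordering of the two renames concrete, here is a minimal Python sketch. It runs against an in-memory SQLite table standing in for the metadata store; the table name `segments`, its two columns, and the helper `swap_datasources` are made up for illustration and are not Druid's actual schema. It only shows the two-step rename wrapped in one transaction, not a tested Druid procedure.

```python
import sqlite3

# Stand-in for the metadata store: an in-memory SQLite table with only the
# two columns this sketch needs (hypothetical, not Druid's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, datasource TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO segments (id, datasource) VALUES (?, ?)",
    [("a", "table1"), ("b", "table1"), ("c", "table2")],
)
conn.commit()


def swap_datasources(conn, live, replacement, retired):
    """Rename live -> retired, then replacement -> live, in one transaction."""
    with conn:  # commits on success, rolls back if either UPDATE raises
        # Order matters: free up the live name before reusing it.
        conn.execute(
            "UPDATE segments SET datasource = ? WHERE datasource = ?",
            (retired, live),
        )
        conn.execute(
            "UPDATE segments SET datasource = ? WHERE datasource = ?",
            (live, replacement),
        )


swap_datasources(conn, "table1", "table2", "table1_old")
rows = conn.execute("SELECT id, datasource FROM segments ORDER BY id").fetchall()
print(rows)  # [('a', 'table1_old'), ('b', 'table1_old'), ('c', 'table1')]
```

Running both updates inside a single transaction means a failure halfway through leaves the original names in place rather than a half-swapped state.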
Anyone else out there have any thoughts on this procedure?
Just realized that I missed a big piece of the metadata that would also need to change.
Wherever there is a segment_id, it will also need to be updated. And this can get trickier since segment_ids have the form:
<datasource>_<time interval>_<version>[_<partition number>]
So the updates would need to be a bit more complicated and would need to span at least the sys.segments and sys.server_segments tables, updating the segment_id column in both with the two-step name change to do the swap.
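As a rough illustration of the segment_id rewrite, the following Python sketch computes the new IDs for both steps of the swap from a snapshot of (segment_id, datasource) pairs. The function names `rename_segment_id` and `plan_swap`, and the sample IDs (including the `v1`/`v2` version placeholders), are invented for the example; the only thing taken from above is that the ID starts with the datasource name followed by an underscore.

```python
def rename_segment_id(segment_id: str, old: str, new: str) -> str:
    """Swap the datasource prefix of a segment ID, leaving the rest intact."""
    prefix = old + "_"
    if not segment_id.startswith(prefix):
        raise ValueError(f"{segment_id!r} does not belong to datasource {old!r}")
    return new + "_" + segment_id[len(prefix):]


def plan_swap(rows, live, replacement, retired):
    """Return (old_id, new_id, new_datasource) for every row the swap touches.

    rows is an iterable of (segment_id, datasource) pairs.
    """
    targets = {live: retired, replacement: live}
    plan = []
    for segment_id, datasource in rows:
        # Match on the datasource column, not the ID prefix: an ID starting
        # with "table1_" could also belong to a datasource named "table1_old".
        if datasource in targets:
            new_name = targets[datasource]
            new_id = rename_segment_id(segment_id, datasource, new_name)
            plan.append((segment_id, new_id, new_name))
    return plan


# Made-up sample rows; real versions and intervals will look different.
rows = [
    ("table1_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_v1", "table1"),
    ("table2_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_v2_1", "table2"),
    ("other_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_v1", "other"),
]
plan = plan_swap(rows, "table1", "table2", "table1_old")
for old_id, new_id, new_name in plan:
    print(old_id, "->", new_id)
```

Because the plan is computed from a snapshot taken before any change, the two renames cannot interfere with each other: a segment that has just become table1_old is never mistaken for one that still needs renaming.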
I’m not sure whether sys.tasks or sys.supervisors, which also have a datasource column, would need to change.
So this sounds like it isn’t for the faint of heart. If you try it, let us know how it goes. It probably doesn’t need saying, but don’t do this in production until you know it works.