Thank you so much for your help.
I tried this one, but the problem is of course that if a dimension or metric value in a row has changed and I re-ingest it with the “combined” firehose, Druid doesn’t know that the row already exists (because a dimension or metric has changed, and there is no primary key as such).
In the attached example, I added 3 rows with the first ingestion task.
The second ingestion task adds only one row, which has changed (this is our use case: some of the data changes after it was ingested). I changed one metric of an already existing row (factid=11062) and ingested it with appendToExisting=false.
What I want to achieve is that Druid kicks out the existing row factid=11062 (as the dimensions are all the same) and re-ingests it with the changed metric.
But what happens is that Druid does not do anything with this row. All 3 rows stay as before. I’m not sure how or what Druid is doing internally.
When I look at the segments, it also looks like they have not been touched.
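To make the behavior I was hoping for concrete, here is a minimal Python sketch of it. This is only a model of the desired outcome, not anything Druid does or offers: the function name `upsert_rows`, the `duration` metric, and all values except factid=11062 are made up for illustration.

```python
def upsert_rows(existing, changed, key="factid"):
    """Replace every existing row whose key matches a changed row."""
    # Index the already ingested rows by the (hypothetical) primary key.
    merged = {row[key]: row for row in existing}
    # A changed row overwrites the old row with the same key.
    for row in changed:
        merged[row[key]] = row
    return list(merged.values())


# First ingestion: 3 rows (metric values are invented for this example).
existing = [
    {"factid": 11061, "duration": 10},
    {"factid": 11062, "duration": 20},
    {"factid": 11063, "duration": 30},
]
# Second ingestion: only the one row whose metric changed.
changed = [{"factid": 11062, "duration": 25}]

result = upsert_rows(existing, changed)
```

In this model there are still 3 rows afterwards, and only factid=11062 carries the new metric value. That is the result I expected from the second ingestion task.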
As I understand it, Druid is not made for this case of changing rows. Is that correct?
Is the correct approach always to drop the full segment and load all the data again? That way, I would have only one version of the changed rows.
But of course the disadvantage, which I tried to avoid, is that I need to reload all files again. I cannot, for example, set up Change Data Capture and ingest only the changed rows.
I also still need to keep the ingested files, so that in case of changes I can re-ingest all of them again.
General handling of data changes
A follow-up question in the same direction: how should I handle changes in already ingested data? Probably as explained above?
But let’s say you need to change a dimension from one value to another across all ingested data; that would mean you need to reload all the data again. Correct? Or how does Druid handle data changes?
I hope I explained this understandably; otherwise let me know.
Thanks so much for your advice and help.
FactCallSession_reindex_1.json (2.34 KB)
FactCallSession_reindex_2.json (1.75 KB)