Druid updating data


I work for an ad tech company. We have data for different ads, and each ad has properties such as color, size, and audience. I store metrics (Spend, Clicks) for these dimensions every hour, then aggregate (group and filter) the data on those dimensions for display, which is pretty straightforward and cool. But the dimensions keep changing for historical data. For example, the user may add a custom dimension called type. Then I have to look up all the ads and update every past row, which doesn’t sound right.

How do we solve this problem?

“the user may add a custom dimension called type”

Would this new dimension be available only for newer events, or also for older events?

If it is only for new events, you can ingest them with the extra dimension, since Druid can ingest data with different schemas into the same datasource.
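As a rough sketch of what that looks like, the only thing that changes between the old and the new ingestion spec is the dimension list. The dimension names below are taken from the question and are otherwise assumptions; rows in segments written before the change simply return null for the new dimension at query time.

```python
import json

def dimensions_spec(dimensions):
    """Build a Druid dimensionsSpec fragment listing string dimensions by name."""
    return {"dimensions": list(dimensions)}

# Schema used for data ingested before the custom dimension existed.
old_spec = dimensions_spec(["color", "size", "audience"])

# Schema for newer ingestion into the same datasource: one extra dimension.
# "type" is the custom dimension from the question.
new_spec = dimensions_spec(["color", "size", "audience", "type"])

print(json.dumps(new_spec, indent=2))
```

Leaving the dimension list empty is another option: Druid then treats every non-timestamp, non-excluded field as a string dimension, so new custom dimensions are picked up without editing the spec.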

If the new dimension has to be added to older events, the data can be re-ingested using batch ingestion, and Druid will take care of atomically replacing the older data with the newly ingested data.
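A minimal sketch of such a re-ingestion task is below, assuming native batch ingestion (index_parallel) reading JSON files from local disk. The datasource name, interval, path, and column names are all hypothetical, and the exact spec fields should be checked against the docs for your Druid version. The important parts are the explicit interval and appendToExisting set to false, which make the task overwrite the existing segments for that interval instead of adding to them.

```python
import json

def reindex_task(datasource, interval, base_dir):
    """Build a native batch task that overwrites `interval` in `datasource`."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Old dimensions plus the newly added one.
                "dimensionsSpec": {"dimensions": ["color", "size", "audience", "type"]},
                "metricsSpec": [
                    {"type": "doubleSum", "name": "spend", "fieldName": "spend"},
                    {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "hour",
                    "rollup": True,
                    # Only segments inside this interval are replaced.
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": "*.json"},
                "inputFormat": {"type": "json"},
                # False means overwrite rather than append.
                "appendToExisting": False,
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }

# Hypothetical datasource, interval, and input directory.
task = reindex_task("ads_hourly", "2020-01-01/2020-02-01", "/data/ads_with_type")
print(json.dumps(task, indent=2))
```

The printed JSON is what you would POST to the Overlord task endpoint (/druid/indexer/v1/task). The input files need to already contain the type value for each row, so the lookup on ads happens once, outside Druid, when you prepare the files.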

Also, if this new dimension is a function of some other dimension already present in the event, you can consider using lookups (druid.io/docs/latest/querying/lookups.html).
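To make the lookup option concrete, here is a small sketch. It assumes the events already carry an ad_id dimension and that type can be derived from it; ad_id, the lookup name ad_type, the datasource name, and the sample mapping are all made up for illustration. The stored rows are never rewritten: the mapping is applied at query time, so changing it changes the results for historical data as well.

```python
import json

# Hypothetical mapping from an existing dimension (ad_id) to the new "type" value.
ad_type_map = {"ad_1": "banner", "ad_2": "video"}

# A simple map-based lookup definition. Registering it with the cluster
# (done through the Coordinator) is not shown here.
lookup_spec = {"type": "map", "map": ad_type_map}

# Druid SQL query that groups by the looked-up value instead of a stored column.
sql = (
    "SELECT LOOKUP(ad_id, 'ad_type') AS ad_type, "
    "SUM(spend) AS spend, SUM(clicks) AS clicks "
    "FROM ads_hourly GROUP BY 1"
)

# Request body for the Druid SQL endpoint (/druid/v2/sql).
query_payload = {"query": sql}

print(json.dumps(lookup_spec, indent=2))
print(json.dumps(query_payload, indent=2))
```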