Efficient Updates

Hey there!
I am quite new to Druid and I’m trying to do the following.
My database basically looks like this:

user_id, old_timestamp, spendings_today

And I get events of the form:

p_user_id, new_timestamp, cost

I now would like to update my data according to this pseudocode:

if same_day(old_timestamp, new_timestamp)
then
    spendings_today += cost;
else
    spendings_today = cost;
    old_timestamp = new_timestamp;
where user_id = p_user_id
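
For reference, the rule above can be written as a small Python sketch that runs outside Druid. The `UserRecord` class and `apply_event` function are hypothetical names used only for illustration, and "same day" is assumed to mean the same UTC calendar date:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class UserRecord:
    user_id: str
    old_timestamp: datetime
    spendings_today: float


def same_day(a: datetime, b: datetime) -> bool:
    # Assumption: compare UTC calendar dates; pass timezone-aware datetimes.
    return a.astimezone(timezone.utc).date() == b.astimezone(timezone.utc).date()


def apply_event(record: UserRecord, p_user_id: str,
                new_timestamp: datetime, cost: float) -> UserRecord:
    # "where user_id = p_user_id": ignore events for other users.
    if record.user_id != p_user_id:
        return record
    if same_day(record.old_timestamp, new_timestamp):
        # Same day: accumulate the cost.
        record.spendings_today += cost
    else:
        # New day: restart the running total and remember the new timestamp.
        record.spendings_today = cost
        record.old_timestamp = new_timestamp
    return record
```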

Is there a clever way to do this? I appreciate any help, as I have very little experience with Druid.
Thank you
Lukas

Hi Lukas,

As you may know, Druid creates immutable segments that cannot be changed once they are created. What we have done for all updates in general is to regenerate segments from raw data. We keep the raw data around for a period of time, and if events come in hours or days late, or need to be updated, we run a batch processing job that recreates segments for a given interval. Each Druid segment has a version id associated with it, and reprocessed segments will have a later version id than the original set. Druid atomically replaces segments with a new version id and drops obsoleted segments with an older version id.
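
To illustrate the versioning idea, here is a toy Python model, not Druid's actual code or API. It assumes one segment per interval, identical interval strings, and version ids that sort as plain strings; the `Segment` and `visible_segments` names are made up for this sketch. Real Druid also has to handle partitioned segments and partially overlapping intervals.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    interval: str  # e.g. "2015-01-01/2015-01-02"
    version: str   # assumed sortable: a later version compares greater
    rows: int


def visible_segments(segments: List[Segment]) -> Dict[str, Segment]:
    """For each interval, keep only the segment with the latest version."""
    latest: Dict[str, Segment] = {}
    for seg in segments:
        current = latest.get(seg.interval)
        if current is None or seg.version > current.version:
            # A newer version replaces (obsoletes) the older one.
            latest[seg.interval] = seg
    return latest
```

In this model, reprocessing an interval just means adding a segment with a later version; the older segment stops being visible and can be dropped.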

Does that make sense?

FJ

So the only way to do this is:

  1. Send a query to get the old record

  2. Update the record myself

  3. Reindex the updated record, overwriting the old one

Is that correct?

Hi Lukas, you can store the raw data, update the raw data, and then reindex the raw data. Updates and deletes are expensive in Druid, whereas appending new data should be less of a hassle. It sounds like your use case requires updating existing records rather than just appending new ones, is that true?

Yes, indeed. In this case the procedure I proposed is the only way, right?

Lukas, you don’t need to use Druid to pull out the individual rows. You can run an ETL job to update your raw data and have that logic be outside of Druid.
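
As a rough sketch of what such an ETL step could look like, the Python below recomputes each user's daily spend from the raw events, so the result can be handed to a batch reindexing job for the affected days. The `daily_spend` function and the output field names are assumptions for illustration, and days are assumed to be UTC calendar dates:

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# A raw event is (user_id, timestamp, cost); pass timezone-aware timestamps.
Event = Tuple[str, datetime, float]


def daily_spend(events: Iterable[Event]) -> List[dict]:
    """Aggregate raw events into one row per (user_id, UTC day)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for user_id, ts, cost in events:
        day = ts.astimezone(timezone.utc).date().isoformat()
        totals[(user_id, day)] += cost
    # Sorted output keeps the rows deterministic for the reindexing job.
    return [
        {"user_id": user_id, "day": day, "spendings_today": total}
        for (user_id, day), total in sorted(totals.items())
    ]
```

Because the totals are always rebuilt from the raw events, a late or corrected event only requires rerunning the job for that day and reindexing that interval.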