Update data in Druid

How can I update data that has already been ingested into Druid?
E.g., if I want to update the details of a user identified by his/her user id. Can we do this in Druid?

The option that makes the most sense for this kind of use case is to use dimension lookups: http://druid.io/docs/latest/querying/lookups.html

The other option is to re-index the data using the modified raw data.

Thanks Slim.
By the way, do I need to re-index the whole data again, or just the modified new data?

If you are going with option 2 (re-indexing), then yes, you need to re-index all of the data for the affected intervals, not just the modified rows.

Assume your old Druid datasource was ingested from a file that has lines like

user-id, country, sales,….

id1, null, 100, ….

id2, null, 150, ….

id1, null, 50, ….

Now assume you have the mapping from ids to countries; then you need to re-index the entire intervals from a file that looks like this

user-id, country, sales,….

id1, France, 100, ….

id2, USA, 150, ….

id1, France, 50, ….
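The rewrite of the raw file shown above could look roughly like this in Python. This is only a sketch: the `USER_COUNTRY` mapping, the `rebuild_rows` helper, and the three-column layout (the elided columns are left out) are assumptions made for illustration, and a real pipeline would more likely run in whatever ETL tool you already use.

```python
import csv
import io

# Hypothetical mapping from user id to country (not part of the original data).
USER_COUNTRY = {"id1": "France", "id2": "USA"}


def rebuild_rows(raw_text, mapping):
    """Return the raw rows with the country column filled in from `mapping`."""
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    rebuilt = []
    for user_id, country, sales in reader:
        # Keep the existing value when the user id has no known mapping.
        rebuilt.append([user_id, mapping.get(user_id, country), sales])
    return rebuilt


# The old raw file, reduced to the three columns shown in the example.
OLD_RAW = "id1, null, 100\nid2, null, 150\nid1, null, 50\n"
NEW_ROWS = rebuild_rows(OLD_RAW, USER_COUNTRY)
```

The rebuilt rows would then be written back out and submitted to Druid as a batch re-indexing job covering the same intervals.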

As you can see, this option can be very expensive, since you have to re-build the raw data (let's say using an ETL processing pipeline) and then re-index it with Druid.

That's why I highlighted option 1 (using dimension lookups), which I think is more convenient for these cases.
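To illustrate why a lookup avoids re-indexing, here is a toy Python sketch of the idea. It is not Druid's API: `STORED_ROWS`, `sales_by_country`, and the `lookup` dict are hypothetical names, and the point is only that the id-to-country mapping is applied at query time, so changing the mapping changes the results without touching the stored rows.

```python
from collections import defaultdict

# Rows as stored in the datasource: only the user id, no country column.
STORED_ROWS = [("id1", 100), ("id2", 150), ("id1", 50)]


def sales_by_country(rows, lookup, missing="unknown"):
    """Sum sales per country, resolving each user id through `lookup` at query time."""
    totals = defaultdict(int)
    for user_id, sales in rows:
        totals[lookup.get(user_id, missing)] += sales
    return dict(totals)


lookup = {"id1": "France", "id2": "USA"}
before = sales_by_country(STORED_ROWS, lookup)

# Updating the mapping changes the query result; the stored rows are untouched.
lookup["id1"] = "Germany"
after = sales_by_country(STORED_ROWS, lookup)
```

In Druid itself the mapping would live in a registered lookup and be referenced from the query, as described in the lookups documentation linked earlier.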

Thanks :slight_smile: . One last query: would Spark be a better choice for this kind of use case?

You mean using Spark as the batch indexing mechanism? If that is the question, I don't think there is an obvious answer. I guess the answer to Spark vs. Hadoop depends on the amount of data you have to index; you would have to benchmark it yourself.