Purging data in Druid for the Kafka stream ingestion type


For a datasource shared by multiple clients, batch/Hadoop ingestion has an option to delete data based on a particular column (customerId) by specifying a 'filter' in the ingestion spec.
That is the Druid equivalent of "DELETE FROM CUSTOMER WHERE CUSTOMER_ID = 'XXX'".

Spec in the case of Hadoop:

"inputSpec": {
  "type": "dataSource",
  "ingestionSpec": {
    "dataSource": "customer",
    "intervals": ["2013-01-01/2019-01-01"],
    "filter": {
      "type": "not",
      "field": { "type": "selector", "dimension": "customerId", "value": "123" }
    }
  }
}
Reference: Purging data selectively from druid - #8 by Durgadas_Chaudhari
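For context, that fragment sits inside the `ioConfig` of a full `index_hadoop` task. A minimal sketch of the surrounding task spec is below — the `segmentGranularity` and `tuningConfig` values are assumptions for illustration, and schema fields such as the parser/dimensions are omitted for brevity:

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "customer",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "intervals": ["2013-01-01/2019-01-01"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "customer",
          "intervals": ["2013-01-01/2019-01-01"],
          "filter": {
            "type": "not",
            "field": { "type": "selector", "dimension": "customerId", "value": "123" }
          }
        }
      }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}
```

The "not" + "selector" filter is what makes this a purge: the reindex rewrites the segments for the given intervals, keeping every row except those where customerId = 123.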

Is there a way to execute the same task with the Kafka indexing service?
I tried the same filter on Kafka, but it doesn't work.

Can you post the complete spec?

Hey Nimish!

If I read this technique correctly, it works by re-indexing existing data from Druid back into Druid, using a filter so that the new segments leave out (or include only) certain rows.

Connecting to Kafka would add new data rather than reindex what you already have. So just checking: is that what you want to achieve?
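For what it's worth, on recent Druid versions the same purge-by-reindex works without Hadoop: a native batch task can read the datasource back through the `druid` inputSource and drop rows with a `transformSpec` filter. A sketch, carrying over the datasource name and interval from the question (tuning values are placeholders):

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "customer",
        "interval": "2013-01-01/2019-01-01"
      }
    },
    "dataSchema": {
      "dataSource": "customer",
      "transformSpec": {
        "filter": {
          "type": "not",
          "field": { "type": "selector", "dimension": "customerId", "value": "123" }
        }
      }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

Because this rewrites existing segments, the Kafka supervisor keeps appending new data independently; the filter only affects the intervals being reindexed.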

(I'm intrigued by that post… now I want to try it 🙂)