I am currently working on dropping data older than 1 hour from the cluster. I have set the rule as below:


According to this rule, only the past 1 hour of data should be considered for any Druid query. The problem is that, although this rule sets the ‘used’ field in the segments table to 0, I can still query the old data.

Can you tell me the possible causes of this behavior? I also disabled the cache and tried timeseries and select queries, but they still return the old data.

The coordinator rules run periodically and apply to segments only after they are handed off to the historical nodes.

If your segments are on realtime nodes, the coordinator rules won't be applied to them.

If you want to query data for the previous 1 hour, you can specify the interval in your query.
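As a rough illustration of restricting a query to the last hour, here is a minimal sketch that builds a native timeseries query body in Python. The datasource name `my_datasource` and the helper names are hypothetical, and the single `count` aggregation is only a placeholder for whatever you actually aggregate.

```python
import json
from datetime import datetime, timedelta, timezone


def last_hour_interval(now=None):
    """Return an ISO-8601 'start/end' interval covering the hour before `now`."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{now.strftime(fmt)}"


def build_timeseries_query(data_source, now=None):
    """Build a timeseries query body limited to the last hour.

    `data_source` is a placeholder; substitute your own datasource name.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [last_hour_interval(now)],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


if __name__ == "__main__":
    # Print the JSON body; it would be POSTed to the broker's query endpoint.
    print(json.dumps(build_timeseries_query("my_datasource"), indent=2))
```

Because the interval is computed at query time, this limits what the query sees without depending on whether the older segments have been dropped yet.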

Any data older than 1 hour will eventually get dropped when the coordinator rules run.
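For reference, a retention rule chain that keeps only the most recent hour usually pairs a period-based load rule with a catch-all drop rule. The sketch below is an assumption about what such a chain could look like, not necessarily the rule set in this thread; the tier name, replica count, and function name are illustrative.

```python
import json


def retention_rules(period="PT1H", tier="_default_tier", replicas=1):
    """Return a rule chain: load segments within `period`, drop everything else.

    Rules are evaluated in order, so the drop rule only matches segments
    that the load rule did not. Tier name and replica count are assumptions.
    """
    return [
        {
            "type": "loadByPeriod",
            "period": period,
            "tieredReplicants": {tier: replicas},
        },
        {"type": "dropForever"},
    ]


if __name__ == "__main__":
    # This JSON array is what would be submitted to the coordinator
    # as the rule set for a datasource.
    print(json.dumps(retention_rules(), indent=2))
```

Note that such rules only affect segments that are already served by historical nodes, and only on the coordinator's next run.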

I am still confused. My segments are on a realtime node, so how can I drop data older than 1 hour from the historical node? I don't want to specify the interval in the query; my requirement is to drop the data from the node.

With the below rule,


Does it not take effect immediately? I can see “0” in the used field in the segments table, but queries still return its data. Can you explain this briefly?

Jvalant, rules only apply to historical nodes. You need handoff to occur before you can set the rules.

Please try to follow for setting up a data pipeline from Kafka. We strongly discourage using realtime nodes after 0.9.0.

How can I modify the handoff timing? Do I have to configure a property on the realtime or coordinator node? Please suggest the possible modifications.


