Druid 0.8.3 is not dropping segments after modifying the used flag to 0 in MySQL

Hello everyone,

I am trying to set up rules that drop segments of a particular datasource once they are older than 1 hour. My rules are below:

[{"period":"PT1H","tieredReplicants":{"_default_tier":1},"type":"loadByPeriod"},{"type":"dropForever"}]

After I apply these rules, the used flag is set to 0 for all segments older than 1 hour, but Druid is not dropping those segments from the cluster. I can still see the segments in the coordinator web console, and my queries still return results.

What could be the reasons for this? Is there anything I am missing in the configuration?

Note: my datasource is assigned to a realtime node, and data is ingested in real time through Kafka.

Hoping for a reply.

Thanks,
Jvalant

Hey Jvalant,

Coordinator load/drop rules only apply to historical nodes. Any data in the realtime indexes will need to be handed off to historical nodes before it is dropped.

Hi Gian,

Thanks for the reply.

How can I configure Druid so that realtime indexes are handed off to historical nodes immediately once the coordinator sees from the metadata that a segment is no longer in use?

Basically, I want to configure my Druid cluster in such a way that realtime data older than 1 hour is dropped from the cluster and is not available to any queries.

Your suggestions would be so helpful.

Thanks,
Jvalant

Jvalant,

Are you just trying to keep 1 hour of data around and drop anything that is older than that?

If so, you should set your segmentGranularity to 1 hr, windowPeriod to 10 mins, and your rejectionPolicy to serverTime.
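
As a rough sketch, those three settings might sit in a realtime spec file roughly like this. It assumes the 0.8.x spec layout (a dataSchema with a granularitySpec, plus a tuningConfig of type realtime) and leaves out everything else the spec needs, such as dataSource, parser, metricsSpec, and ioConfig. The queryGranularity value is only a placeholder, so check the field names and values against the docs for your version:

```json
{
  "dataSchema": {
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "realtime",
    "windowPeriod": "PT10M",
    "rejectionPolicy": { "type": "serverTime" }
  }
}
```

With hourly segments and a 10 minute window, the realtime node should hand each segment off shortly after its hour ends, and your coordinator rules can only act on it once it is on a historical node.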

FWIW, Druid’s realtime ingestion story has changed significantly since 0.8.3. We no longer recommend using realtime nodes.

Hi Fangjin,
Do you mean that Druid is no longer recommended for real-time ingestion and is being moved toward the batch ingestion use case?

Thanks!

Nah, nothing like that. We encourage using the indexing service rather than realtime nodes for streaming ingestion. See this doc for more info: http://druid.io/docs/latest/ingestion/stream-ingestion.html