Realtime node keeping old segments

Realtime nodes in our system are keeping segments in their local indexes after they have been handed off to deep storage. I can see the segments in both the deep storage location and the realtime index location. While this would normally be a minor annoyance that we’d just clean up with a script, it also appears to cause the realtime node to read these segments back in when they are queried. This double-counts the segments and drastically slows down query performance, because the broker does not cache segments owned by the realtime node.

As far as I can tell, the two sets of segments are identical. I confirmed that the total of our ‘count’ query dropped by about half when I deleted about a month’s worth of segments from the realtime index. Is this a known issue? It certainly appears to be pretty serious.
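
For reference, the query that showed the inflated totals was a simple timeseries count, roughly along these lines (the datasource name, count metric, and interval here are illustrative, not our exact query):

{
  "queryType": "timeseries",
  "dataSource": "requests",
  "granularity": "all",
  "intervals": ["2015-12-01/2016-01-01"],
  "aggregations": [
    { "type": "longSum", "name": "count", "fieldName": "count" }
  ]
}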

Actually, this is weirder than I thought. It looks like deleting those realtime segments has prevented them from being queried at all, so they weren’t being double-counted. However, the segments do exist in deep storage and in the druid_segments table. Why are they not being picked up by the historical nodes?
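
For what it’s worth, the segments do show up as used in the metadata store. A check roughly like the following (assuming a MySQL metadata store and the default druid_segments table name; the datasource name is illustrative) returns rows for those intervals:

SELECT id, used, created_date
FROM druid_segments
WHERE dataSource = 'requests' AND used = 1
ORDER BY created_date DESC;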

Hello,

Please read the “My realtime node is not handing segments off” section in the Druid documentation and see if it solves your problem. From what I have seen, the most common cause is the second point listed there: the historicals are full and thus unable to load new segments.

What constitutes “full” here? The deep storage location is not full, and the historical nodes have plenty of disk space. I am seeing a lot of this in the coordinator logs, though:

io.druid.server.coordinator.rules.LoadRule - Not enough [_default_tier] servers or node capacity to assign segment[requests_2015-12-23T13:00:00.000Z_2015-12-23T14:00:00.000Z_2015-12-23T13:00:00.000Z]! Expected Replicants[2]

You should go to your coordinator console at coordinator_host:port, where you can see how full your historicals are. The size of segments that historicals can load is configured with the “druid.server.maxSize” property (see http://druid.io/docs/latest/configuration/historical.html).
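
For example, on each historical the relevant settings live in runtime.properties; the numbers below are only a sketch, the right values depend on your disk sizes:

# historical runtime.properties (illustrative values)
druid.server.maxSize=300000000000
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":300000000000}]

You can also check current usage without the console via the coordinator API, e.g. GET http://coordinator_host:port/druid/coordinator/v1/servers?simple, if I recall correctly.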

The log message you posted means that either you have only one historical (and since the default replication factor is 2, Druid cannot replicate the segment on two historicals), or you have enough historicals but not enough capacity on them, so the segments cannot be replicated.
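
If adding capacity is not an option, another way out (as far as I know) is to lower the replication factor for that datasource to 1 with a coordinator load rule, something like the following (set via the coordinator console or the rules API; the tier name here is just the default):

{
  "type": "loadForever",
  "tieredReplicants": { "_default_tier": 1 }
}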

No, we actually have 4 historical nodes. However, you are correct about the config for those nodes: our druid.server.maxSize value was too small. I bumped it up and everything is working now!