Realtime nodes in our system are keeping segments in their local indexes after those segments have been handed off to deep storage. I can see the same segment in both the deep storage location and the realtime index location. Normally this would be a minor annoyance that we’d just clean up with a script, but it also appears to cause the realtime node to read in these segments when they are queried. The result is that the segments are double-counted, and query performance drops drastically because the broker does not cache segments owned by the realtime node.
As far as I can tell, the two sets of segments are identical. I was able to confirm that the total of our ‘count’ query dropped by about half when I deleted about a month’s worth of segments from the realtime index. Is this a known issue? It certainly appears to be pretty serious.
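
For anyone who wants to check their own setup for the same overlap, here is a rough sketch that lists segment directories present under both locations. It is only an illustration under loud assumptions: both locations are reachable as local filesystem paths, and segments sit a fixed number of directory levels below each root with the same relative layout. The function names, the `depth` parameter, and the layout are hypothetical and not taken from Druid itself, so adjust them to match your actual persist directory and deep storage.

```python
from pathlib import Path


def segment_dirs(root: Path, depth: int) -> set[str]:
    """Return relative paths of directories exactly `depth` levels below root.

    Assumption: each segment corresponds to one such directory.
    """
    pattern = "/".join(["*"] * depth)
    return {
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_dir()
    }


def duplicated_segments(realtime_root, deep_root, depth=2):
    """List segment paths that exist under both roots (hypothetical layout)."""
    realtime = segment_dirs(Path(realtime_root), depth)
    deep = segment_dirs(Path(deep_root), depth)
    return sorted(realtime & deep)
```

Calling `duplicated_segments("/path/to/realtime/index", "/path/to/deep/storage")` with your own paths returns the relative paths found in both places, which would be the candidates for double-counting.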