Druid returns invalid results with no warning when a Historical node dies


I am running Druid 0.12.2 with a Broker and two Historical nodes, with 2x replication on my datasource.

A “groupBy” query works great, summing up some costs and returning the correct number.

However, if I kill -9 a Historical node's Java process, then while it is starting back up my “groupBy” query may occasionally return an incorrect sum (e.g. a fraction of the correct value).

But I’m not seeing any indication that the returned sum is untrustworthy!

I’m passing {"uncoveredIntervalsLimit": 10} in the query "context", but the returned X-Druid-Response-Context header is {}, with no "uncoveredIntervals" or "missingSegments" reported.
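For reference, here is a minimal sketch of roughly what I'm doing, rewritten in Python. The Broker address, the datasource name, and the "cost" column are placeholders I made up for illustration, and the helper names (`build_query`, `coverage_warnings`, `run_query`) are hypothetical, not part of any Druid client library.

```python
import json
import urllib.request

# Assumed Broker address (placeholder); adjust for your own cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_query(datasource, interval, limit=10):
    """Build a groupBy query that asks Druid to report uncovered intervals."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,          # hypothetical datasource name
        "granularity": "all",
        "dimensions": [],
        "aggregations": [
            # "cost" is a placeholder metric column
            {"type": "doubleSum", "name": "cost", "fieldName": "cost"}
        ],
        "intervals": [interval],
        "context": {"uncoveredIntervalsLimit": limit},
    }


def coverage_warnings(header_value):
    """Return the non-empty coverage entries from the response context header."""
    context = json.loads(header_value) if header_value else {}
    keys = ("uncoveredIntervals", "uncoveredIntervalsOverflowed", "missingSegments")
    return {k: context[k] for k in keys if context.get(k)}


def run_query(query, url=BROKER_URL):
    """POST the query and return (rows, coverage warnings from the header)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        rows = json.loads(response.read())
        header = response.headers.get("X-Druid-Response-Context")
    return rows, coverage_warnings(header)
```

In the failure case described here, `coverage_warnings` comes back empty because the header is {}, so an empty result from this check only means Druid reported nothing, not that the sum is complete.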

Is there ANY way for me to know that the query didn’t actually collect data for all of the necessary segments?

I tried bypassing the Broker and querying the Historical node directly (immediately after killing and restarting it), and saw the same behavior while it was starting up. So could this be a “misbehavior” of the Historical node itself?

Thanks for your help.

I’ll note: In other situations, “uncoveredIntervals” has worked well.
If I simply disable and enable the datasource, then while segments are still loading my query will indicate some “uncoveredIntervals” so I know it’s returning a partial sum.