Hi everyone,
We use Druid quite intensively and rely heavily on its results, but from time to time the same query returns different results. We have a cluster of ~30 instances, and sometimes we need to restart the historical process on some of them.
I noticed some odd behavior while restarting historical nodes: Druid returns a partial result until all nodes that hold segments for the requested data are up and running again.
It is understandable that the data is incomplete in that situation, but I would expect an empty result or an error until the entire cluster is up and stable again.
To investigate, I forced Druid not to use the cache and fired the same query repeatedly while restarting all historical nodes. These are the results I got:
query_1
// If Druid is not stable should return an empty result
query_2
// If Druid is not stable should return an empty result
query_3
"version": "v1",
"timestamp": "2017-08-01T00:00:00.000Z",
"event": {
  "requests": 1522346231, // Druid starts to return incomplete result
  "imps": 3502031,
  "clicks": 13745
}
query_4
"version": "v1",
"timestamp": "2017-08-01T00:00:00.000Z",
"event": {
  "requests": 16895798204, // all metrics are increasing
  "imps": 34952339,
  "clicks": 128912
}
query_5
"version": "v1",
"timestamp": "2017-08-01T00:00:00.000Z",
"event": {
  "requests": 17529400203, // still increasing
  "imps": 36228979,
  "clicks": 133302
}
query_6
"version": "v1",
"timestamp": "2017-08-01T00:00:00.000Z",
"event": {
  "requests": 17529400203, // from now on Druid is stable and it returns the correct result
  "imps": 36228979,
  "clicks": 133302
}
query_7
"version": "v1",
"timestamp": "2017-08-01T00:00:00.000Z",
"event": {
  "requests": 17529400203, // from now on Druid is stable and it returns the correct result
  "imps": 36228979,
  "clicks": 133302
}
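In case it helps anyone reproduce this, here is a minimal sketch of the kind of test loop described above. It is not the exact script: the broker URL, datasource name, interval, and the longSum aggregators are placeholders, and the groupBy query type is only inferred from the result format. The `useCache` and `populateCache` context flags are what bypass the cache.

```python
import json
import time
import urllib.request

# Hypothetical broker address; replace with your own.
BROKER_URL = "http://localhost:8082/druid/v2"


def build_query(datasource, interval):
    """Build a groupBy query with caching disabled via the query context."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "dimensions": [],
        "intervals": [interval],
        # Assumed aggregator type; adjust to how the metrics are really defined.
        "aggregations": [
            {"type": "longSum", "name": m, "fieldName": m}
            for m in ("requests", "imps", "clicks")
        ],
        "context": {"useCache": False, "populateCache": False},
    }


def run_query(query, url=BROKER_URL):
    """POST the query to the broker and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def distinct_in_order(results):
    """Return the distinct responses in the order they first appeared."""
    seen = []
    for r in results:
        if r not in seen:
            seen.append(r)
    return seen


def poll(query, attempts=7, delay_s=5.0):
    """Fire the same query several times and collect every response."""
    results = []
    for _ in range(attempts):
        results.append(run_query(query))
        time.sleep(delay_s)
    return results
```

Calling `poll(build_query("my_datasource", "2017-08-01/2017-08-02"))` during a rolling restart and passing the output to `distinct_in_order` shows how many different answers the same query produced; more than one means the results were inconsistent.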
Does anyone know why this is happening?
Is there a way to configure Druid to avoid that behavior?