Return partial result when some historical nodes crash

Hi guys, my historical nodes are deployed on EC2 spot instances, so they are sometimes recycled and need time to recover. While they are down, a query that covers different tiers of data nodes may return a partial result. For example, a query for the minimum __time may return a later timestamp than the true minimum because the cold-tier historical nodes crashed. I wonder if the broker can return a query error instead of a partial result.

Is Broker tiering a possibility for your use case?

I do have a cold tier and a hot tier. Sometimes the cold-tier historical nodes crash, and the broker returns a partial result.

You can set a query context parameter of uncoveredIntervalsLimit.
When this parameter is set, if you issue a query for, e.g., 2022-10-01/2022-11-01 and the data for the day 2022-10-10 is missing, then two HTTP headers will be included in the response: uncoveredIntervals: "2022-10-10T00:00:00.000Z/2022-10-11T00:00:00.000Z" and uncoveredIntervalsOverflowed: false.
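
As a minimal sketch of how you might use this (Python with requests; the broker URL, datasource name, and query shape are placeholders for illustration, not from your cluster):

```python
# Sketch: set uncoveredIntervalsLimit in the query context and inspect the
# broker's response headers to detect a partial result.
import json
import requests

query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",          # placeholder datasource
    "intervals": ["2022-10-01/2022-11-01"],
    "granularity": "all",
    "aggregations": [{"type": "longMin", "name": "min_time", "fieldName": "__time"}],
    # Ask the broker to report up to 10 uncovered intervals via response headers.
    "context": {"uncoveredIntervalsLimit": 10},
}

resp = requests.post(
    "http://localhost:8082/druid/v2/",        # placeholder broker URL
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
resp.raise_for_status()

# If part of the requested interval had no served segments, these headers appear.
uncovered = resp.headers.get("uncoveredIntervals")
overflowed = resp.headers.get("uncoveredIntervalsOverflowed")
if uncovered:
    # Treat the result as partial instead of silently passing it to the customer.
    raise RuntimeError(f"Partial result: uncovered intervals {uncovered} (overflowed={overflowed})")

print(resp.json())
```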

You could also check a coordinator API that can be used to understand the availability of data, documented at API reference · Apache Druid. For sensitive queries, this can be checked to make sure the datasource is 100% loaded before issuing them.

You can also set druid.segmentCache.numBootstrapThreads to (#cores - 1) to get better parallelism during the startup process and bring those datasources back to 100% more quickly.
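
For example, on a historical with 16 cores the runtime.properties entry might look like this (the core count is just an assumption; adjust it to your hardware):

```
# Historical runtime.properties: parallelize segment cache bootstrap on startup
druid.segmentCache.numBootstrapThreads=15
```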


Thanks! But how can I integrate your advice with Imply? I mean, is there any way to make Imply give me an alert or some indication that the query result is partial?

Hi @Ze_yu_Huang. Are you using an Imply distribution?

I am using the open-source Imply distribution installed on my local machine, and my Druid cluster is deployed on k8s.

Thank you for the clarification.

Do you have any advice for what I can do with Imply? I cannot find any useful setting in Imply that shows the query result is partial when a historical node crashes. When this happens, my customers always suspect that our cluster has lost data.

You could check the API reference I pasted above (specifically /druid/coordinator/v1/loadstatus at the cluster level, or /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval} at the datasource level), which is available in OSS Druid. Before issuing those high-priority customer queries, check it to make sure the datasource is 100% available, then issue the actual query; a rough sketch of that pre-flight check follows below.
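
Here is a rough sketch of such a pre-flight check (Python with requests; the coordinator URL, datasource name, and interval are placeholders, and the response is assumed to be a datasource-to-percentage map):

```python
# Sketch: gate high-priority queries on the datasource being 100% loaded.
import requests

COORDINATOR = "http://localhost:8081"   # placeholder coordinator URL
DATASOURCE = "my_datasource"            # placeholder datasource name

def datasource_fully_loaded(interval: str) -> bool:
    """Return True only if 100% of the datasource's segments for the interval are served."""
    resp = requests.get(
        f"{COORDINATOR}/druid/coordinator/v1/datasources/{DATASOURCE}/loadstatus",
        params={"forceMetadataRefresh": "true", "interval": interval},
    )
    resp.raise_for_status()
    # Assumed response shape: {"my_datasource": <percent loaded>}
    return resp.json().get(DATASOURCE, 0.0) >= 100.0

if datasource_fully_loaded("2022-10-01/2022-11-01"):
    print("Datasource fully loaded; safe to issue the query.")
else:
    print("Datasource not fully loaded; results could be partial.")
```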

Please note that this can be noisy, for example when a segment has been ingested but not yet loaded for the datasource.


Thanks for your reply. Another question: can I use the view manager extension to speed up queries that include only a few dimensions but span several months?