How does the broker avoid losing data when it combines historical data and real-time data?

Hi All,

I am not sure whether there is a chance that the broker cannot return the complete data when it combines historical data and real-time data.

Consider this case:

Step 1: Initially, Middle Manager (MM1) holds 2 segments, {S1, S2}, and Historical Node (HN1) holds 1 segment, {S3}.

Step 2: A query arrives at the broker. The broker reads ZK and learns that MM1 holds {S1, S2} and HN1 holds {S3}, so it sends separate requests to MM1 and HN1 to read those segments.

Step 3: Before the requests arrive at MM1 and HN1, S2 is handed off from MM1 to HN1, so the layout changes to MM1->{S1}, HN1->{S2, S3}.

Step 4: After step 3, the requests arrive at MM1 and HN1. MM1 returns only S1, even though the broker asked it for S1 and S2. HN1 returns only S3, because the broker assumes only S3 is on HN1.

Step 5: So only S1 and S3 are returned to the broker, and S2 is missing from this query's result.
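
The steps above can be reproduced in a minimal Python sketch. It assumes a simplified model in which each server answers only for the segments it currently holds and silently skips any requested segment it no longer has; the `Server` class and `run_scenario` function are hypothetical illustrations, not Druid APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical data server: answers only for segments it currently holds."""
    name: str
    segments: set = field(default_factory=set)

    def query(self, requested: set) -> set:
        # Requested segments that are no longer here are silently skipped.
        return requested & self.segments


def run_scenario() -> set:
    # Step 1: initial layout.
    mm1 = Server("MM1", {"S1", "S2"})
    hn1 = Server("HN1", {"S3"})

    # Step 2: the broker snapshots the layout it read from ZK.
    plan = {mm1.name: set(mm1.segments), hn1.name: set(hn1.segments)}

    # Step 3: S2 is handed off from MM1 to HN1 before the requests arrive.
    mm1.segments.discard("S2")
    hn1.segments.add("S2")

    # Step 4: each server answers according to the broker's stale plan.
    results = mm1.query(plan["MM1"]) | hn1.query(plan["HN1"])

    # Step 5: S2 is absent from the merged result.
    return results


print(sorted(run_scenario()))  # ['S1', 'S3']
```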

Is it possible for this to happen?


If step 3 happens, the broker should resend the query to HN1 for the missing segments.
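
The resend described in this reply can be sketched in the same simplified model. This sketch assumes each server reports back which requested segments it did not hold, and that the broker then looks up where those segments currently live and retries only them. All names (`Server`, `locate`, `query_with_retry`, `max_retries`) are hypothetical; this illustrates the idea and is not Druid's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Hypothetical data server that reports segments it could not serve."""
    name: str
    segments: set = field(default_factory=set)

    def query(self, requested: set) -> tuple[set, set]:
        served = requested & self.segments
        missing = requested - self.segments
        return served, missing


def locate(servers: dict[str, Server], segment: str) -> Optional[str]:
    """Return the name of the server that currently holds `segment`, if any."""
    for name, server in servers.items():
        if segment in server.segments:
            return name
    return None


def query_with_retry(plan: dict[str, set], servers: dict[str, Server],
                     max_retries: int = 2) -> set:
    results: set = set()
    pending = plan
    for _attempt in range(max_retries + 1):
        missing: set = set()
        for name, requested in pending.items():
            served, not_here = servers[name].query(requested)
            results |= served
            missing |= not_here
        if not missing:
            return results
        # Re-resolve only the missing segments against the current layout.
        pending = {}
        for segment in missing:
            owner = locate(servers, segment)
            if owner is None:
                raise RuntimeError(f"segment {segment} is not served anywhere")
            pending.setdefault(owner, set()).add(segment)
    raise RuntimeError(f"still missing after {max_retries} retries: {sorted(missing)}")


def run_scenario() -> set:
    servers = {"MM1": Server("MM1", {"S1", "S2"}), "HN1": Server("HN1", {"S3"})}
    plan = {name: set(s.segments) for name, s in servers.items()}  # stale snapshot
    servers["MM1"].segments.discard("S2")  # hand-off of S2 from MM1 to HN1
    servers["HN1"].segments.add("S2")
    return query_with_retry(plan, servers)


print(sorted(run_scenario()))  # ['S1', 'S2', 'S3']
```

Retrying only the missing segments, rather than the whole query, keeps segments that were already served from being counted twice in the merged result.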

Do you see any incorrect results?


Not yet. I am trying to design a test case to verify it, but it is a little difficult.

So I want to know whether Druid handles this case.

On Saturday, March 23, 2019 at 4:00:00 AM UTC+8, Jihoon Son wrote: