Data not available after indexing due to 30-40 mins coordinator cycle to load segments

Hello all,

We're seeing an issue on a cluster (v0.22.0) with over 500K segments. The Coordinator node has 8 cores and 64 GB of RAM.

With retention set to 1 month, querying recent data looks fine. However, when we increase retention to 3 months or more, data from the last hour takes a long time to show up in query results. In the Coordinator logs we also see the load queue cycle go from roughly every 10 minutes to every 30-40 minutes.

So basically the query results show normal values for real-time data, then low numbers for about an hour ago, then normal numbers again.

This clearly has something to do with the number of segments. Here is what we've tried:

  • We double-checked communication with ZooKeeper, the metadata storage, and the Java heap on the Coordinator, and all look okay.
  • We experimented with the following params with no luck: druid.segmentCache.numLoadingThreads, maxNumConcurrentSubTasks, percentOfSegmentsToConsiderPerMove, maxSegmentsInNodeLoadingQueue

Here is the Coordinator dynamic config:

{
  "millisToWaitBeforeDeleting": 900000,
  "mergeBytesLimit": 524288000,
  "mergeSegmentsLimit": 100,
  "maxSegmentsToMove": 5,
  "percentOfSegmentsToConsiderPerMove": 10,
  "useBatchedSegmentSampler": false,
  "replicantLifetime": 1,
  "replicationThrottleLimit": 200,
  "balancerComputeThreads": 12,
  "emitBalancingStats": false,
  "killDataSourceWhitelist": [],
  "killAllDataSources": false,
  "killPendingSegmentsSkipList": [],
  "maxSegmentsInNodeLoadingQueue": 2000,
  "decommissioningNodes": [],
  "decommissioningMaxPercentOfMaxSegmentsToMove": 70,
  "pauseCoordination": false,
  "replicateAfterLoadTimeout": false,
  "maxNonPrimaryReplicantsToLoad": 2147483647
}
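
For reference, the dynamic config above can be read or updated through the Coordinator API (COORDINATOR_HOST and the JSON file name below are placeholders):

curl http://COORDINATOR_HOST:8081/druid/coordinator/v1/config

curl -X POST -H 'Content-Type: application/json' \
  -d @coordinator-dynamic-config.json \
  http://COORDINATOR_HOST:8081/druid/coordinator/v1/config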

Relates to Apache Druid 0.22.0

Hi @Ahmad_Eldefrawy. When you changed your retention period, did you also drop the segments which didn’t conform to the new retention period? Without a drop rule, data not within the specified period will be retained forever (if that’s your default rule), and that might account for the increase in your load queue frequency.

Hello @Mark_Herrera, thanks for getting back. Yeah, it's usually a load rule for the last 1-3 months plus a drop-forever rule.
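
Roughly like the following sketch; the period and replicant counts here are illustrative, not our exact values:

[
  { "type": "loadByPeriod", "period": "P3M", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]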

Hi Ahmad,

What is the heap setting on your Coordinator? It is recommended to scale it as the number of segments increases. You can use the cluster tuning recommendations as a starting point.

@Vijeth_Sagar Hi, thanks for your response. We did scale them up once before.

Here are the numbers:

Coordinator
-Xmx25g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Historical
-Xmx25g -Xms10g -XX:NewSize=8g -XX:MaxNewSize=8g -XX:MaxDirectMemorySize=20g -XX:+ExitOnOutOfMemoryError -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Thanks Ahmad,

A few things you can try to resolve this:

  • 25 GB is high for the Coordinator, but we can leave it as is for now. Please do set -Xmx and -Xms to the same value, though.
  • Can you set druid.segmentCache.numLoadingThreads on your Historicals to a larger value? The default is (number of cores / 6); in your case, raise it to, say, 50.
  • Can you test whether changing druid.coordinator.balancer.strategy to ‘random’ helps the load speed? (See the sketch below.)
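
As a minimal sketch of where these settings live (the values are illustrative, and runtime.properties changes take effect on service restart):

# Historical runtime.properties
druid.segmentCache.numLoadingThreads=50

# Coordinator runtime.properties
druid.coordinator.balancer.strategy=random

# Coordinator jvm.config: keep min and max heap equal
-Xms25g
-Xmx25g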

Hello @Vijeth_Sagar
We are trying your recommendations now. May I ask why you recommended 50 for druid.segmentCache.numLoadingThreads? Isn't this too high?

Much appreciated.

Hi Ahmad,

That (cores / 6) calculation is the default value for that parameter. We are increasing it to test whether the bottleneck is the Historicals pulling segments from deep storage or the Coordinator taking too long to issue the load commands.

You can reduce it once your latency issue is solved. My feeling is that the balancer strategy change is what will help, in which case you can bring this value back down once the cluster is behaving the way you want.
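
While you test, you can watch what the Coordinator is queuing via the load queue endpoint (COORDINATOR_HOST is a placeholder; ?simple returns per-server segment counts):

curl http://COORDINATOR_HOST:8081/druid/coordinator/v1/loadqueue?simple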