Segment load scaling

Hi guys,

I have the below setup for my Druid cluster:
2 Coordinator + Overlord servers (m5.4xlarge)
3 Query servers
25 Data servers (i3en.24xlarge)

Coordinator config:

druid.service=druid/coordinator
druid.plaintextPort=8081

druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S

#Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=false
#druid.coordinator.asOverlord.overlordService=druid/overlord

druid.indexer.queue.startDelay=PT5S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
druid.indexer.storage.recentlyFinishedThreshold=PT15M
druid.indexer.runner.pendingTasksRunnerNumThreads=50
druid.coordinator.loadqueuepeon.http.batchSize=10
druid.manager.config.pollDuration=PT10M

Historical config:

druid.service=druid/historical
druid.plaintextPort=8083

#HTTP server threads
druid.server.http.numThreads=150

#Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=/mnt/disk2/var/druid/processing

#Segment storage
druid.segmentCache.locations=[{"path":"/mnt/disk2/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk3/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk4/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk5/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk6/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk7/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk8/var/druid/druidSegments","maxSize":6500000000000},{"path":"/mnt/disk9/var/druid/druidSegments","maxSize":6500000000000}]
druid.server.maxSize=50000000000000

#Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
druid.segmentCache.numLoadingThreads=50

My ingestion jobs generate about 7k segments every 15 minutes. I was able to scale the ingestion jobs, but the segments take a long time to become available. I am using S3 as deep storage.
Currently only about 320 segments are being loaded per minute.
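
(For rough numbers: 7,000 segments every 15 minutes is about 467 new segments per minute, so a load rate of ~320/minute falls steadily behind.)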

What can be done to speed up the segment loading? Any ideas?

What is the average size of your segments?

20-30 MB is the average size; the source data is Parquet on S3.

Can you change "Max segments to move" (maxSegmentsToMove) in the coordinator dynamic config to 500 and try? It is 5 by default.
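
For reference, the coordinator dynamic config can also be changed over HTTP; a minimal sketch, assuming the coordinator is reachable at localhost:8081 (read the current config first, since omitted fields may be reset to defaults on POST):

# Read the current dynamic config
curl http://localhost:8081/druid/coordinator/v1/config

# Write it back with the changed value
curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8081/druid/coordinator/v1/config \
  -d '{"maxSegmentsToMove": 500}'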

Where can I change it? In the Druid web console, you mean? Can you help me find where to change this? I can't find this config.

I was able to change it. Is there a way to monitor how many segments are getting loaded per minute?
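
One rough way to watch it, assuming access to the coordinator API: poll the load-status endpoint every minute and diff the counts.

# Segments still waiting to load, per datasource (refreshes every 60 s)
watch -n 60 'curl -s http://localhost:8081/druid/coordinator/v1/loadstatus?simple'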

Does the coordinator load the segments one by one or in parallel? I am using Druid 0.19.0.

The coordinator assigns segments to the historicals' load queues, and the historicals separately connect to deep storage and fetch the segments. So it is parallel.
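
To see how deep each historical's load queue actually is, assuming the coordinator API is reachable, something like:

# Segments queued to load/drop per historical
curl -s http://localhost:8081/druid/coordinator/v1/loadqueue?simple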

Set druid.segmentCache.numLoadingThreads on the historicals to a larger number (the default is max(1, number of cores / 6)). Each thread separately connects to deep storage and fetches segments.

I changed the below configs:
maxSegmentsToMove - 1000
replicationThrottleLimit - 1000
balancerComputeThreads - 100

But I am still seeing the unavailable segment count go down by only about 8,000 per hour.

I am also noticing that the coordinator is assigning only around 250 segments to load per minute; it seems to be the bottleneck.

See the log below, where only a single thread, [Coordinator-Exec--0], is printing:
2021-07-05T11:20:58,541 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment
2021-07-05T11:20:58,783 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment
2021-07-05T11:20:59,026 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment
2021-07-05T11:20:59,268 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment

druid.segmentCache.numLoadingThreads - I have already changed this to 50, but as mentioned above, if the coordinator itself is not assigning segments fast enough, how will the historicals load them any faster?

Can you check the coordinator log for something like "org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found 951 segments in the database"?
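
A quick way to check, assuming the coordinator writes to a plain log file (the path here is illustrative):

grep "Polled and found" /var/log/druid/coordinator.log | tail -n 5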

Also, how many tasks are you running on each data node? If the tasks take up all the threads, then the historicals may not get enough threads to use for the download.

The balancer strategy was the culprit. I changed it to random instead of cost, and my 150,000 (1.5 lakh) segments loaded in 20 minutes.
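
For anyone else hitting this: the balancer strategy is part of the same coordinator dynamic config, so it can be switched with the endpoint sketched above (the other values shown are just the ones mentioned earlier in this thread):

curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8081/druid/coordinator/v1/config \
  -d '{"balancerStrategy": "random", "maxSegmentsToMove": 1000, "replicationThrottleLimit": 1000}'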