Hi All,
I am trying to configure a Druid cluster for optimal performance under a concurrent load of around 5000 requests per second.
The queries are mostly groupBy and timeseries queries; individual queries return responses in the 500 ms to 2 s range.
But when I push the load beyond 1000 rps, CPU spikes to 90%, performance degrades drastically, and requests start failing.
Please review these config parameters and let me know whether concurrency can be improved by tweaking any of them; the cluster currently works quite well at a few hundred rps.
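For context, here is the back-of-the-envelope concurrency arithmetic I am working from (Little's law; the 1 s average latency is just my assumed midpoint of the 500 ms to 2 s range, not a measured value):

# Rough estimate of concurrent in-flight queries (Little's law: in-flight = rate x latency).
target_rps = 5000
avg_latency_s = 1.0   # assumed midpoint of the 0.5-2.0 s range above

in_flight_queries = target_rps * avg_latency_s
print(f"Expected concurrent in-flight queries: {in_flight_queries:.0f}")
# => roughly 5000 queries in flight at any moment at the target load,
#    which is what the thread pools configured below would need to absorb.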
These are my configurations:
Data (3 nodes: 72 cores, 144 GB RAM, 2 TB gp2 EBS)
Query (2 nodes: 16 cores, 122 GB RAM)
middleManager/runtime.properties:
druid.worker.capacity=31
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=1100
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=8
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1000000000
druid.indexer.fork.property.druid.processing.numThreads=3
druid.indexer.runner.javaOptsArray=["-server","-Xmx3g","-XX:MaxDirectMemorySize=15G"]
# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp
druid.query.search.maxSearchLimit=1000000000
druid.query.groupBy.maxMergingDictionarySize=100000000
druid.query.groupBy.maxOnDiskStorage=1000000000
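As a sanity check on the Peon sizing above, this is the direct-memory arithmetic I am using, based on the buffer.sizeBytes * (numThreads + numMergeBuffers + 1) rule from the Druid docs; please correct me if I have it wrong:

# Direct memory one Peon needs: buffer.sizeBytes * (numThreads + numMergeBuffers + 1)
buffer_size_bytes = 1_000_000_000   # druid.processing.buffer.sizeBytes (fork property)
num_threads       = 3               # druid.processing.numThreads (fork property)
num_merge_buffers = 8               # druid.processing.numMergeBuffers (fork property)

needed_direct = buffer_size_bytes * (num_threads + num_merge_buffers + 1)
print(f"Direct memory needed per Peon: {needed_direct / 1e9:.1f} GB")
# => 12.0 GB, versus -XX:MaxDirectMemorySize=1g in javaOpts and 15G in javaOptsArray.

# Worst case if all 31 task slots are busy (3 GB heap + 15 GB direct per Peon):
print(f"All slots busy: {31 * (3 + 15)} GB on a 144 GB data node")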
historical/runtime.properties:
druid.service=druid/historical
druid.plaintextPort=8083
# HTTP server threads
druid.server.http.numThreads=1100
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=40
druid.processing.numThreads=80
druid.processing.tmpDir=/data/var/druid/processing
# Segment storage
druid.segmentCache.locations=[{"path":"/data/var/druid/segment-cache","maxSize":150000000000}]
druid.server.maxSize=1500000000000
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=512000000
druid.query.search.maxSearchLimit=1000000000
druid.query.groupBy.maxMergingDictionarySize=100000000
druid.query.groupBy.maxOnDiskStorage=1000000000
druid.server.http.maxSubqueryRows=2000000000
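And the same direct-memory arithmetic for the Historical processing pool above (heap is sized separately in jvm.config, which I have not included here):

# Direct memory a Historical needs: buffer.sizeBytes * (numThreads + numMergeBuffers + 1)
buffer_size_bytes = 500_000_000   # druid.processing.buffer.sizeBytes
num_threads       = 80            # druid.processing.numThreads
num_merge_buffers = 40            # druid.processing.numMergeBuffers

needed_direct = buffer_size_bytes * (num_threads + num_merge_buffers + 1)
print(f"Direct memory needed per Historical: {needed_direct / 1e9:.1f} GB")
# => 60.5 GB of direct memory, plus heap and page cache, on each 144 GB data node
#    (which also hosts the MiddleManager and its Peons above). With numMergeBuffers=40,
#    at most ~40 groupBy queries per Historical can hold a merge buffer at once;
#    additional groupBy queries have to wait for one.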
broker/runtime.properties:
druid.service=druid/broker
druid.plaintextPort=8082
# HTTP server settings
druid.server.http.numThreads=120
# HTTP client settings
druid.broker.http.numConnections=500
druid.broker.http.maxQueuedBytes=100000000
druid.server.http.defaultQueryTimeout=3600000
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=40
druid.processing.numThreads=15
druid.processing.tmpDir=/data/var/druid/processing
# Query cache (enabled on the Broker here, rather than pushing caching and merging down to the Historicals)
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.sql.planner.sqlTimeZone=IST
druid.sql.planner.maxTopNLimit=1000000000
druid.sql.planner.metadataSegmentCacheEnable=true
druid.broker.cache.useResultLevelCache=true
druid.broker.cache.populateResultLevelCache=true
druid.query.search.maxSearchLimit=1000000000
druid.query.groupBy.maxMergingDictionarySize=100000000
druid.query.groupBy.maxOnDiskStorage=1000000000
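One more piece of arithmetic I am trying to reconcile with the target load; it assumes each in-flight query ties up one Broker HTTP thread for its full duration, which I would appreciate someone confirming or correcting:

# Rough ceiling for the Broker tier if every in-flight query holds an HTTP thread.
broker_nodes      = 2
http_threads_each = 120    # druid.server.http.numThreads on the Broker
avg_latency_s     = 1.0    # assumed, as above

max_sustained_rps = broker_nodes * http_threads_each / avg_latency_s
print(f"Rough Broker-tier ceiling: {max_sustained_rps:.0f} queries/second")
# => about 240 queries/second before requests start queuing at the Brokers,
#    well short of the 5000 rps target under these assumptions.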
Regards,
Kundan