Poor concurrent query performance with 80GB of data in a cluster of 12 historical nodes (each node has 64 cores, 256GB memory, and 28TB SSD)

Hello everyone, I have run into a serious performance problem and need some help.

My druid cluster:

1 broker node:

32 cores, 64GB memory, and 500GB SSD

JVM options: -Xms8g -Xmx8g

Runtime configuration:

druid.service=druid/broker

druid.port=8082

HTTP server threads

druid.broker.http.numConnections=2000

druid.server.http.numThreads=1000

druid.broker.http.compressionCodec=identity

Processing threads and buffers

druid.processing.buffer.sizeBytes=1000000000

druid.processing.numThreads=31

druid.processing.numMergeBuffers=16

Query cache

druid.broker.cache.useCache=false

druid.broker.cache.populateCache=false

druid.cache.type=local

druid.cache.sizeInBytes=0

druid.sql.enable=true

12 historical nodes:

64 cores, 256GB memory, and 28TB SSD

JVM options: -Xms8g -Xmx8g

Runtime configuration:

druid.service=druid/historical

druid.port=8083

HTTP server threads

druid.server.http.numThreads=200

Processing threads and buffers

druid.processing.buffer.sizeBytes=1000000000

druid.processing.numThreads=31

druid.processing.numMergeBuffers=16

druid.historical.cache.useCache=false

druid.historical.cache.populateCache=false

druid.historical.cache.unCacheable=["select"]

druid.cache.sizeInBytes=4073741824

Segment storage

druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":25000000000000}]

druid.server.maxSize=25000000000000

6 middleManager nodes:

64 cores, 128GB memory, 800GB SSD

Runtime configuration:

druid.service=druid/middleManager

druid.plaintextPort=8091

Number of tasks per middleManager

druid.worker.capacity=20

Task launch parameters

druid.indexer.runner.javaOpts=-server -Xms1g -Xmx8g -XX:MaxDirectMemorySize=10240g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

druid.indexer.task.baseTaskDir=var/druid/task

HTTP server threads

druid.server.http.numThreads=50

Processing threads and buffers on Peons

druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1000000000

druid.indexer.fork.property.druid.processing.numThreads=7

druid.indexer.fork.property.druid.processing.numMergeBuffers=4

druid.indexer.task.restoreTasksOnRestart=true

Hadoop indexing

druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.3"]

druid.peon.defaultSegmentWriteOutMediumFactory.type=offHeapMemory

Now, there is 80GB of data in my cluster. When I run concurrent queries with 20 threads, the response time is nearly 3s~5s. My datasource has 24 dimensions and 23 metrics, and my query types are groupBy, topN, and timeseries. I do not have many small segments; my segment sizes are 300~700MB.

The response times of those 20 queries look like:

effect rows: 32, duration: 2.621171503s

effect rows: 5, duration: 2.838283915s

effect rows: 126, duration: 2.992462027s

effect rows: 5, duration: 3.237546923s

effect rows: 157, duration: 3.276105989s

effect rows: 5, duration: 3.465969717s

effect rows: 1, duration: 3.540605852s

effect rows: 15, duration: 3.862869119s

effect rows: 96, duration: 3.958490223s

effect rows: 1, duration: 4.199707517s

effect rows: 32, duration: 4.242192788s

effect rows: 32, duration: 4.39741068s

effect rows: 32, duration: 4.413618751s

effect rows: 32, duration: 4.409406441s

effect rows: 32, duration: 4.416742285s

effect rows: 1, duration: 4.490025544s

effect rows: 1, duration: 4.496389513s

effect rows: 5, duration: 4.574033371s

effect rows: 32, duration: 4.586916513s

effect rows: 32, duration: 4.704293055s

The metric 'query/wait/time' on the historical nodes was a few seconds, and the same was true on the middleManager nodes.

The metric 'segment/scan/pending' on the historical nodes was 120~180 on some hosts, but it was 0 on the middleManager nodes.

My questions:

  1. I have a total of 12 * 256GB of memory on the historical nodes, and my cluster data is only 80GB. I think there is enough memory to keep every segment in historical node memory, so why do some historical nodes spend a long time waiting to scan segments? (A rough back-of-the-envelope check follows this list.)

  2. The cluster data is so small; why do 20 concurrent queries drive 31 cores on the historical nodes to 100% usage?

  3. How should I configure my cluster to support 20 or more concurrent queries? Does Druid not natively support high concurrency?

  4. If there are many 300~700MB segments in the cluster, will query performance get worse because of the limited number of CPU cores?
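
For question 1, here is my rough back-of-the-envelope check (my own estimate, so treat the numbers as approximate):

total segment data ≈ 80GB spread across 12 historicals ≈ 6.7GB of segments per node

per-node memory = 256GB, of which only 8GB is the JVM heap

that leaves well over 200GB per node for the OS page cache, so the memory-mapped segments on each node should fit in it entirely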

Thanks a lot!

Have a nice day!

Hi Zhou:

To answer your questions

  1. Yes, Druid can make use of that memory by storing some past query results there, so when the same query covers the same interval, the cached results can be reused to boost performance. However, in your historical runtime config, druid.historical.cache.useCache=false (a sketch of how to turn it back on follows the numbered list below).

  2. It looks like the historicals are fully loaded, per your description, even though you allocated 8GB for their heap. Historicals also use off-heap memory to store intermediate results, and by default all segments are memory-mapped before they can be queried. What are the other settings in your jvm.config, i.e. the MaxDirectMemorySize setting, etc.?

Your druid.server.http.numThreads=200 is also way too high. Why is that? You probably also don't need 31 processing threads unless you run absolutely nothing else on the historicals (see the sketch of how these settings interact, near the end of this reply).

  3. Your segment sizes are in the ideal range for query performance, so they are fine.
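
For item 1, if you later want to turn segment-level caching back on at the historicals, the relevant properties look roughly like this (a sketch only; the cache type and size are example values, not a recommendation):

druid.historical.cache.useCache=true

druid.historical.cache.populateCache=true

druid.cache.type=caffeine

druid.cache.sizeInBytes=2000000000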

Other factors that affect performance include column cardinality, the shape of your queries, etc.
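
On the thread counts, here is a sketch of how the settings you already have interact (the annotations are my summary of the general behavior, using your current values):

druid.processing.numThreads=31 - threads that actually scan segments, so at most 31 segment scans run in parallel per historical

druid.processing.numMergeBuffers=16 - off-heap buffers reserved for merging groupBy results

druid.processing.buffer.sizeBytes=1000000000 - size of each off-heap processing buffer

druid.server.http.numThreads=200 - Jetty threads that accept queries; they only hand work to the processing threads, so raising this beyond the processing capacity mostly adds queueing, which shows up as segment/scan/pending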

Hope this helps.

Hi Zhou, what version of Druid are you running?

Thanks for the quick reply!

The Druid version is 0.14.2.
Is there any known issue that could cause this performance problem?

On Friday, September 13, 2019 at 6:48:51 AM UTC+8, Vadim Ogievetsky wrote:

Thanks for the quick reply!

I purposely turned off the historical cache because I want to know Druid's query performance without caching.

The historical JVM configuration:

-server

-Xms8g

-Xmx8g

-XX:MaxDirectMemorySize=10240g

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

I think the max direct memory is enough.
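
As a rough check against the sizing rule I have seen in the docs (please correct me if I have it wrong), direct memory should be at least druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1). With my settings that is:

1000000000 * (31 + 16 + 1) = 48000000000 bytes, roughly 48GB

which is far below the -XX:MaxDirectMemorySize=10240g I configured, so direct memory should not be the limit.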

I set druid.server.http.numThreads=200 in order to improve concurrency on the historical nodes. Does it have any side effects? Can you explain it to me in more detail?

Also, can you help me correct my configuration?

Thanks a lot!

On Friday, September 13, 2019 at 6:44:18 AM UTC+8, Ming F wrote: