Druid performance optimization issues

**System**

  • Ubuntu 16.04
  • Druid 0.10.1
  • Hadoop 2.9.0

Hardware

  • CPU: 16 cores

  • Memory: 64 GB

  • Storage: 500 GB (not SSD)

Distributed

  • master node => node1

  • data node => node2

  • query node => node3

Master Node

Coordinator

jvm.config

-server

-Xmx10g

-Xms10g

-XX:NewSize=512m

-XX:MaxNewSize=512m

-XX:+UseG1GC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=/home/druid/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

-Dderby.stream.error.file=/home/druid/derby.log

**runtime.properties**

druid.service=druid/coordinator

druid.host=xxxxxxxx

druid.port=8081

druid.coordinator.startDelay=PT30S

druid.coordinator.period=PT30S

druid.coordinator.merge.on=true

Overlord

jvm.config

-server

-Xmx4g

-Xms4g

-XX:NewSize=256m

-XX:MaxNewSize=256m

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=/home/druid/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

**runtime.properties**

druid.service=druid/overlord

druid.host=xxxxxxx

druid.port=8090

druid.indexer.autoscale.doAutoscale=true

druid.indexer.autoscale.strategy=ec2

druid.indexer.autoscale.workerIdleTimeout=PT90m

druid.indexer.autoscale.terminatePeriod=PT5M

druid.indexer.queue.startDelay=PT30S

druid.coordinator.period=PT30S

druid.indexer.runner.type=remote

druid.indexer.storage.type=metadata

Data Node

Historical

jvm.config

-server

-Xmx12g

-Xms12g

-XX:NewSize=6g

-XX:MaxNewSize=6g

-XX:MaxDirectMemorySize=30g

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=/home/druid/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

**runtime.properties**

druid.service=druid/historical

druid.host=xxxxxx

druid.port=8083

druid.server.tier=hot

druid.server.priority=100

HTTP server threads

druid.server.http.numThreads=45

Processing threads and buffers

druid.processing.buffer.sizeBytes=1073741824

druid.processing.numMergeBuffers=11

druid.processing.numThreads=15

druid.processing.tmpDir=/home/druid/processing

Segment storage

druid.segmentCache.locations=[{"path":"/home/druid/segment-cache","maxSize":300000000000}]

druid.server.maxSize=300000000000

Query cache

druid.historical.cache.useCache=false

druid.historical.cache.populateCache=false

druid.cache.type=caffeine

druid.cache.sizeInBytes=2000000000
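One sanity check worth doing on these values: Druid requires direct memory of at least (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) × druid.processing.buffer.sizeBytes. A quick verification, as a sketch; only the config values come from the post above:

```python
# Historical direct-memory requirement per the Druid sizing rule:
# (numThreads + numMergeBuffers + 1) processing buffers must fit in
# -XX:MaxDirectMemorySize.
buffer_bytes = 1073741824        # druid.processing.buffer.sizeBytes (1 GiB)
num_threads = 15                 # druid.processing.numThreads
num_merge_buffers = 11           # druid.processing.numMergeBuffers

required_gib = (num_threads + num_merge_buffers + 1) * buffer_bytes / 2**30
print(required_gib)              # 27.0 GiB, within the 30 GiB direct-memory cap
```

So the buffer settings fit under the 30 GB direct-memory cap, but heap plus direct memory already claims up to 42 GB of the node's 64 GB, which matters for the page-cache discussion in the reply further down.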

MiddleManager

jvm.config

-server

-Xmx64m

-Xms64m

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=/home/druid/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

**runtime.properties**

druid.service=druid/middlemanager

druid.host=xxxxxxxx

druid.port=8091

Number of tasks per middleManager

druid.worker.capacity=10

Task launch parameters

druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

druid.indexer.task.baseTaskDir=/home/druid/task

druid.indexer.task.restoreTasksOnRestart=true

HTTP server threads

druid.server.http.numThreads=45

Processing threads and buffers

druid.indexer.fork.property.druid.processing.buffer.sizeBytes=336870912

druid.indexer.fork.property.druid.processing.numThreads=2

druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/home/druid/processing", "maxSize": 0}]

druid.indexer.fork.property.druid.server.http.numThreads=45

druid.processing.buffer.sizeBytes=100000000

druid.processing.numMergeBuffers=2

druid.processing.numThreads=3

druid.processing.tmpDir=/home/druid/processing

Hadoop indexing

druid.indexer.task.hadoopWorkingPath=/home/druid/hadoop-tmp

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.3"]
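Adding up the data node's configured memory commitments is instructive. A sketch using only the values above; peon direct memory is left out because druid.indexer.runner.javaOpts sets no -XX:MaxDirectMemorySize:

```python
# Worst-case committed memory on the data node (node2), from the configs above.
historical_heap_gb = 12       # Historical -Xmx12g
historical_direct_gb = 30     # Historical -XX:MaxDirectMemorySize=30g
mm_heap_gb = 0.0625           # MiddleManager -Xmx64m
task_slots = 10               # druid.worker.capacity
peon_heap_gb = 3              # -Xmx3g in druid.indexer.runner.javaOpts

total_gb = (historical_heap_gb + historical_direct_gb
            + mm_heap_gb + task_slots * peon_heap_gb)
print(total_gb)               # ~72 GB committed on a 64 GB machine
                              # if all ten task slots run at once
```

If indexing tasks fill all ten slots, the node is overcommitted before the OS page cache gets anything, which lines up with the paging concern raised in the reply below.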

Query Node

Broker

jvm.config

-server

-Xmx20g

-Xms20g

-XX:NewSize=6g

-XX:MaxNewSize=6g

-XX:MaxDirectMemorySize=30g

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=/home/druid/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

**runtime.properties**

druid.service=druid/broker

druid.host=xxxxxx

druid.port=8082

HTTP server threads

druid.broker.http.numConnections=20

druid.server.http.numThreads=45

Processing threads and buffers

druid.processing.buffer.sizeBytes=1073741824

druid.processing.numMergeBuffers=11

druid.processing.numThreads=15

druid.processing.tmpDir=/home/druid/processing

Query cache (enabled at the broker, backed by memcached)

druid.broker.cache.useCache=true

druid.broker.cache.populateCache=true

druid.cache.type=memcached

druid.cache.hosts=node1:11211,node3:11211

druid.cache.memcachedPrefix=druid

druid.cache.numConnections=12

druid.broker.select.tier=highestPriority
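The same merge-buffer sizing rule applies to the broker, and the query node's overall footprint is worth a look too. A sketch from the values above:

```python
# Broker (node3): direct-memory requirement and remaining headroom.
buffer_bytes = 1073741824                   # 1 GiB per processing buffer
required_gib = (15 + 11 + 1) * buffer_bytes / 2**30
print(required_gib)                         # 27.0 GiB <= 30 GiB direct-memory cap

heap_gb, direct_gb, ram_gb = 20, 30, 64     # -Xmx20g, -XX:MaxDirectMemorySize=30g
print(ram_gb - heap_gb - direct_gb)         # ~14 GB left over
```

That leaves comparatively little room on node3 for the OS and the memcached instance configured at node1:11211 and node3:11211.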

Cluster

Metrics monitoring shows the historical node's query/wait time is very slow.

Hi, zoucaitou

How many historical nodes do you use?

I checked your historical node's jvm.config:

-Xmx12g

-Xms12g

-XX:NewSize=6g

-XX:MaxNewSize=6g

-XX:MaxDirectMemorySize=30g

Your server has 64 GB of memory, and with your historical JVM settings (12 GB heap plus up to 30 GB direct memory), the memory left for segment loading is under 20 GB.

If 20 GB is not enough to hold the segments served by the historical node, segments will be paged in and out, which can hurt query processing.

Check the total segment size per historical node against the memory available for your segments.
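As a rough sanity check, here is the arithmetic from the configs above; the exact OS and other-process overhead is an assumption, not a figure from the post:

```python
# Rough memory budget for the historical process on the data node (node2).
total_ram_gb = 64
historical_heap_gb = 12        # -Xmx12g / -Xms12g
historical_direct_gb = 30      # -XX:MaxDirectMemorySize=30g cap

page_cache_gb = total_ram_gb - historical_heap_gb - historical_direct_gb
print(page_cache_gb)           # 22 GB at best, and less once the
                               # MiddleManager and its peons take their share
```

Against druid.server.maxSize=300000000000 (roughly 280 GiB) of serveable segments, only a small fraction of the data can be memory-mapped at once, so queries touching cold segments will hit disk.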

Good luck 🙂

I think you can add GC log output settings to your historical nodes' jvm.config file, then watch the heap behavior in the GC logs while queries are running. My guess is the young generation heap may be too small, in which case you can increase -XX:NewSize / -XX:MaxNewSize; or the whole heap size may be the problem, which you can address with -Xmx / -Xms.
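For example, on the JDK 8 builds that Druid 0.10.1 typically runs on, GC output can be sent to a rotating log file by adding flags like these to jvm.config (the log path here is a placeholder, not from the original post):

```
-Xloggc:/home/druid/logs/gc-historical.log
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20M
```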

On Thursday, March 1, 2018 at 10:53:01 AM UTC+8, zoucaitou wrote: