Druid query performance tuning

Hi everyone,

We’ve managed to launch a functioning Druid cluster, but we’re seeing high query latency (some queries take more than 20 seconds), and lately we’ve also been getting query errors caused by timeouts.

Our cluster consists of:

Broker: 1 r3.xlarge

Middle Manager: 1 r3.xlarge

Historical: 1 m3.2xlarge

Overlord, Coordinator and Zookeeper: 1 m3.2xlarge

We have deep storage in s3.

In parallel, we use an EMR cluster for batch ingestion, plus an m3.xlarge running Plywood to manage authentication and data restriction.

Our use case is currently quite simple: every day at 2:00 AM we ingest the previous day’s data into Druid. Each daily interval consists of about 700 MB of data divided into 7 shards of roughly 100 MB each. I recently read that each segment should be between 300 and 700 MB, so I believe we may have gotten this part wrong: in the Coordinator console I can see that each interval holds 700 MB split across 7 segments of about 100 MB each, when they should be much larger. Therefore, to change this:

1) Should I change our ingestion tasks so that, from now on, each segment ends up around 700 MB instead of 100 MB?

2) If so, how can this be done for already-ingested data?
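For context, this is the kind of re-indexing task I had in mind for past data, based on the `ingestSegment` firehose described in the Druid docs (the datasource name, interval, and `targetPartitionSize` below are just placeholders; our real dataSchema would go where indicated):

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "our_datasource",
      "comment": "our existing parser, metrics and granularity spec go here"
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "ingestSegment",
        "dataSource": "our_datasource",
        "interval": "2017-01-01/2017-01-02"
      }
    },
    "tuningConfig": {
      "type": "index",
      "targetPartitionSize": 5000000
    }
  }
}
```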

I think this change should improve performance, but I want to make sure. On the other hand, we may also have some sub-optimal configuration on the nodes involved in queries (Broker and Historical) that makes queries perform poorly:

common.runtime.properties:

druid.extensions.loadList=["druid-kafka-eight", "druid-s3-extensions", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "mysql-metadata-storage"]

druid.zk.service.host=druid-coordinator.X.production.internal

druid.zk.paths.base=/druid

druid.metadata.storage.type=mysql

druid.metadata.storage.connector.connectURI=jdbc:mysql://druid-production.X.X.rds.amazonaws.com/druid

druid.metadata.storage.connector.host=druid-production.X.X.rds.amazonaws.com

druid.metadata.storage.connector.user=X

druid.metadata.storage.connector.password=X

druid.storage.type=s3

druid.storage.bucket=smadex.druid

druid.storage.baseKey=druid/segments

druid.indexer.logs.type=s3

druid.indexer.logs.s3Bucket=smadex.logs

druid.indexer.logs.s3Prefix=druid/indexing-logs

druid.selectors.indexing.serviceName=druid/overlord

druid.selectors.coordinator.serviceName=druid/coordinator

druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]

druid.emitter=logging

druid.emitter.logging.logLevel=debug

druid.query.groupBy.maxIntermediateRows=50000000

druid.query.groupBy.maxResults=500000000

druid.manager.lookups.period=180000

druid.manager.lookups.hostUpdateTimeout=PT120s

druid.manager.lookups.updateAllTimeout=PT360s

druid.segmentCache.locations=/mnt/historical

Historical:

jvm.config:

-server

-Xms8g

-Xmx8g

-XX:MaxDirectMemorySize=4096m

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

runtime.properties:

druid.service=druid/historical

druid.port=8083

druid.server.http.numThreads=25

druid.processing.buffer.sizeBytes=536870912

druid.processing.numThreads=7

druid.segmentCache.locations=[{"path":"/mnt/historical","maxSize":130000000000}]
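As a sanity check on the Historical settings above, my understanding of the Druid performance docs is that direct memory must hold roughly `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1)`. Here is the arithmetic for our values (the `+ 1` rule is my reading of the docs, so please correct me if it's wrong):

```python
# Sanity check: processing buffers must fit in -XX:MaxDirectMemorySize.
# Rule of thumb (as I understand it): sizeBytes * (numThreads + 1).
buffer_size_bytes = 536870912          # druid.processing.buffer.sizeBytes
num_threads = 7                        # druid.processing.numThreads
max_direct_bytes = 4096 * 1024 * 1024  # -XX:MaxDirectMemorySize=4096m

required = buffer_size_bytes * (num_threads + 1)
print(required, max_direct_bytes, required <= max_direct_bytes)
# -> 4294967296 4294967296 True
```

So the buffers exactly fill the 4 GiB of direct memory, leaving no headroom for anything else that uses it, which may be part of our problem.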

Broker:

jvm.config:

-server

-Xms24g

-Xmx24g

-XX:MaxDirectMemorySize=4096m

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

runtime.properties:

druid.service=druid/broker

druid.port=8082

# HTTP server threads

druid.broker.http.numConnections=5

druid.server.http.numThreads=25

# Processing threads and buffers

druid.processing.buffer.sizeBytes=536870912

druid.processing.numThreads=7

# Query cache

druid.broker.cache.useCache=true

druid.broker.cache.populateCache=true

druid.cache.type=local

druid.cache.sizeInBytes=2000000000
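One thing I noticed while writing this up: with druid.cache.type=local, the query cache lives on the Broker's JVM heap, so I checked what fraction of the 24 GB heap it occupies (my own arithmetic, not from the docs):

```python
# druid.cache.type=local keeps the query cache on the broker JVM heap.
cache_bytes = 2_000_000_000   # druid.cache.sizeInBytes
heap_bytes = 24 * 1024 ** 3   # -Xms24g / -Xmx24g

fraction = cache_bytes / heap_bytes
print(round(fraction, 3))  # -> 0.078
```

So the cache only takes about 8% of the heap, which seems safe, unless the heap itself is oversized for a broker.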

3) Can anything in the above configuration be tweaked to improve the situation? We’re sure we’re doing something wrong, but we’re not sure exactly what.

Any help is greatly appreciated; we’ve seen Druid’s potential and we know our setup can get a lot better with the right tuning.

Thanks!


Joan