Druid queries are very slow. What's the proper cluster configuration for faster queries?

Hi,

I have a Druid cluster set up on AWS with 10 historicals (EC2 instance type i3.8xlarge) and 3 brokers (EC2 instance type r5.12xlarge). I have a data source in Druid with around 3.5 TB of data.

When running queries over the entire data set, the queries are very slow (sometimes taking 5+ minutes to return a response). Segment caching is enabled on the historical nodes.

At this data scale, do I need to add more historical/broker nodes for faster query performance? What would be the optimal cluster configuration for this size of data?
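One rough way to reason about the question is to estimate how many rounds ("waves") of per-segment work a full-range query needs. The sketch below uses the figures from this post (10 historicals, 31 processing threads each per the historical config below, ~3.5 TB of data) and assumes an average segment size of ~500 MB, which is only an assumption; the real segment count for the data source should be substituted. It also assumes segments are spread evenly and each processing thread scans one segment at a time.

```python
import math

# Figures from the post.
NUM_HISTORICALS = 10
PROCESSING_THREADS_PER_HISTORICAL = 31  # druid.processing.numThreads on each historical
DATA_BYTES = 3.5e12                     # ~3.5 TB (decimal)

# Assumption: ~500 MB average segment size (replace with the real value).
ASSUMED_SEGMENT_BYTES = 500e6


def scan_waves(data_bytes, segment_bytes, historicals, threads_per_historical):
    """Estimate segment count, total processing threads, and the number of
    rounds of per-segment work for a query that touches every segment.
    Assumes even segment distribution and one segment per thread at a time."""
    segments = math.ceil(data_bytes / segment_bytes)
    total_threads = historicals * threads_per_historical
    return segments, total_threads, math.ceil(segments / total_threads)


segments, threads, waves = scan_waves(
    DATA_BYTES, ASSUMED_SEGMENT_BYTES, NUM_HISTORICALS, PROCESSING_THREADS_PER_HISTORICAL
)
print(f"{segments} segments, {threads} processing threads, ~{waves} waves")
# prints: 7000 segments, 310 processing threads, ~23 waves
```

Under these assumptions, total query time is roughly the number of waves times the per-segment scan time, so adding historicals (more threads) or getting segments into memory (faster per-segment scans) both shorten it. Many small segments would raise the wave count.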

Below is the Historical node configuration (derived from the recommendations in the Druid configuration documentation):

```properties
druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=66

# Processing threads and buffers
druid.processing.buffer.sizeBytes=2147483647
druid.processing.numThreads=31
druid.processing.numMergeBuffers=8

# Segment storage
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.segmentCache.locations=[{"path":"{{location-to-segment-cache}}","maxSize":6000000000000}]
druid.server.maxSize=6000000000000

druid.query.groupBy.maxMergingDictionarySize=1000000000
druid.query.groupBy.maxOnDiskStorage=2000000000
```
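Historicals memory-map segments, so how much of the data fits in the OS page cache matters for full-range scans. The sketch below applies the direct-memory sizing from Druid's cluster tuning guide, `buffer.sizeBytes * (numThreads + numMergeBuffers + 1)`, to the historical config above, and subtracts it from the i3.8xlarge's 244 GiB of RAM. The JVM heap size is not given in the post, so the 24 GiB used here is a hypothetical placeholder, and OS overhead is ignored, so the result is an upper bound.

```python
GIB = 2**30

# Values from the historical runtime.properties above.
BUFFER_SIZE_BYTES = 2147483647  # druid.processing.buffer.sizeBytes
NUM_THREADS = 31                # druid.processing.numThreads
NUM_MERGE_BUFFERS = 8           # druid.processing.numMergeBuffers

# i3.8xlarge has 244 GiB RAM; the heap size is a HYPOTHETICAL placeholder.
INSTANCE_RAM_GIB = 244
ASSUMED_HEAP_GIB = 24
NUM_HISTORICALS = 10
DATA_GIB = 3.5e12 / GIB         # ~3.5 TB (decimal) expressed in GiB


def direct_memory_bytes(buffer_size, num_threads, num_merge_buffers):
    """Historical direct memory per Druid's tuning guide:
    buffer.sizeBytes * (numThreads + numMergeBuffers + 1)."""
    return buffer_size * (num_threads + num_merge_buffers + 1)


direct_gib = direct_memory_bytes(BUFFER_SIZE_BYTES, NUM_THREADS, NUM_MERGE_BUFFERS) / GIB
# Whatever is not heap or direct memory is (at most) available to the page cache.
page_cache_gib = INSTANCE_RAM_GIB - ASSUMED_HEAP_GIB - direct_gib
coverage = NUM_HISTORICALS * page_cache_gib / DATA_GIB

print(f"direct memory per historical: {direct_gib:.1f} GiB")
print(f"left for page cache per historical: {page_cache_gib:.1f} GiB")
print(f"fraction of data that fits in page cache: {coverage:.0%}")
```

With these numbers each historical reserves about 80 GiB of direct memory, and at most roughly 43% of the data fits in page cache across the cluster, so a query over the entire data set would likely be reading much of it from disk. Smaller processing buffers or more memory per node would shift that ratio.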

Below is the Broker **node configuration** (derived from the recommendations in the Druid configuration documentation):

```properties
druid.service=druid/broker
druid.port=8082

# HTTP server threads
druid.broker.http.numConnections=40
druid.server.http.numThreads=83
druid.broker.http.readTimeout=PT5M

# Processing threads and buffers
druid.processing.buffer.sizeBytes=2147483647
druid.processing.numThreads=47
druid.processing.numMergeBuffers=12

# Query result cache
druid.cache.type=caffeine
druid.broker.cache.useResultLevelCache=true
druid.broker.cache.populateResultLevelCache=true
druid.broker.cache.resultLevelCacheLimit=3145728
druid.broker.cache.unCacheable=[]
druid.cache.sizeInBytes=5368709120

# SQL properties
druid.sql.enable=true
druid.sql.http.enable=true

# Group by properties
druid.query.groupBy.defaultStrategy=v2
druid.query.groupBy.maxMergingDictionarySize=1000000000
druid.query.groupBy.maxOnDiskStorage=2000000000
```
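Druid's cluster tuning guide suggests setting `druid.server.http.numThreads` on each historical slightly higher than the sum of `druid.broker.http.numConnections` across all brokers, since every broker can open that many connections to the same historical. The sketch below checks that rule against the two configs in this post; the function name is hypothetical and the rule is a guideline, not a hard limit.

```python
# Values from the two configs in this post.
NUM_BROKERS = 3
BROKER_HTTP_NUM_CONNECTIONS = 40   # druid.broker.http.numConnections on each broker
HISTORICAL_HTTP_NUM_THREADS = 66   # druid.server.http.numThreads on each historical


def historical_threads_ok(num_brokers, broker_connections, historical_threads):
    """Return (ok, total_connections): ok is True when the historical has more
    HTTP threads than all brokers' connection pools combined."""
    total_connections = num_brokers * broker_connections
    return historical_threads > total_connections, total_connections


ok, total = historical_threads_ok(
    NUM_BROKERS, BROKER_HTTP_NUM_CONNECTIONS, HISTORICAL_HTTP_NUM_THREADS
)
print(
    f"brokers may open up to {total} connections to one historical; "
    f"historical has {HISTORICAL_HTTP_NUM_THREADS} HTTP threads -> ok={ok}"
)
```

Here 3 × 40 = 120 connections exceeds the historical's 66 HTTP threads, so under concurrent load some broker requests could queue on the historicals; raising the historical thread count or lowering the broker pool size would bring the two in line.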


Thank you in advance.

Regards,
Vinay

Hi Vinay,

You can take a look at this thread: https://groups.google.com/forum/#!topic/druid-user/Va7ZLVzax7M

I was facing query-speed issues on my cluster, which had relatively less data. That thread links to a couple of other threads I started; going through them should give you a better understanding of the problem.

-Prathamesh

Thank you Prathamesh. I will go through the discussions on these threads.

Regards,

Vinay Patil