Guys,
I believe we are just running groupBy queries, but I'm seeing what looks like only average performance (I'm not totally sure how to judge it). I've played with the heap sizes on my historicals and brokers, but I'm not sure that is doing much. There are a number of knobs to tweak that are confusing to me when combined with other parameters. Some of those are:
- memcached connections: what happens if I use one connection versus 100? I have an ElastiCache cluster (there is a rough sketch of what I mean at the bottom of this post).
- Caching on the broker (read but not populate): my broker only reads from the cache, so does this mean it will intelligently look up cached results before asking the historicals to compute and merge them?
- Here is a particularly confusing one:
druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 550000000000}]
druid.server.maxSize=550000000000
What is the difference between druid.server.maxSize and the maxSize inside druid.segmentCache.locations? How do I know the proper value for server.maxSize? The docs describe server.maxSize in terms of a RAM-to-disk ratio, and that with too much disk relative to RAM the node will page. I've noticed that in other people's configurations server.maxSize and the segmentCache locations maxSize are the same value. I thought one was for caching segments on disk, and the other specified how much gets loaded into memory, with overflow allowed? How do I go about tuning these when my dataset is at the terabyte level?
- The final confusing thing for me is the number of processing threads versus HTTP connections for the brokers and historicals. I am seeing a ton of backend connection errors in the Amazon Elastic Load Balancer that sits in front of my broker nodes. Here are the current HTTP/thread configs for the broker and historical roles respectively, along with a sketch (right after the broker config) of how I currently understand the connection sizing. Increasing the HTTP server threads, memcached connections, or processing threads on my broker seems to have no effect on performance. I know from reading past configuration issues that numThreads times the number of brokers has to be larger than the number of historical connections…? I've also scaled out more brokers, but it still appears slow, and performance is slightly random if not poor. When I fire up the cluster it seems to run fast very briefly, but that could just be the web browser caching. Any suggestions would help. My nodes are all r4 instances with more than 200 GiB of RAM.
# HTTP server threads
druid.server.http.numThreads=50
druid.broker.http.readTimeout=PT5M
druid.broker.retryPolicy.numTries=2
# Processing threads and buffers
druid.processing.buffer.sizeBytes=2147483647
druid.processing.numThreads=31
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=false
# Druid connection balancer type - we choose connectionCount based on fewest number of active connections
druid.broker.balancer.type=connectionCount
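Here is the sketch I mentioned of how I currently understand the broker-to-historical connection sizing from the tuning docs. I am not sure I have it right, the values below are purely illustrative, and druid.broker.http.numConnections is not something I have explicitly set yet:
# Broker side: pool of connections the broker opens to each historical/realtime process (illustrative value)
druid.broker.http.numConnections=20
# Historical side: Jetty server threads. My understanding is this should be slightly higher than the
# sum of druid.broker.http.numConnections across all brokers, e.g. 3 brokers x 20 = 60, so ~70 here (illustrative)
druid.server.http.numThreads=70
Is that the right relationship, or do I have it backwards?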
And here is my historical config:
# HTTP server threads
druid.server.http.numThreads=50
# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=31
# Query cache (we use a small local cache)
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
# Segment storage
druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 550000000000}]
druid.server.maxSize=550000000000
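And here is roughly what I mean by the memcached connections knob in the first bullet. The ElastiCache endpoint below is just a placeholder, not my real cluster, and I am not even sure druid.cache.numConnections is the setting that matters:
# Memcached-backed query cache (endpoint is a placeholder)
druid.cache.type=memcached
druid.cache.hosts=my-elasticache-endpoint.example.com:11211
# Is this the "connections" setting in question - one vs. 100?
druid.cache.numConnections=1
The broker keeps useCache=true / populateCache=false as shown above, and the historicals both read and populate - does the number of memcached connections change anything meaningful for that setup?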