How do I optimize/tune druid to account for high number of parallel query requests

Hi All,

We have a query that responds in about 2 seconds 95% of the time. We also have a load balancer doing round-robin across 3 brokers.

We started to run some jmeter performance tests and we are seeing that the response time for the query is ticking up towards the 1 minute range as we increase the number of parallel requests being made. How do I configure/tune druid to get the response time back down close to 2 seconds when we start bombarding the system with parallel queries?

Here is the query that we are using:
SELECT
  advertiser_id AS "advertiser_id",
  LOOKUP(CONCAT(campaign_id, ''), 'dsp_media_and_bids_insertion_order_id') AS "insertion_order_id",
  country_id AS "country_id",
  dma_id AS "dma_id",
  SUM(impression) AS "impression",
  SUM(click) AS "click",
  SUM(action) AS "action",
  (SUM(click) * 100.0) / CAST(SUM(impression) AS DOUBLE) AS "ctr",
  SUM(action) / CAST(SUM(impression) AS DOUBLE) AS "ar",
  SUM(cost) * 1000.0 / CAST(SUM(impression) AS DOUBLE) AS "advertiser_ecpm",
  SUM(cost) / CAST(SUM(click) AS DOUBLE) AS "advertiser_ecpc",
  SUM(cost) / CAST(SUM(action) AS DOUBLE) AS "advertiser_ecpa",
  SUM(cost) AS "cost",
  SUM(complete_events) / CAST(SUM(impression) AS DOUBLE) AS "complete_events_rate",
  APPROX_COUNT_DISTINCT_DS_HLL(user_id_sketch) AS "user_id_sketch",
  APPROX_COUNT_DISTINCT_DS_HLL(household_person_id_sketch) AS "household_person_id_sketch",
  APPROX_COUNT_DISTINCT_DS_HLL(household_id_sketch) AS "household_id_sketch"
FROM dsp_media_and_bids
WHERE market_id IN ()
  AND __time >= '2021-03-01 00:00:00'
  AND __time <= '2021-03-31 23:59:59'
GROUP BY dma_id, country_id, advertiser_id,
  LOOKUP(CONCAT(campaign_id, ''), 'dsp_media_and_bids_insertion_order_id')

We pivot on the market_id for our jmeter tests. Here are our jmeter test results:
10 threads with loop count of 100 - 20 seconds at the 95th percentile
20 threads with loop count of 100 - 23 seconds at the 95th percentile
50 threads with loop count of 100 - 39 seconds at the 95th percentile
100 threads with loop count of 100 - 54 seconds at the 95th percentile

Our machines are spec'ed at 80 cores and 375 GB of memory apiece.
Each of our 3 brokers has the following runtime config settings:

Is there a config setting I am missing that would increase the parallel processing of queries and bring the response time back down?
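For reference (since the actual settings were not captured above), these are the broker properties most commonly tuned for query concurrency in Druid 0.20. The values below are illustrative assumptions only, not recommendations for this specific cluster:

```properties
# Illustrative values only - tune to your hardware (80 cores / 375 GB here).
# Threads the broker uses to serve HTTP requests, including queries.
druid.server.http.numThreads=60
# Connection pool size for fanning queries out to historicals.
druid.broker.http.numConnections=50
# Merge buffers used for groupBy v2 result merging.
druid.processing.numMergeBuffers=16
# Result caching can short-circuit repeated identical queries.
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
```

Under heavy parallel load, an undersized `druid.server.http.numThreads` or `druid.broker.http.numConnections` can cause queries to queue at the broker even while historicals sit idle.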

Relates to Apache Druid 0.20


Hey @kpeng1, the way I’d look at this is to think about 3 aspects:

1. Is the bottleneck in Druid or elsewhere (such as the load balancer)?
I am not sure if you have a way to break down the 20-54 seconds reported by jmeter: how much of that was spent on the network (client to load balancer, load balancer to Druid), and how much within Druid itself (between the router, broker, and historicals)? Another thing to check is the broker or router log: is the query time reported there roughly the same as the timing you are seeing from jmeter?
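As a sketch of that log check: if you have Druid's logging emitter enabled, each query emits a query/time metric in milliseconds. The exact log format depends on your emitter configuration, so the sample line below is fabricated for illustration:

```shell
# Fabricated sample in the shape the logging emitter produces (format assumed).
cat > /tmp/broker-metrics.log <<'EOF'
{"feed":"metrics","timestamp":"2021-03-31T00:00:00.000Z","metric":"query/time","value":2150,"dataSource":"dsp_media_and_bids"}
EOF

# Extract query/time values (ms) to compare against the jmeter numbers.
grep '"metric":"query/time"' /tmp/broker-metrics.log \
  | sed 's/.*"value":\([0-9]*\).*/\1/'
```

If the values here stay close to 2000 ms while jmeter reports much more, the extra time is being lost to queueing or the network rather than query execution.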

2. Can the datasource configuration be optimized?
It looks like you are filtering on market_id. Is that always the case? If so, you may benefit from hash or single-dimension partitioning on market_id, so that each query only has to touch the segments containing the requested markets.
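As a sketch, this is roughly what single-dimension partitioning on market_id could look like in a native batch ingestion spec's tuningConfig (the targetRowsPerSegment value is an assumption to tune for your data volume):

```json
{
  "type": "index_parallel",
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "single_dim",
      "partitionDimension": "market_id",
      "targetRowsPerSegment": 5000000
    },
    "forceGuaranteedRollup": true
  }
}
```

With segments range-partitioned on market_id, the broker can prune segments for queries that filter on it, which reduces the fan-out per query and frees historical capacity for concurrent requests.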

3. Can the Druid cluster configuration be optimized?
To do this, we first need to understand where the historicals are spending their time while processing these queries. Are they spending the majority of it on I/O, or simply waiting for CPUs to free up? This post, for example, details how to generate flame graphs and analyze them in order to optimize Druid queries.

Hope this helps!