About the relationship between the number of concurrent queries and query response times

Hi!

I’d like to ask you about the following situation related to query times in Druid.

When I run a given query once, it takes about 6.5s - 7s to complete. Nothing to worry about so far.

But when I run that query twice concurrently (using the Apache HTTP server benchmarking tool (ab), or via curl), I see that no results are returned until ~14s (which is approximately those 6.5s - 7s x 2).

This trend continues as the number of concurrent queries increases. For instance, when the query is run 45 times, every attempt finishes successfully, but only after 310s (which is approximately those 6.5s - 7s x 45). I would never have expected this.
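For reference, this is roughly how I fire the runs (host and port are placeholders for our router; query.json wraps the SQL statement for Druid’s /druid/v2/sql endpoint):

  # 45 concurrent POSTs of the same SQL query via Apache Bench
  ab -n 45 -c 45 -p query.json -T 'application/json' http://ROUTER_HOST:8888/druid/v2/sql

  # single request via curl, for comparison
  curl -X POST -H 'Content-Type: application/json' -d @query.json http://ROUTER_HOST:8888/druid/v2/sql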

Could you please help me to understand this behaviour? Could it be related to the configuration of the services?

Caching is enabled on the historicals, but I modified some query context parameters so that the cache is neither used nor populated.
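Concretely, the body of query.json looks roughly like this (the SQL itself is elided), with the context disabling both segment-level and result-level caching:

  {
    "query": "SELECT ...",
    "context": {
      "useCache": false,
      "populateCache": false,
      "useResultLevelCache": false,
      "populateResultLevelCache": false
    }
  }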

Thanks!

Sounds like each query is consuming all resources, so the rest get queued up. How big is your cluster? Resources? CPU/memory? How big is the query in time frame? Does it require all segments or a large portion of segments to resolve it?

Hi Sergio,

Thanks for your reply. At first I thought the same as you (that our cluster can only process one query at a time, queuing up the others), but in that case, wouldn’t the results of the first query be returned after ~7s, the results of the second query after ~14s, and so on, instead of all the results of all queries arriving at once after a long time?

Our cluster is deployed on two EC2 instances, each with 4 vCPUs and 32 GiB of memory.

  • First instance: 1 broker - 1 router - 1 coordinator/overlord - 1 middleManager - 1 historical
  • Second instance: 1 middleManager - 1 historical

The query filters the datasources on the __time column (the datasources were ingested with a query granularity of one second). The time horizon of this specific query is one month (AND __time BETWEEN TIMESTAMP '2022-04-01 00:00:00' AND TIMESTAMP '2022-04-30 00:00:00').

Regarding the number of segments, the query involves two datasources (combined via a LEFT JOIN) with a segment granularity of one day. Segments of both datasources have been compacted (with a target of ~5 million rows per segment), so the number of segments involved is about ~3% of the total number of segments for each datasource.
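To give an idea of its shape (the datasource and column names below are placeholders, not the real ones), the query looks roughly like this:

  SELECT l.col_a, l.col_b, ...
  FROM (
    SELECT col_a, col_b, ...
    FROM datasource_left
    WHERE __time BETWEEN TIMESTAMP '2022-04-01 00:00:00' AND TIMESTAMP '2022-04-30 00:00:00'
    GROUP BY col_a, col_b
  ) l
  LEFT JOIN (
    SELECT col_a, col_b, ...
    FROM datasource_right
    GROUP BY col_a, col_b
  ) r
  ON l.col_a = r.col_a AND l.col_b = r.col_b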

Thanks!

Hi Jorge,
How many rows are returned from each of the datasources involved in the join? Druid processes joins in the Broker, and while some operations are done in multiple threads, this limits the parallelism of the join. So with large result sets from each side of the join, this will become a bottleneck.

What are you trying to achieve with this query? Perhaps the community can help with alternatives that don’t require a large join, if that is the issue.
Many implementations do the join upstream, before ingestion (particularly if it is a common query pattern), such that the resulting datasource is pre-joined and can therefore deliver low-latency queries.
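As a purely illustrative example (assuming the raw data sits in an upstream SQL store before ingestion; the table and column names are made up), the join can be materialized there and the flattened result ingested as a single datasource:

  -- run in the upstream store, not in Druid
  CREATE TABLE events_joined AS
  SELECT e.*, u.attr_x, u.attr_y
  FROM events e
  LEFT JOIN users u
    ON e.col_a = u.col_a
   AND e.col_b = u.col_b;
  -- then ingest events_joined into Druid as one pre-joined datasource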

Hi Jorge,

This is definitely not normal behavior.

Can you try running concurrent queries with no joins and see if this behavior is repeated? You could also paste your parameter files here so we can check if anything jumps out at us.

With 4 vCPUs being shared across so many services on a single VM, there may be contention going on. Can you try isolating the services into data/query/master roles across 3 VMs, as sketched below?
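A typical split would be (just a sketch of the standard clustered layout):

  • Master server: Coordinator + Overlord
  • Query server: Router + Broker
  • Data server(s): Historical + MiddleManager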

It sounds like a resource is pegged, maybe I/O, so if you split it between two queries, each one takes twice as long. More nodes would help, if so.

Hi Sergio,

Yes, you are right! The bottleneck seems to be due to the JOIN part of the query. I have to admit that I had made a huge mistake while writing my Docker Compose file: I had inadvertently configured the services to use at most 1 vCPU, so the full computing power of both instances was never actually being used… Now that all vCPUs are in use, I ran a query that did not involve joins and that behaviour no longer appeared.
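For context, the offending part of my compose file looked something like this (service name simplified, the rest elided):

  services:
    historical:
      # ... image, environment, volumes as before ...
      cpus: 1.0   # this limit unintentionally capped the service at a single vCPU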

Answering your questions, the left datasource contributes 3300 rows and the right one 120 (in both cases, after filtering and grouping the original datasource rows). The left join is done on two common columns. We are computing some metrics regarding user behaviour inside a web application. We had been thinking about doing some kind of pre-aggregation before ingestion, so now that you are recommending it, we’ll definitely give it a try.

Thanks a lot for your help, Sergio

Hi Ben,

Thanks a lot for your hint. After checking I/O with iotop, it does not seem to be a problem:

[screenshot: iotop output]

Thanks!

Hi Vijeth,

As I told Sergio, I had made a huge mistake while writing my Docker Compose file: I had inadvertently configured the services to use at most 1 vCPU, so the full computing power of both instances was never actually being used… Now that all vCPUs are in use, I ran a query that did not involve joins and that behaviour no longer appeared.

I will give isolating the services across 3 instances a try, thanks for the hint. If you don’t mind, let me paste the config parameters of the broker, in case you can spot something misconfigured…

druid.server.http.numThreads=40
druid.broker.http.numMaxThreads=30
druid.broker.http.numConnections=30
druid.broker.http.maxQueuedBytes=100MiB
druid.processing.buffer.sizeBytes=1GiB
druid.processing.numMergeBuffers=4
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
druid.broker.cache.useResultLevelCache=false
druid.broker.cache.populateResultLevelCache=false
druid.server.http.enableRequestLimit=true
druid.server.http.maxSubqueryRows=20000000
druid.server.http.defaultQueryTimeout=600000
druid.broker.http.unusedConnectionTimeout=PT27M
druid.broker.http.readTimeout=PT30M
druid.server.http.maxIdleTime=PT10M
druid.broker.retryPolicy.numTries=2
druid.request.logging.type=slf4j
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor", "org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor"]

Does it make sense to set druid.server.http.numThreads, druid.broker.http.numMaxThreads, druid.broker.http.numConnections and druid.processing.numMergeBuffers to a number higher than the number of vCPUs in the instance? That was unclear to me after reading the docs.

Thanks a lot for your help, Vijeth

I think you should set numThreads=2 & numMergeBuffers=2.

If you have more processing threads, queries will actually run concurrently and the bottleneck becomes the CPU/disk rather than the soft limit of a single thread. Since you have a lot of RAM and only a few CPUs, you can increase both parameters to increase parallelism: each individual query will be a bit slower, but the total time should no longer be N x (1 query) like you are seeing now.
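As a sketch, in the runtime.properties you pasted that would be something like (values are illustrative for a 4-vCPU box):

  druid.processing.numThreads=2
  druid.processing.numMergeBuffers=2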