Broker query timeout on production cluster

Hello!

I'm struggling to understand why some queries time out on our production Druid cluster (v0.22.1). My query looks like this:

SELECT
  company,
  MIN(__time)
FROM my_data_source
WHERE symbol IN ('ABC', 'DEF')
GROUP BY company

Other queries that should run quickly also take a long time or time out.

When the query is issued from the Druid console, I get a timeout with no details. When it is issued directly to a broker over HTTP, I receive this timeout message:

{
  "error" : "Query timeout",
  "errorMessage" : "Query […] timed out!",
  "errorClass" : "org.apache.druid.query.QueryTimeoutException",
  "host" : "172.30.107.201:8088"
}

The host in the error message is a historical node. However, when I query that historical node directly, results come back instantly. I tried the same query against each historical node, and the results always come back in under 10 seconds.

Additional info that may help:

  • Data source size: 12 TB
  • Average segment size: 500 MB
  • All data sources are compacted
  • Historical : druid.server.http.numThreads=60
  • Broker:
druid.broker.http.numConnections=25
druid.broker.http.maxQueuedBytes=20MiB
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numThreads=1
druid.processing.numMergeBuffers=2
-Xmx14G
-Xms14G
-XX:MaxDirectMemorySize=2G
  • No cache configured
  • Cluster running on Kubernetes, 30 nodes (9 Historicals, 10 MiddleManagers, 2 Overlords, 2 Coordinators, 2 Brokers, 2 Routers, 3 ZooKeeper nodes)

What steps would you suggest to identify the source of these timeouts?

Thanks in advance!

Relates to Apache Druid 0.22.1

What number of segments are scanned in that query, out of interest?

I was wondering if the issue goes away, for example, if you added time bounds to the query?

On your WHERE clause, is that column indexed? Perhaps also some secondary partitioning on that column might help the pruning?

Also, if the issue is not apparent on Historicals (though I would like them to come back in <1s :D) then maybe this is an issue with your Coordinator somewhere as that would be the next bottleneck up.

What number of segments are scanned in that query, out of interest?

How can I get this information and tie it to that specific query? In Prometheus I have a druid_query_segment_time_count metric, but it shows either 0 or a value close to the total number of segments we have (150K).
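
As a point of reference, the total per datasource can be checked with a sys.segments query along these lines (a sketch; it still doesn't tie segments to one specific query):

SELECT COUNT(*) AS num_segments
FROM sys.segments
WHERE datasource = 'my_data_source'
  AND is_published = 1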

I was wondering if the issue goes away, for example, if you added time bounds to the query?

If I add a small time bound, the timeout no longer occurs, but the query still takes a long time (15 seconds for the last 2 months of data).
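
For clarity, the time-bounded variant is roughly this (the exact bound I used may differ slightly):

SELECT
  company,
  MIN(__time)
FROM my_data_source
WHERE symbol IN ('ABC', 'DEF')
  AND __time >= CURRENT_TIMESTAMP - INTERVAL '2' MONTH
GROUP BY company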

On your WHERE clause, is that column indexed? Perhaps also some secondary partitioning on that column might help the pruning?

If, during ingestion, we store everything as dimensions (no rollup for now), does that mean all columns will be indexed and used for partitioning?

(Also, if we have a column that is a metric but we don't want rollup, should it still be defined as a metric in the ingestionSpec? If so, I believe we would have to provide an aggregation function for rollup.)

Also, if the issue is not apparent on Historicals (though I would like them to come back in <1s :D) then maybe this is an issue with your Coordinator somewhere as that would be the next bottleneck up.

I see here that the Coordinator is not involved in queries, so why do you think it could be a bottleneck?

To rule out some investigation paths, I managed to narrow this down to an even simpler query that still times out:

SELECT MIN(__time) FROM my_table

Shouldn't this complete almost immediately? (This data source is 1 TB, 18K segments, 90 billion rows, ~90 MB per segment; suboptimal, but I believe this shouldn't be the problem.)

Thanks a lot for your help @petermarshallio, really appreciate it!

Hi data, welcome to the Druid Forum.

A few questions if you don’t mind so we can try and debug this situation:

  • How did you query the historical? curl? (e.g., something like the sketch below)
  • Do these queries time out even when the cluster is idle?
  • What is the partitioning for this datasource?
  • How many cores have you allocated to the historicals?
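
For example, a direct native query to one historical might look roughly like this (host and port are placeholders; 8083 is the default historical port, and the aggregation just mirrors the SQL MIN(__time)):

curl -X POST 'http://<historical-host>:8083/druid/v2/' \
  -H 'Content-Type: application/json' \
  -d '{
        "queryType": "timeseries",
        "dataSource": "my_data_source",
        "intervals": ["1000-01-01/3000-01-01"],
        "granularity": "all",
        "aggregations": [
          { "type": "longMin", "name": "min_time", "fieldName": "__time" }
        ]
      }'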

My mind went straight to the number of cores for historicals as well! @data each segment is processed by one thread – so a query that scans 150k segments needs 150k segment-scan tasks to run, plus merge time, before your results come back. If your average segment scan time is, say, 5 seconds, then getting the whole query back in about 5 seconds would require all 150k scans to run in parallel at once; with fewer threads, the scans queue up and the wall-clock time multiplies accordingly.
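
As a rough back-of-envelope, assuming (purely for illustration) 16 processing threads per historical and that 5-second scan time:

150,000 segments / (9 historicals × 16 threads) ≈ 1,040 segments queued per thread
1,040 segments × 5 s per segment scan ≈ 87 minutes of scanning, before any merging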

Not at all. Druid always partitions by __time first. For additional (secondary) partitioning, you must specify it when you ingest / reindex / etc.:
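
For example, secondary partitioning on symbol could be declared in the tuningConfig of a batch (re)indexing or compaction task along these lines (the dimension and target size here are illustrative):

"partitionsSpec": {
  "type": "single_dim",
  "partitionDimension": "symbol",
  "targetRowsPerSegment": 5000000
}

Dimension-based partitioning is applied by batch ingestion and compaction, so for streaming data it is typically introduced via compaction afterwards.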

And see createBitmapIndex (in the dimensionsSpec part of the ingestion spec) for how indexes are controlled per dimension:
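
Roughly, in a dimensionsSpec it looks like this (string dimensions have createBitmapIndex enabled by default; the names are just taken from your query):

"dimensionsSpec": {
  "dimensions": [
    { "type": "string", "name": "symbol", "createBitmapIndex": true },
    { "type": "string", "name": "company", "createBitmapIndex": true }
  ]
}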

The metricsSpec is where you state which aggregations you want rollup to produce from your input columns. If you're not using rollup, you would therefore tend not to use the metricsSpec.
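
A minimal sketch, assuming a hypothetical amount input column, would pair something like:

"granularitySpec": { "rollup": true, "queryGranularity": "hour" },
"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "doubleSum", "name": "amount_sum", "fieldName": "amount" }
]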

I definitely recommend rollup – you can think of it as front-loading any query where you're going to use a time truncation in your GROUP BY. Not only does it massively reduce your storage needs, it also massively decreases scan times, because Druid processes far fewer individual rows, resulting in huge performance gains. Some people keep a supercharged, lean, rolled-up table for interactive UIs, for example, and another table (maybe with a longer retention period) where they're happy for queries to take much longer.
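
For example, with hourly rollup (and the illustrative amount_sum metric above), the kind of time-truncated query that benefits looks like:

SELECT
  TIME_FLOOR(__time, 'PT1H') AS time_hour,
  company,
  SUM(amount_sum) AS total_amount
FROM my_data_source
GROUP BY TIME_FLOOR(__time, 'PT1H'), company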

OMG I’m such a fool. I didn’t mean coordinator, I meant broker :smiley:
