How many HTTP requests does a broker make to a single historical for a single query?

A broker gets a query (say, a groupBy query) whose interval covers a large number of segments. It sends queries to historicals and merges the results together.

Consider a single historical that holds many (N) segments the broker needs to query. Does this turn into a single HTTP request from the broker to the historical covering all of the segments, or into N parallel requests?

We’re seeing a situation where a single large query over a wide time interval, sent to a broker, can cause many of our historicals to stop responding on /status (which we use as a Kubernetes liveness probe) for long enough that Kubernetes kills them. (We should probably switch to /status/health, but the point remains.) Is that because the broker is sending tons of parallel queries to every historical and exhausting the historicals’ druid.server.http.numThreads, or is something else going on?
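
Something like this is what I mean by switching the probe (just a sketch; the port and thresholds here are placeholders, not our real settings):

    # Liveness probe against /status/health instead of /status.
    # 8083 is the default historical port; adjust to the actual service port.
    livenessProbe:
      httpGet:
        path: /status/health
        port: 8083
      initialDelaySeconds: 60
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3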

Note that we have druid.server.http.numThreads=200 on historicals, and druid.broker.http.numConnections=40 on brokers.

–dave

Hi David:

You were right: it turns into N parallel queries.

With druid.broker.http.numConnections=40 on each broker, and assuming you have two brokers in the cluster, each historical can theoretically receive up to 80 connections from the full set of brokers. Your historicals can each accept 200 connections (druid.server.http.numThreads=200), so that leaves plenty of room for non-broker requests.
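
To put that arithmetic in config terms, here is a sketch of the sizing relationship (the two-broker count is an assumption about your cluster):

    # Per broker: size of the connection pool used to query data nodes.
    druid.broker.http.numConnections=40

    # Per historical: HTTP server threads should cover every broker using
    # its full pool at once, plus headroom for status checks and other
    # ad-hoc requests:
    #   numThreads > numConnections * number of brokers = 40 * 2 = 80
    druid.server.http.numThreads=200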

The question, then, is why the Kubernetes status check failed. Were your historical nodes able to handle that many connections? What were the CPU and memory load on each pod when you issued the large query?

Hope this helps