A broker gets a query (say, a groupby query) whose interval covers a large number of segments. It sends queries to historicals and merges the results together.
Consider a single historical that holds many (N) segments the broker needs to query. Does this turn into a single HTTP request from the broker to the historical covering all of the segments, or into N parallel requests?
We’re seeing a situation where a single large query over a wide time interval can cause many of our historicals to stop responding on /status (which we use as a Kubernetes liveness probe) for long enough that Kubernetes kills them. (We should probably switch to /status/health, but the point remains.) Is that because the broker is sending a flood of parallel queries to every historical and exhausting each historical’s druid.server.http.numThreads, or is something else going on?
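For context, the probe change we’re considering would look roughly like this (a sketch of our historical pod spec; the port and timing values here are illustrative, not recommendations):

```yaml
# Hypothetical excerpt from the historical's pod spec: point the liveness
# probe at the lightweight /status/health endpoint instead of /status.
livenessProbe:
  httpGet:
    path: /status/health
    port: 8083          # our historical HTTP port; adjust to your deployment
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 3
```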
Note that we have druid.server.http.numThreads=200 on historicals, and druid.broker.http.numConnections=40 on brokers.
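Concretely, the relevant lines from our configs (as we understand them, druid.broker.http.numConnections sizes the broker’s connection pool to each data server, and druid.server.http.numThreads sizes the Jetty thread pool serving all HTTP requests on the historical, queries and /status alike):

```properties
# Broker runtime.properties (our current value)
druid.broker.http.numConnections=40

# Historical runtime.properties (our current value)
druid.server.http.numThreads=200
```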