Broker query timeouts

Hey all,

Very simple queries to my broker are taking a very long time for some reason (10 to 50 seconds), and most of them time out. I read through some other threads and figured this might be a configuration issue on my end. I currently have segment granularity at 15 minutes, and the coordinator says there are 7K+ segments in S3. Would that slow down result merging on the broker? Queries sent directly to the historical nodes are very fast and never time out.

Here are my broker runtime props:

druid.host=${INSTANCE_IP}
druid.port=8082
druid.service=broker

druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.cache.sizeInBytes=33554432

druid.server.http.numThreads=50

I have 2 cores at the moment, so I left numThreads at the default.

Let me know if you need any more info,

Hey Nicholas,

You could try moving the caching to the historicals: set druid.historical.cache.useCache and druid.historical.cache.populateCache to true there, then set the druid.broker.cache.* equivalents to false on the broker. This pushes the first level of merging down to the historicals and will take some load off your broker.
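In each historical's runtime.properties that would look something like this (sizeInBytes here is just an illustrative value, size it to what you have):

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.historical.cache.sizeInBytes=33554432

and on the broker:

druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false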

If your individual segments are very small (<100MB), it would also help to have a smaller number of larger segments. You could do that by reindexing with a coarser segmentGranularity in batch, or by using fewer partitions in realtime.
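For example, the granularitySpec of a batch reindexing task could be bumped from FIFTEEN_MINUTE to HOUR (the interval below is just a placeholder):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": "NONE",
  "intervals": ["2015-01-01/2015-01-02"]
}

That alone collapses four 15-minute segments into one hourly segment, cutting the segment count roughly 4x.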

If you are collecting Druid metrics, understanding where the bottleneck is would really help. It could be the broker merging per-segment results, or the historicals might simply not be configured or tuned properly, putting the bottleneck on the segment scans on the historical side.
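If you aren't emitting metrics yet, the quickest way to get them into the service logs is the logging emitter (a sketch; the monitor list here is just an example):

druid.emitter=logging
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]

Then compare query/time as reported by the broker against query/time and query/segment/time on the historicals; if the broker's numbers are much larger, the merge is the bottleneck.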

What's weird is that this was not happening before; I only noticed queries timing out today.

I have a different partitionNum in the shardSpec for each realtime node (I have 5 of them for a given datasource). This is because I'd like the broker to merge those realtime results (did I get that right?), since the realtimes aren't guaranteed to get the same amount of data from their Kafka partitions.
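Here's roughly what I mean; each realtime spec carries a linear shardSpec, with partitionNum going 0 through 4 across the five nodes:

"shardSpec": {
  "type": "linear",
  "partitionNum": 0
}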

Is this the tip you were talking about, Gian?

Fangjin,

I just bumped up the heap on my historical nodes and increased the segmentCache maxSize. What else could cause the historicals to choke?
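For context, these are the settings I touched (the path and sizes are placeholders for my actual values):

druid.segmentCache.locations=[{"path": "/mnt/druid/indexCache", "maxSize": 300000000000}]
druid.server.maxSize=300000000000

plus a bigger -Xmx on the historical JVM. My understanding is that druid.server.maxSize should stay in line with the total maxSize of the segmentCache locations.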

@Gian.

Thanks again! Looks like broker performance is holding up with the cache change you suggested. I will monitor this and post back if any issues come up.

I owe you lunch.