Historical CPU performance drops as processing threads increase


I have run a few tests tuning the processing threads and buffer sizes on a historical node. The results are as follows.

CPU performance decreases as I increase the number of threads, but I expected the opposite. Also, the CPU credits are barely used.

Please help me understand what is wrong here. I would also like suggestions on how to utilize the full CPU.


Shilpa S

We are looking to improve the performance of our historicals by tuning the configuration.

Under large queries, Druid historicals take some time to respond, but CPU utilization only reaches 50-60%. We looked at other limiting factors such as disk reads, but we made sure there is enough RAM to hold all the data.
We also cross-checked that disk reads are low.

Then we looked into increasing druid.processing.numThreads to get higher CPU utilization and faster responses. What we found is that as we increased the threads, performance in fact decreased: queries took longer to complete and CPU utilization dropped further.

We are looking for help on the best way to configure threads so that we can use the maximum CPU.
We are using 30 T3 AWS EC2 instances.
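
For reference, a rough sketch of the usual starting point for these settings. It assumes the commonly documented Druid guidance that druid.processing.numThreads defaults to the number of cores minus one, and that direct memory must cover (numThreads + numMergeBuffers + 1) buffers of druid.processing.buffer.sizeBytes each. The function name, the vCPU count, and the buffer size below are made up for illustration and are not taken from this thread.

```python
def suggest_processing_config(vcpus, buffer_size_bytes, num_merge_buffers=2):
    """Return a starting point for historical processing settings.

    Hypothetical helper: the inputs are assumptions about the host,
    not values reported in this thread.
    """
    # One core is left free for non-processing work; never go below 1 thread.
    num_threads = max(vcpus - 1, 1)
    # Each processing thread and each merge buffer needs one buffer,
    # plus one extra buffer of headroom.
    min_direct_memory = (num_threads + num_merge_buffers + 1) * buffer_size_bytes
    return {
        "druid.processing.numThreads": num_threads,
        "druid.processing.numMergeBuffers": num_merge_buffers,
        "druid.processing.buffer.sizeBytes": buffer_size_bytes,
        "min_direct_memory_bytes": min_direct_memory,
    }


if __name__ == "__main__":
    # Example: an 8 vCPU host with 500 MB processing buffers (assumed values).
    for key, value in suggest_processing_config(8, 500_000_000).items():
        print(f"{key}={value}")
```

Raising numThreads beyond what this gives means more threads than cores competing for the same CPU, which may be one reason the higher settings did not help.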

Have you run these tests with all the performance options turned off? For example, with the following in the query context:

"useCache": false,
"populateCache": false,
"useApproximateTopN": false,
"useApproximateCountDistinct": false
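
A minimal sketch of how those flags could be merged into a query payload before benchmarking. The helper name, the example SQL, and the datasource name are hypothetical. As far as I know, the two useApproximate flags apply to Druid SQL queries, so the example uses a SQL payload.

```python
import copy
import json

# Context flags suggested above, so each run measures raw processing cost.
BENCHMARK_CONTEXT = {
    "useCache": False,
    "populateCache": False,
    "useApproximateTopN": False,
    "useApproximateCountDistinct": False,
}


def with_benchmark_context(query):
    """Return a copy of the query with the benchmark flags applied.

    Any other context keys already on the query are preserved.
    """
    result = copy.deepcopy(query)
    context = dict(result.get("context", {}))
    context.update(BENCHMARK_CONTEXT)
    result["context"] = context
    return result


if __name__ == "__main__":
    # Hypothetical query; replace with the query under test.
    example = {"query": "SELECT COUNT(*) FROM my_datasource"}
    print(json.dumps(with_benchmark_context(example), indent=2))
```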

Thank you for your help.

We did try with useCache set to false. We will try the additional flags you mentioned. Is there anything else we should consider while testing performance?

IMO, this is not a workaround. For example, with useCache set to false, queries over large time intervals leave a high number of pending segments to be scanned on the Historical. But the server’s IO utilization is always low (about 10% when paging the segment cache in from disk). The CPU usage is also low when processing computation-heavy queries (like complex groupBy). We’ve tried tuning the configs many times but haven’t found a way to fully utilize the Historical server’s resources under a high request rate.

Gaurav Bhatnagar <gaurav.bhatnagar@imply.io> wrote on Sat, May 23, 2020 at 12:00:

Thanks Tim.
I didn’t follow. If segments are page faulting, shouldn’t the disk IO go up?

If neither disk nor CPU usage is going up, what is the historical server doing? Maybe that will help us better understand the bottleneck.

If segments are page faulting, shouldn’t the disk IO go up?
The disk read throughput does rise to about 5 MB/s on an SSD with 15000 IOPS, which seems low. Maybe this is because the native IO streaming is performed on a single thread?
If neither disk nor CPU usage is going up, what is the historical server doing?
We’re still figuring it out. For topN queries that scan a large number of segment partitions, increasing numThreads might improve performance a little, and increasing the heap memory of the historical always helps. But it seems either of them can lead to higher IO utilization.

Gaurav Shah <gaurav@poshmark.com> wrote on Sat, May 23, 2020 at 1:21 PM: