[druid-user] Timeout debug

Hey Federico,

Some things to check are,

  • Verify that your disk I/O is under control; you might be swapping a lot and not getting good memory caching of data. You have 11 threads with 1G processing buffers each, plus whatever your Java heaps are set to, which probably doesn’t leave much memory left over for data (it’s cached out-of-process by the OS page cache).

  • You have some big aggregators that you aren’t sorting on; thetaSketch in particular can be memory intensive. Sometimes in situations like this it helps to set “doAggregateTopNMetricFirst” : true in your query context (see the sketch just after this list).
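
For reference, a minimal sketch of what that could look like when posting a TopN query directly to the broker. The datasource, dimension, and sketch field names below are made-up placeholders, and it assumes the broker is on its default port 8082:

# A minimal sketch of a TopN query with doAggregateTopNMetricFirst set in its
# context. The datasource, dimension, and sketch field names are placeholders;
# the broker address assumes the default port 8082.
import requests  # pip install requests

query = {
    "queryType": "topN",
    "dataSource": "my_datasource",            # placeholder
    "intervals": ["2016-08-01/2016-08-12"],
    "granularity": "all",
    "dimension": "some_dimension",            # placeholder
    "metric": "count",                        # cheap sort metric
    "threshold": 100,
    "aggregations": [
        {"type": "count", "name": "count"},
        {"type": "thetaSketch", "name": "unique_users", "fieldName": "user_id"},  # placeholder field
    ],
    "context": {
        "doAggregateTopNMetricFirst": True    # defer the expensive (sketch) aggregations to a second pass
    },
}

resp = requests.post("http://localhost:8082/druid/v2", json=query)
resp.raise_for_status()
print(resp.json())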

Hi Gian, thanks a lot for the quick answer.

  • I checked disk I/O and it seems OK: average disk usage when loading segments is about 20%, tops out at 30%, and then drops to 0% while processing segments, so there’s no sign of a bottleneck here.

  • Yes, switching off thetaSketch seems to make things faster; normal queries no longer time out.

Here is a snapshot of the system with thetaSketch on (from the glances command):

CPU 89.1%: user 88.9%, nice 0.0%, system 0.2%, iowait 0.0%, irq 0.0%, idle 10.9%
Load (12-core): 1 min 3.74, 5 min 2.18, 15 min 1.55
Mem 45.4%: total 31.3G, used 14.2G, free 17.1G, active 13.8G, inactive 11.6G, buffers 19.3M, cached 11.7G
Swap 0.0%: total 7.63G, used 0, free 7.63G

VIRT    RES     CPU%    MEM%  PID   USER    NI  S  TIME+     IOR/s  IOW/s  NAME
37.4G   23.1G   1060    74.0  964   ubuntu  0   S  41:39.45  0      5K     java
77M     15M     2.6     0.0   3276  ubuntu  0   R  0:09.49   0      0      /usr/bin/python /usr/bin/glances
0       0       1.6     0.0   128   root    0   S  0:04.32   0      0      kworker/0:1
0       0       1.6     0.0   129   root    0   S  0:03.51   0      0      kworker/6:1
0       0       1.6     0.0   153   root    0   S  0:02.78   0      0      kworker/9:1
0       0       1.3     0.0   102   root    0   S  0:03.90   0      0      kworker/2:1
0       0       1.3     0.0   125   root    0   S  0:03.30   0      0      kworker/8:1
0       0       1.3     0.0   127   root    0   S  0:02.96   0      0      kworker/5:1
0       0       1.3     0.0   145   root    0   S  0:02.78   0      0      kworker/11:1
0       0       1.3     0.0   167   root    0   S  0:02.83   0      0      kworker/10:1

WARNING|CRITICAL logs (last 3 entries)
2016-08-11 15:41:01 > (ongoing)            CPU user (87.3/88.2/89.0) - Top process: java
2016-08-11 15:40:06 > 2016-08-11 15:40:43  CPU user (75.7/87.4/89.1)

And here without thetaSketches:

CPU 88.6%: user 87.6%, nice 0.0%, system 0.6%, iowait 0.4%, irq 0.0%, idle 11.4%
Load (12-core): 1 min 1.85, 5 min 0.94, 15 min 1.21
Mem 47.6%: total 31.3G, used 14.9G, free 16.4G, active 14.5G, inactive 12.3G, buffers 20.3M, cached 12.4G
Swap 0.0%: total 7.63G, used 0, free 7.63G

VIRT    RES     CPU%    MEM%  PID   USER    NI  S  TIME+     IOR/s  IOW/s  NAME
37.4G   23.6G   908.0   75.3  964   ubuntu  0   S  48:30.30  29M    1M     java
6.8G    1.2G    133.8   3.9   965   ubuntu  0   S  1:18.30   0      17K    java
77M     15M     2.9     0.0   3440  ubuntu  0   R  0:06.13   0      0      /usr/bin/python /usr/bin/glances
0       0       1.6     0.0   128   root    0   S  0:05.24   0      0      kworker/0:1
0       0       1.6     0.0   129   root    0   S  0:04.39   0      0      kworker/6:1
0       0       1.6     0.0   145   root    0   S  0:03.45   0      0      kworker/11:1
0       0       1.6     0.0   167   root    0   S  0:03.49   0      0      kworker/10:1
0       0       1.6     0.0   212   root    0   S  0:03.60   0      0      kworker/3:1
0       0       1.3     0.0   102   root    0   S  0:03.86   0      0      kworker/2:1
0       0       1.3     0.0   125   root    0   S  0:03.75   0      0      kworker/8:1
0       0       1.3     0.0   127   root    0   S  0:03.65   0      0      kworker/5:1
0       0       1.3     0.0   148   root    0   S  0:03.54   0      0      kworker/7:1
0       0       1.3     0.0   153   root    0   S  0:03.48   0      0      kworker/9:1
0       0       1.3     0.0   211   root    0   S  0:03.76   0      0      kworker/1:1

WARNING|CRITICAL logs (last 2 entries)
2016-08-11 15:51:29 > 2016-08-11 15:51:53  CPU user (81.5/85.5/88.3)
2016-08-11 15:51:07 > 2016-08-11 15:51:10  CPU user (83.9/83.9/83.9)

In these examples it looks like the CPU is always maxed out, but RAM is not fully used (it seems more than 50% is inactive). If thetaSketch usage is RAM-bound, how can I detect that? Which logs should I look at to debug this?

Thanks again for your help and patience :)

Did you try setting “doAggregateTopNMetricFirst” : true in the context and seeing if that helps?

Not yet! Is it possible to set that query context in Pivot somehow? Sorry for asking; I’m new to Druid and everything around it.

Also, I was trying to find out how to “debug” a problem or find a bottleneck inside the cluster in any given scenario.

Thanks!

Hey Federico,

It’s not possible to add context flags to Pivot now, but I have it on good authority that the next version will make this possible.

In general for finding bottlenecks in query performance the things to look at are:

  • Disk: are you swapping too much?

  • GC: are you getting a lot of JVM GC time?

  • CPU: are the cores maxed out?

Usually if disk and GC look fine but CPU is maxed out (as in your situation), the thing to look at is limiting the work the CPU has to do. That could mean limiting the dimensions/metrics in your query, optimizing the query in general, that kind of thing. You can also profile Druid to see where it’s spending its time.

In your particular case, I bet a lot of the time is spent aggregating the sketches. What doAggregateTopNMetricFirst does is defer non-sort-metric aggregation to a second pass. In your case, where the sort metric (count) is cheap and the other metrics (the sketches) are expensive, it can help a bit. Note that you may not see any effect; if so, Druid probably triggered the switch automatically, since it does apply this algorithm on its own sometimes.

Btw, note that doAggregateTopNMetricFirst can slow down queries that don’t fit that model (which is why it’s not always on). So when you do test this out, keep that in mind.
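
When you do test it, one rough way to compare is to time the same query against the broker with the flag off and on. Here’s a sketch; the query file name is a placeholder for your own TopN query body, and the broker address again assumes the default port:

# Rough A/B timing sketch: run the same TopN query with the flag off and on
# and compare wall-clock times. "topn_query.json" is a placeholder for a file
# holding your own TopN query body; the broker address assumes the default port.
import json
import time
import requests  # pip install requests

BROKER = "http://localhost:8082/druid/v2"

with open("topn_query.json") as f:          # placeholder: your TopN query body
    base_query = json.load(f)

for flag in (False, True):
    query = dict(base_query, context={"doAggregateTopNMetricFirst": flag})
    times = []
    for _ in range(3):                      # a few runs to smooth out cache effects
        start = time.time()
        resp = requests.post(BROKER, json=query)
        resp.raise_for_status()
        times.append(time.time() - start)
    print("doAggregateTopNMetricFirst=%s  best=%.2fs" % (flag, min(times)))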

Hey Federico,

Based on your comment I updated this PR for Pivot: https://github.com/implydata/pivot/pull/335/files#diff-0a02c76fe67fcff3f80f386b8cb86d1dR141

When it is merged (should be tomorrow morning) you will be able to set a custom context as Gian suggested (once you upgrade to the latest version).

I’ll post again tomorrow when the new Pivot is published.

Best regards,

Vadim

Guys, thanks a lot for all your help, I really appreciate it :)

@Vadim great! I will look into this and try it as soon as it’s ready.

@Gian those three insights are really interesting and I will take them into account. Disks seem to be OK. I found some GC issues (but not on every query that times out). How can I tune GC, or where can I find information about best practices? Also, you said “You can profile Druid to see where it’s spending its time”; how can I do that?

Today I will do some tuning and share the results afterwards.

Thanks again guys :)

Hey Federico,

For GC tuning, if you are getting a lot of time spent in GC (more than a few percent) then the place to start is just making your heap bigger. If that doesn’t help, then you can start tuning fiddlier things like which collector you’re using and the specific parameters for that collector. For profiling, the “poor man’s profiling” is simply running “jstack -l” a few times and seeing what’s usually at the top of the stacks. Rich man’s profiling is using something like YourKit.
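
In case it helps, here’s a rough sketch of that jstack approach; the PID is a placeholder for your Druid process, and the parsing is just a heuristic for tallying the most common top-of-stack frames:

# "Poor man's profiling" sketch: take a few jstack dumps of a Druid JVM and
# tally which frame most often sits at the top of RUNNABLE stacks.
# The PID is a placeholder; jstack must come from the same JDK the process runs on.
import subprocess
import time
from collections import Counter

PID = 964          # placeholder: the Druid process to profile
SAMPLES = 10
top_frames = Counter()

for _ in range(SAMPLES):
    dump = subprocess.check_output(["jstack", "-l", str(PID)]).decode()
    for block in dump.split("\n\n"):                 # one block per thread (heuristic)
        if "java.lang.Thread.State: RUNNABLE" not in block:
            continue
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("at "):               # first "at ..." line = top of that stack
                top_frames[line] += 1
                break
    time.sleep(1)

for frame, count in top_frames.most_common(15):
    print(count, frame)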

Beyond that, there’s a lot of info online about JVM GC tuning and JVM profiling and most of it would apply to Druid.

As of Pivot 0.9.36 (just released) you can add a druidContext flag to the options of a data cube, so you can play around with sending doAggregateTopNMetricFirst and random movie quotes into Druid.


Hi guys!

@Vadim That was fast! Thank you :) I was able to download the new version and successfully use the new config option for Pivot. In verbose mode I could see that the query sent to Druid had the “doAggregate…” flag set to true, so it worked as expected! When generating the config, some comment markers (“#”) were missing on a couple of lines and I had to check and add them manually. I thought it was worth mentioning because other users could generate the config file and then get an error while running Pivot.

@Gian First I tried the “jstat -gc 1000” command, but since I’m really new to Java and even newer to Java GC, I had some trouble understanding all those metrics. I managed to use the jvmtop command instead, which showed a nice and simple GC usage percentage (I hope that was a fine metric to look at); it never went above 1%. I did have a problem with one ETL script I made for testing purposes, which was what generated the GC issues. Setting the heap to 8GB made that script run smoothly and the GC problems don’t happen on it anymore.

As you suggested, I also tried the doAggregateTopNMetricFirst flag to see if it improved things, but sadly it didn’t (or at least nothing noticeable). The best option for now is to remove thetaSketches; HyperLogLog works for us and also seems to be faster.

I will keep digging into Druid/Java and making changes to the configs to improve performance. After I understand the system a little better I think I’m going to add a few nodes and see how everything goes, keeping your advice in mind.

Thanks guys!!