Historical node exceptions because of too much data

Hi!

We are now running 5 historical nodes, each on its own Linux machine. After several months, as the data has grown, we keep running into the same problems:

The nodes throw errors like:

2015-05-18T17:17:02,819 [ERROR] [processing-8] io.druid.query.GroupByParallelQueryRunner - Exception with one of the sequences!

com.metamx.common.ISE: Maximum number of rows reached
    at io.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:121) ~[druid-services-0.7.0-selfcontained.jar:0.7.0]
    at io.druid.query.groupby.GroupByQueryHelper$3.accumulate(GroupByQueryHelper.java:104) ~[druid-services-0.7.0-selfcontained.jar:0.7.0]
    at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32) ~[druid-services-0.7.0-selfcontained.jar:0.7.0]

and some nodes even die with a JVM error:

12184.624: [Full GC12184.624: [CMS: 2917849K->2917849K(6291456K), 7.3134110 secs] 3062609K->2917849K(11953792K), [CMS Perm : 51888K->51888K(83968K)], 7.3135430 secs] [Times: user=7.26 sys=0.04, real=7.31 secs]

12192.162: [Full GC12192.162: [CMS: 2917849K->2917849K(6291456K), 7.3080310 secs] 3063140K->2917849K(11953792K), [CMS Perm : 51888K->51888K(83968K)], 7.3081710 secs] [Times: user=7.27 sys=0.03, real=7.31 secs]

12205.075: [Full GC12205.075: [CMS: 2917849K->2918425K(6291456K), 8.9411010 secs] 3042908K->2918425K(11953792K), [CMS Perm : 51888K->51888K(83968K)], 8.9412470 secs] [Times: user=8.88 sys=0.04, real=8.94 secs]

12214.304: [Full GC12214.304: [CMS: 2918425K->2917919K(6291456K), 7.5355080 secs] 3063289K->2917919K(11953792K), [CMS Perm : 51888K->51888K(83968K)], 7.5356600 secs] [Times: user=7.47 sys=0.05, real=7.53 secs]

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f7ac9990000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)

Some stats about our data:
2.46 TB of segments in total, growing by more than 4 GB per day

9 datasources, each with about 20 dimensions and 50 metrics, covering 2 years and 3 months

Dimensions are long values (cast to strings in Druid).

Currently we have 5 machines (24 cores, 189 GB memory, 4 TB non-SSD disk each).

Can you give us a recommendation about the Historical Nodes’ hardware setup? Thanks again!

Hi Wan, how many rows of data do you want for your output? Where is that output going? Druid restricts the number of output rows for groupBy because Druid tries to enforce small result sets that can be visualized. Druid is meant to be used for fast, interactive queries, and not when your output set is close to the size of your input set. I’d be very curious to understand your use cases better before giving more advice.
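For reference, the limit that the ISE above is hitting is, as far as I recall, controlled by these runtime properties (names and defaults are from the 0.7.x docs, so please double-check them against your version):

    druid.query.groupBy.maxResults=500000
    druid.query.groupBy.maxIntermediateRows=50000

Raising them lets bigger groupBys complete, but those intermediate rows are held on-heap while the query runs, which may well be related to the full GCs you are seeing.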

Hi FJ, most of the query output we need is less than 100 rows; it is used to power statistics tables with paging support. But sometimes we do need to export large result sets, e.g. all kinds of metrics for thousands of websites over 5 months.

The statistics tables are used directly by our customers, so a query can only take a few seconds. We formerly used Infobright ICE, but it became a challenge as the data grew bigger and bigger.

Hi Wan, typically for these use cases we use lexicographic topNs, as you can paginate through the results (rough example below). If you do need to support groupBys that return very large result sets, you can set the result limits higher. I would recommend deprioritizing these extremely expensive queries so they do not hog the resources of the cluster.
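A paginated lexicographic topN would look roughly like this; the datasource, dimension, metric, and interval here are made up for illustration. You omit previousStop on the first page, and on each following page you pass the last dimension value you received:

    {
      "queryType": "topN",
      "dataSource": "your_datasource",
      "intervals": ["2015-01-01/2015-06-01"],
      "granularity": "all",
      "dimension": "website_id",
      "metric": {"type": "lexicographic", "previousStop": "<last value from the previous page>"},
      "threshold": 100,
      "aggregations": [
        {"type": "longSum", "name": "clicks", "fieldName": "clicks"}
      ],
      "context": {"priority": -1}
    }

The "priority" in the query context (lower values get deprioritized, if I remember correctly) is one way to keep the occasional big export from starving your interactive queries.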