my druid setup is currently slow on queries that do not have filter conditions set and druid needs to aggregate a metric over all rows. Queries that have a filter set on a dimension with cardinality > 50 run very quickly, so the filtering works quickly on large datasets.
Executing a single topn query on 20 billion rows over four metrics takes at least 12-15 seconds and goes up to 70 seconds under light load on a cluster with 27 r3.8xlarge nodes with all data memory mapped.
I got a couple of questions:
In the production setup guide (http://druid.io/docs/latest/configuration/production-cluster.html) it says “Historical daemons should have a heap size of at least 1GB per core for normal usage” but then further down, the historical config for a r3.8xlarge node is only given 12GB heap although the config pertains to an instance type with 32 cores.
the production setup guide recommends druid.broker.http.numConnections=20 for the broker whereas the benchmark settings here (https://github.com/druid-io/druid-benchmark/blob/master/config/broker-runtime.properties) has this set to 300.
Why are these values so different? What are the implications of setting it so low or so high?
the benchmark blog (http://druid.io/blog/2014/03/17/benchmarking-druid.html) mentions “Interval chunking is turned on by default to prevent long interval queries from taking up all compute resources at once.”
when I turn chunking on, even so that only few chunks (2-10) are created, the broker query response times are twice as slow as without chunking. Furthermore there is this open ticket which reports that chunking produces incorrect numbers (https://github.com/druid-io/druid/issues/2262)
Is it safe to use chunking in Druid 0.9.1.1?
documentation on chunkPeriod (http://druid.io/docs/latest/querying/query-context.html) states “Make sure “druid.processing.numThreads” is configured appropriately on the broker”. What would be an appropriately configured setting when using chunkPeriod vs not using it?
I read several druid blogs in which people talk about tunings like -XX:+UseNUMA, switching off biased locking (-XX:-UseBiasedLocking), zone_reclaim_mode etc. We use standard r3.8xlarge nodes on trusty ubuntu. So far, I coudln’t observe any performance improvements while playing around with such settings. Do they apply for r3.8xlarge instances at all? Are there any OS level tunings recommended on r3.8xlarge nodes?
Any tips on any of those subjects or other suggestions are highly appreciated. I tested out pretty much every setting that Druid has but never experienced any performance improvement so far.