Druid broker frequent full GC occurred

Hi Druid team,

While testing an upgrade of Druid from 0.6.169 to 0.7.1.1, we observed frequent full GCs on the broker toward the end of our query tests.

We ran the query tests against a single broker node. From the GC log we can see that the heap usage grows gradually until it reaches the max heap size.

We have dumped the heap and grepped the heap-size changes out of the GC log; see the attachments.
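
For reference, something along these lines is how the heap occupancy can be pulled out of a G1 log written with -XX:+PrintGCDetails (the log file name here is just an assumption based on the -Xloggc setting below):

    # print the "Heap: <before>(<capacity>)-><after>(<capacity>)" summary G1 emits after each collection
    grep -o 'Heap: [^]]*' gc-broker.log*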

Could you give us some advice on this?

broker runtime.properties:

druid.service=broker

druid.port=8080

druid.broker.cache.useCache=true

druid.broker.cache.populateCache=true

druid.broker.cache.unCacheable=["select"]

druid.processing.buffer.sizeBytes=2147483647

druid.processing.numThreads=12

druid.server.http.numThreads=50

druid.broker.http.numConnections=20

druid.broker.http.readTimeout=PT15M

broker jvm configuration:

JAVA_ARGS="-Dfile.encoding=UTF-8 -Duser.timezone=UTC -Djava.io.tmpdir=/data/tmp/druid/ -Ddruid.host=$HOSTNAME -Dlogsuffix=broker -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"

JAVA_ARGS="${JAVA_ARGS} -Xms40g -Xmx40g -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=3000 -XX:ParallelGCThreads=28 -XX:ConcGCThreads=24 -XX:G1MixedGCLiveThresholdPercent=55 -XX:G1ReservePercent=15 -XX:+PrintGC -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M -Xloggc:…/log/gc-broker.log"

JMX_OPTS="-Dcom.sun.management.jmxremote.port=17070 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

query:

{
  "queryType": "groupBy",
  "dataSource": "pulsar_ogmb",
  "intervals": ["2015-04-22T00:00:00-07:00/2015-04-29T00:00:00-07:00"],
  "granularity": "all",
  "aggregations": [
    { "type": "doubleSum", "name": "gmv", "fieldName": "gmv_ag" },
    { "type": "longSum", "name": "vicount", "fieldName": "vicount_ag" },
    { "type": "longSum", "name": "quantity", "fieldName": "quantity_ag" },
    { "type": "longSum", "name": "imprecount", "fieldName": "imprecount_ag" },
    { "type": "longSum", "name": "clickcount", "fieldName": "clickcount_ag" }
  ],
  "filter": { "type": "selector", "dimension": "site", "value": "0" },
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "ctr",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "clickcount", "fieldName": "clickcount" },
        { "type": "fieldAccess", "name": "imprecount", "fieldName": "imprecount" }
      ]
    }
  ],
  "dimensions": ["trafficSource", "city"],
  "limitSpec": {
    "type": "default",
    "limit": 300,
    "columns": [{ "dimension": "gmv", "direction": "descending" }]
  },
  "having": { "type": "greaterThan", "aggregation": "gmv", "value": "0" },
  "context": { "useCache": "true" }
}

dump-0.txt (283 KB)

gc.dump (66.1 KB)

I think Himanshu mentioned this in the other post, but I will reiterate.

I think you can disable caching on the broker and instead turn on caching on the historical nodes. That lets the historicals merge results from their local segments and puts less strain on the broker. I would also disable groupBy caching, since large result sets can eat up cache space very quickly. I'd also be interested in seeing your common configs. It is interesting that you see these problems with 0.7.1.1, as we did not change much in terms of query performance or resource usage. I wonder if configuration changes occurred from one version to the other.
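
Roughly, the runtime.properties change I have in mind would look like this (property names taken from the 0.7.x caching docs; the local cache type and size are only placeholders to adjust for your nodes):

broker runtime.properties:

    druid.broker.cache.useCache=false
    druid.broker.cache.populateCache=false

historical runtime.properties:

    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    # keep groupBy (and select) results out of the cache
    druid.historical.cache.unCacheable=["groupBy", "select"]
    # example local cache settings, tune the size for your hardware
    druid.cache.type=local
    druid.cache.sizeInBytes=1000000000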