java.lang.OutOfMemoryError: GC overhead limit exceeded on historical nodes

Hi,

I have been experiencing OOMEs constantly on historical nodes:

2017-Nov-02 07:02:55 AM [processing-5] ERROR com.google.common.util.concurrent.Futures$CombinedFuture - input future failed.
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.nio.DirectByteBufferR.duplicate(DirectByteBufferR.java:217) ~[?:1.8.0_73]
	at java.nio.DirectByteBufferR.asReadOnlyBuffer(DirectByteBufferR.java:234) ~[?:1.8.0_73]
	at io.druid.query.aggregation.hyperloglog.HyperUniquesSerde$3.fromByteBuffer(HyperUniquesSerde.java:123) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.aggregation.hyperloglog.HyperUniquesSerde$3.fromByteBuffer(HyperUniquesSerde.java:113) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.segment.data.GenericIndexed$BufferIndexed._get(GenericIndexed.java:537) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.segment.data.GenericIndexed$2.get(GenericIndexed.java:158) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.segment.data.GenericIndexed.get(GenericIndexed.java:395) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.segment.column.IndexedComplexColumn.getRowValue(IndexedComplexColumn.java:53) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.segment.QueryableIndexStorageAdapter$CursorSequenceBuilder$1$1QueryableIndexBaseCursor$8.get(QueryableIndexStorageAdapter.java:883) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.select.SelectQueryEngine.singleEvent(SelectQueryEngine.java:297) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.select.SelectQueryEngine$1.apply(SelectQueryEngine.java:252) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.select.SelectQueryEngine$1.apply(SelectQueryEngine.java:215) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:68) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.QueryRunnerHelper$1.apply(QueryRunnerHelper.java:63) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:42) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:43) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:42) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:46) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.FilteredSequence.accumulate(FilteredSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.LazySequence.accumulate(LazySequence.java:40) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]

And sometimes I get an exception like:

2017-Nov-02 10:05:19 AM [qtp1829194516-44] ERROR com.sun.jersey.spi.container.ContainerResponse - The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.lang.OutOfMemoryError: GC overhead limit exceeded

The server spec is 32 GB of memory, 8 CPUs, and a 480 GB SSD.

The JVM and runtime configs are as follows:

JVM:

-Xms6g
-Xmx6g
-XX:MaxDirectMemorySize=15g
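
To get more visibility into the GC overhead, I could also turn on GC logging and heap dumps on OOME. A sketch of the extra flags, assuming standard HotSpot options on Java 8 (the log and dump paths below are just examples, not from my actual setup):

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/druid/historical-gc.log
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/druid/historical-heap.hprof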

runtime:

druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=7

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":260000000000}]
druid.server.maxSize=250000000000

# Query cache
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.cache.unCacheable=

# GroupBy
druid.query.groupBy.maxMergingDictionarySize=8000000000
druid.query.groupBy.maxOnDiskStorage=32000000000
druid.query.groupBy.maxIntermediateRows=2000000000
druid.query.groupBy.maxResults=2000000000

I think the memory on the node is enough.

I did a rough calculation: 7 * 1073741824 + (15 + 6) * 1024 * 1024 * 1024 < 32 * 1024 * 1024 * 1024.
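
Broken down (assuming, as I understand it, that the processing buffers are direct buffers that count against MaxDirectMemorySize):

heap (-Xmx):                               6 GB
direct memory cap (MaxDirectMemorySize):  15 GB
  of which processing buffers: 7 * 1 GB =  7 GB
total JVM footprint:                     ~21 GB
left of 32 GB for the OS page cache and memory-mapped segments: ~11 GB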

Also, I checked with wc -l /proc/5353/maps, which shows only 1700+ mappings.

I don't think we have reached the point where we need to adjust /proc/sys/vm/max_map_count.

So I wonder what would be the cause of this OOME.

Thanks in advance!

Hey Kong,

In your stack trace I see a “select” query. That query type is known to use excessive memory, so in the upcoming Druid 0.11.0 we have moved the “scan” query extension into core; it is more memory-efficient.

In the current version it’s a community contributed extension: http://druid.io/docs/0.10.1/development/extensions-contrib/scan-query.html

And in 0.11.0 it’s in core: http://druid.io/docs/0.11.0-rc1/querying/scan-query.html (this is a link to the 0.11.0-rc1 docs, since 0.11.0 hasn’t been released yet)

So perhaps give that a try.
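
For reference, a minimal scan query might look something like the sketch below; the dataSource, interval, and column names are placeholders you would replace with your own, and on 0.10.x you would first need to load the contrib extension described in the doc linked above:

{
  "queryType": "scan",
  "dataSource": "your_datasource",
  "intervals": ["2017-11-01T00:00:00.000Z/2017-11-02T00:00:00.000Z"],
  "columns": ["dim1", "metric1"],
  "resultFormat": "compactedList",
  "limit": 1000
}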

Hi Gian,

Thanks for your prompt response.

I wonder what makes a query a “select” query. Based on what you said, I think we should avoid using “select” queries, since they will be replaced by “scan”.

Is there any documentation I can refer to so I can learn more about this?

Our query looks like this:

{
  "queryType": "groupBy",
  "dataSource": "dataSource",
  "intervals": [
    "2017-09-30T15:00:00.000Z/2017-10-01T15:00:00.000Z"
  ],
  "dimensions": [
    "dim1"
  ],
  "aggregations": [
    {
      "type": "doubleSum",
      "name": "metric1",
      "fieldName": "metric1"
    }
  ],
  "granularity": "all",
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "dim2", "value": "—" },
      { "type": "selector", "dimension": "dim3", "value": "—" },
      { "type": "selector", "dimension": "dim4", "value": "—" },
      { "type": "selector", "dimension": "dim5", "value": "—" },
      { "type": "selector", "dimension": "dim6", "value": "—" },
      { "type": "selector", "dimension": "dim7", "value": "—" },
      { "type": "selector", "dimension": "dim8", "value": "—" },
      { "type": "selector", "dimension": "dim9", "value": "—" },
      { "type": "selector", "dimension": "dim10", "value": "—" },
      {
        "type": "not",
        "field": { "type": "selector", "dimension": "dim1", "value": "—" }
      }
    ]
  }
}