groupBy query fails with OutOfMemoryError after a long time

Hi,

I am trying to run a groupBy query on 1 hour of data. With around 1 lakh (~100,000) rows it works fine, but with around 33 million rows in a one-hour segment it fails. I am running the groupBy query with 8 dimensions and 62 aggregation metrics in both scenarios. While monitoring the 33-million-row query, I saw it consume almost 45 GB of RAM over several hours before finally throwing an OutOfMemoryError.
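For context, the failing query has this general shape (a minimal sketch; the dataSource, interval, dimension, and metric names below are hypothetical placeholders, and only a few of the 8 dimensions and 62 aggregators are shown):

```json
{
  "queryType": "groupBy",
  "dataSource": "my_datasource",
  "granularity": "hour",
  "intervals": ["2019-01-01T00:00:00/2019-01-01T01:00:00"],
  "dimensions": ["dim1", "dim2", "dim3"],
  "aggregations": [
    { "type": "longSum", "name": "metric1", "fieldName": "metric1" },
    { "type": "doubleSum", "name": "metric2", "fieldName": "metric2" }
  ]
}
```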

Details of my system configuration: 64 GB RAM, 16-core CPU, 200 GB hard disk.

Broker node configuration:

Runtime.prop

druid.service=druid/broker

druid.port=8082

HTTP server threads

druid.broker.http.numConnections=20

druid.server.http.numThreads=10

Processing threads and buffers

druid.processing.buffer.sizeBytes=524288000

druid.processing.numThreads=12

Query cache

druid.broker.cache.useCache=false

druid.broker.cache.populateCache=false

druid.sql.enable=false

druid.broker.http.readTimeout=PT120M

druid.query.groupBy.defaultStrategy=v2

druid.query.groupBy.maxOnDiskStorage=10000000000

druid.query.groupBy.maxResults=8000000

jvm.prop

-server

-Xms8g

-Xmx8g

-XX:MaxDirectMemorySize=10g

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
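As a sanity check on the broker settings above, the direct-memory pool can be sized against the processing buffers. This is a rough sketch assuming Druid's usual rule that MaxDirectMemorySize should be at least sizeBytes * (numThreads + numMergeBuffers + 1), with numMergeBuffers defaulting to max(2, numThreads / 4) when unset; the exact requirement depends on the Druid version, so treat this as an estimate:

```python
# Rough direct-memory sizing check for the broker JVM settings above.
# Assumption: required direct memory ≈ sizeBytes * (numThreads + numMergeBuffers + 1),
# where the "+1" covers the result-merging buffer. Verify against your Druid
# version's docs before relying on it.

buffer_size_bytes = 524_288_000               # druid.processing.buffer.sizeBytes
num_threads = 12                              # druid.processing.numThreads
num_merge_buffers = max(2, num_threads // 4)  # assumed default when unset

required = buffer_size_bytes * (num_threads + num_merge_buffers + 1)
max_direct = 10 * 1024**3                     # -XX:MaxDirectMemorySize=10g

print(f"required direct memory ~ {required / 1024**3:.1f} GiB")
print(f"configured MaxDirectMemorySize = {max_direct / 1024**3:.1f} GiB")
print("OK" if max_direct >= required else "too small")
```

With these numbers the configured 10g appears sufficient (~7.8 GiB required), which suggests the OutOfMemoryError is more likely coming from the heap or from merging very high-cardinality results than from the direct-memory pool.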

Historical Node Configuration:

Runtime.prop

druid.service=druid/historical

druid.port=8083

druid.server.tier=h158

HTTP server threads

druid.server.http.numThreads=10

Processing threads and buffers

druid.processing.buffer.sizeBytes=524288000

druid.processing.numThreads=12

Segment storage

druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]

druid.server.maxSize=130000000000

Query cache

druid.historical.cache.useCache=true

druid.historical.cache.populateCache=true

druid.cache.type=local

druid.cache.sizeInBytes=20000000000

#druid.historical.cache.unCacheable=["groupBy"]

druid.query.groupBy.maxOnDiskStorage=10000000000

druid.query.groupBy.maxMergingDictionarySize=8000000

Tested with groupBy both cacheable and uncacheable.

jvm.prop

-server

-Xms8g

-Xmx8g

-XX:MaxDirectMemorySize=15g

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

Please find the historical node logs attached. Please let me know how to improve the performance of the groupBy query.

Thanks,

Santosh Sahoo

historical.log (293 KB)

Do you have a copy of the OutOfMemoryError that you got?