Why and Where Druid Historicals/Brokers Eat Up Heap

Hello Druid,
We are facing heavy GC failures on the Historical nodes in our production cluster.

The Historical machines are hitting a Full GC every hour, and the pauses are huge.

We are observing that the Coordinator is continuously moving segments from one Historical to another because of the long pauses on the Historicals during Full GC.

Following are our configs:

-server

-Xmx30g

-Xms30g

-XX:NewSize=7g

-XX:MaxNewSize=7g

-XX:MaxDirectMemorySize=100g

-XX:+UseG1GC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-XX:+HeapDumpOnOutOfMemoryError

-XX:HeapDumpPath=/usr/local/esp/druid/druid-tmp/historical.hprof

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

-Djava.io.tmpdir=/usr/local/esp/druid-tmp

-DprocessName=historical

-Daws.region=us-west-2

We are using AWS EC2 instances with 96 cores and 370 GB RAM.

We want to understand why so much heap is being eaten by the Historicals.

druid.processing.numMergeBuffers=2

druid.processing.buffer.sizeBytes=1073741824

druid.server.http.numThreads=50

druid.server.maxSize=800000000000

Please help here.

Thanks,

Pravesh Gupta

One more thing:
We are using Druid version 0.13.0.

Hi Pravesh,

You should upgrade to the latest Druid version, 0.16, which has memory optimization fixes and many other bug fixes.

Now on your issue:

The Coordinator could be moving segments to rebalance them across the Historicals.

Also, I see you have set the Historical heap to 30 GB, which seems on the higher side:

A general rule of thumb for sizing the Historical heap is (0.5GB * number of CPU cores), with an upper limit of ~24GB.
Having a heap that is too large can result in excessively long GC pauses; the ~24GB upper limit is imposed to avoid this.
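To make that concrete for your boxes (assuming the 96 cores are per Historical node): 0.5 GB * 96 cores = 48 GB, which gets capped at the ~24 GB upper limit, so something around 24 GB would be closer to the guideline than 30 GB.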

Since when have you been facing this issue? Compared to the time when your Historicals were stable:

  1. Is there any increase in ingestion data volume?

  2. Any increase in the number of queries?

  3. How many data sources do you have, and what is the total segment size loaded by the Historicals?

  4. You can also check what the average segment size is (ideally it should be in the range of 400 MB-700 MB); a large number of small segments can sometimes cause memory pressure.

  5. How many Historical nodes do you have?

Thanks and Regards,

Vaibhav

Thanks Vaibhav.
I don't understand why there is an upper limit on the heap in the first place. If I am using the G1 garbage collector with a larger heap, my pause times won't be that long, right?

Anyway:

  1. Is there any increase in ingestion data volume?

--------- Yes, there is.

  2. Any increase in the number of queries?

-------- Not that much.

  3. How many data sources do you have, and what is the total segment size loaded by the Historicals?

---------- Total 2 data sources.

---------- Total 3.1 TB of data being loaded by the Historicals, as I can see on the Coordinator console.

  4. You can also check what the average segment size is (ideally it should be in the range of 400 MB-700 MB); a large number of small segments can sometimes cause memory pressure.

---------- We have 5700 segments (day granularity) and the average size is around 100 MB. We are using dimension-based partitioning and hence have comparatively smaller segments. Previously we used hash-based partitioning with 700 MB segments, but with hash-based partitioning our queries performed very slowly.

  5. How many Historical nodes do you have?

----------- We have a total of 20 Historical nodes. However, the weird thing is that on the Coordinator I can only see 17 nodes at this time.

Is anyone here able to answer this?

Can you share your GC logs? You will have to specify a -Xloggc: arg in the JVM config file. The fact that ZK is not seeing three of your nodes is weird and probably concerning. A couple of observations:

  1. The 96 cores and 370 GB of memory you mention is across your 20 nodes, right? Can you share your runtime.properties?
  2. You are specifying a druid.server.maxSize of 800 GB. That is way over the total memory you have, let alone the free memory (depending on your answer to #1; if each node has 96 cores, then sure, you have at least 6 TB of total memory in your cluster). You should figure out the heap + direct memory collectively across all nodes and then calculate the free memory across your cluster; see the rough per-node numbers below.
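As a rough per-node sketch (assuming each Historical really is a 96-core / 370 GB box running the JVM settings you posted):

heap (30 GB) + max direct memory (100 GB) = up to ~130 GB for the JVM per node

370 GB RAM - ~130 GB = ~240 GB left per node for the OS page cache

So of the 800 GB of segments each node is allowed to load (druid.server.maxSize), only around 240 GB can be hot in the page cache at any one time.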

The GC logs might be a good place to start.
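For example, GC logging could be turned on with something like the following in the Historical jvm.config (the log path is just a placeholder next to your existing heap-dump path; the rotation flags are optional):

-Xloggc:/usr/local/esp/druid/druid-tmp/historical-gc.log

-XX:+UseGCLogFileRotation

-XX:NumberOfGCLogFiles=10

-XX:GCLogFileSize=50M

Together with the -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps flags you already have, that should give enough detail to see what is filling the heap between collections.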

I don't have GC logs with me in a usable format. We will have to restart our Historicals with the options you suggested; since it's a production environment, it's taking a bit of time to get permission to restart them.
Following are the runtime properties:

druid.monitoring.monitors=["org.apache.druid.server.metrics.HistoricalMetricsMonitor","org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.client.cache.CacheMonitor","org.apache.druid.java.util.metrics.SysMonitor"]

druid.processing.numMergeBuffers=2

druid.processing.buffer.sizeBytes=1073741824

druid.server.http.numThreads=50

druid.server.maxSize=800000000000

druid.segmentCache.locations=[{"path": "/usr/local/esp/druid-tmp/druid/segment-cache", "maxSize": 800000000000}]

Our concern is figuring out what eats up the Historical heap. An important point to note is that we do not have ANY groupBy queries in our service, only TopN and timeseries.

Lookups will use heap too. From the documentation -

If you are using lookups, calculate the total size of the lookup maps being loaded.

Druid performs an atomic swap when updating lookup maps (both the old map and the new map will exist in heap during the swap), so the maximum potential heap usage from lookup maps will be (2 * total size of all loaded lookups).

Be sure to add (2 * total size of all loaded lookups) to your heap size in addition to the (0.5GB * number of CPU cores) guideline.
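As a purely illustrative sketch (the lookup size here is a made-up number, not something measured on your cluster): if you had, say, 1 GB of lookup maps loaded, you would budget an extra 2 * 1 GB = 2 GB of heap on top of the (0.5GB * number of CPU cores) guideline, to leave room for the atomic swap.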

What type of EC2 instances are you using btw? Just curious.