[druid-user] Druid Historical Nodes container's memory gets filled up

Hi

Our Druid setup consists of 2 brokers and 3 historicals. The heap memory of our historical and broker nodes fills up after they have been running for about a week. These processes are hosted as Docker containers. I am posting the Druid historical configuration below for your reference.

DRUID_XMX=60g
DRUID_XMS=60g
DRUID_MAXNEWSIZE=8g
DRUID_NEWSIZE=8g
DRUID_JVM_ARGS=-server -XX:+UseG1GC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/data
DRUID_MAXDIRECTMEMORYSIZE=60g

druid_emitter_logging_logLevel=error

druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "druid-azure-extensions", "sqlserver-metadata-storage", "druid-kafka-indexing-service", "statsd-emitter"]

#Metadata - SQL
#druid_metadata_storage_host=
druid_metadata_storage_type=sqlserver
druid_metadata_storage_connector_connectURI=""
druid_metadata_storage_connector_user="${USERROLE}"
druid_metadata_storage_connector_password="${PASS}"

druid_coordinator_balancer_strategy=cachingCost

druid_indexer_runner_javaOptsArray=["-server", "-Xmx8g", "-Xms8g", "-XX:MaxDirectMemorySize=8g", "-XX:+UseG1GC", "-XX:MaxGCPauseMillis=100", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_server_http_numThreads=48
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=336870912
druid_indexer_fork_property_druid_processing_numThreads=2
druid_indexer_fork_property_druid_server_http_numThreads=45
druid_processing_buffer_sizeBytes=1000000000
druid_query_groupBy_maxOnDiskStorage=10000000000
druid_processing_numMergeBuffers=4
druid_processing_numThreads=10
druid_processing_columnCache_sizeBytes=2000000000

#Deep Storage - Blob
druid_storage_type=azure
druid_azure_account=""
druid_azure_key="${STORAGEPASS}"
druid_azure_container=druidsegments
druid_azure_protocol=http
druid_azure_maxTries=5

druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/data/indexing-logs

DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?>

#Query Caching
druid_historical_cache_useCache=true
druid_historical_cache_populateCache=true
druid_cache_sizeInBytes=2000000000
druid_cache_expireAfter=360000

# By default, the Druid emission period is 1 minute (PT1M).
# We recommend using 15 seconds instead:
#druid_monitoring_emissionPeriod=PT15S

# Use the 'statsd-emitter' extension as the metric emitter
#druid_emitter=logging

# Configure the 'statsd-emitter' endpoint
#druid_emitter_statsd_hostname=localhost
#druid_emitter_statsd_port=8125

# Configure 'statsd-emitter' to use the dogstatsd format.
# Must be set to true, otherwise tags are not reported correctly to Datadog.
#druid_emitter_statsd_dogstatsd=true
#druid_emitter_statsd_dogstatsdServiceAsTag=true
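
As a rough check on how much memory this configuration lets the historical JVM claim, here is a minimal sketch in Python. It assumes the sizing rule from the Druid documentation, that processing buffers need about buffer.sizeBytes * (numThreads + numMergeBuffers + 1) of direct memory, and it reads the JVM's "60g" as 60 GiB. The variable and function names are illustrative only.

```python
GIB = 1024 ** 3

# Values copied from the historical configuration above.
heap_bytes = 60 * GIB              # DRUID_XMX=60g
max_direct_bytes = 60 * GIB        # DRUID_MAXDIRECTMEMORYSIZE=60g
buffer_size_bytes = 1_000_000_000  # druid_processing_buffer_sizeBytes
num_threads = 10                   # druid_processing_numThreads
num_merge_buffers = 4              # druid_processing_numMergeBuffers


def processing_direct_bytes(buffer_size, threads, merge_buffers):
    """Direct memory needed for processing and merge buffers (assumed Druid sizing rule)."""
    return buffer_size * (threads + merge_buffers + 1)


needed_direct = processing_direct_bytes(buffer_size_bytes, num_threads, num_merge_buffers)

# Upper bound on heap plus direct memory; metaspace, thread stacks and
# memory-mapped segments come on top of this and are not counted here.
jvm_ceiling = heap_bytes + max_direct_bytes

print(f"direct memory for processing buffers: {needed_direct / GIB:.1f} GiB")
print(f"heap + max direct memory ceiling: {jvm_ceiling / GIB:.0f} GiB")
```

If the container's memory limit is lower than the heap plus direct memory ceiling, the container may run out of memory before the JVM reaches its own limits.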

Hmmm… my immediate thought is to ask you to review what goes into the heap on each process; that may give you some indication.
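
One way to do that, sketched here under assumptions: take a class histogram of each process with the JDK's `jcmd <pid> GC.class_histogram` a few days apart, then compare which classes grew. The Python below parses two such snapshots and lists the biggest growers. The function names and the sample rows are made up for illustration and are not taken from your cluster.

```python
import re

# A histogram row looks like: "   1:   1000   400000  [B"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text):
    """Map class name -> bytes used, from class histogram text."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            sizes[match.group(3)] = int(match.group(2))
    return sizes


def growth(before, after, top=5):
    """Classes whose byte footprint grew the most between two snapshots."""
    deltas = {name: size - before.get(name, 0) for name, size in after.items()}
    grown = [(name, delta) for name, delta in deltas.items() if delta > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical sample rows, only to show the expected input shape.
SNAPSHOT_DAY_1 = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:          1000         400000  [B
   2:           500          12000  java.lang.String
Total          1500         412000
"""

SNAPSHOT_DAY_7 = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:          9000        3600000  [B
   2:           520          12480  java.lang.String
Total          9520        3612480
"""

result = growth(parse_histogram(SNAPSHOT_DAY_1), parse_histogram(SNAPSHOT_DAY_7))
print(result)
```

Since your JVM arguments already set -XX:+HeapDumpOnOutOfMemoryError, a heap dump written to /opt/data after an OutOfMemoryError could be inspected in the same spirit with a heap analyzer.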