Realtime Nodes Running Out of Memory

Hey,

I have a cluster of realtime nodes, each on its own EC2 instance with about 16 GB of memory. I am allocating 13 GB to the JVM heap.

Over time, usable memory decreases and never recovers, until only about 1-2% is free. Then the node hangs around, allocating and deallocating, until it finally runs out.

Is this normal behavior for realtime nodes to use memory like this? At the moment, it takes a little over a week for a node to run out.

This is my runtime.props, if that helps:

druid.host=${INSTANCE_IP}
druid.port=8084
druid.service=realtime
druid.processing.buffer.sizeBytes=1073741824
druid.server.http.numThreads=50
druid.realtime.specFile=/usr/local/druid/config/realtime/schema.spec

# Enable Real monitoring
druid.monitoring.monitors=["io.druid.segment.realtime.RealtimeMetricsMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info

druid.indexer.storage.type=metadata
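
For reference, the back-of-the-envelope memory budget these numbers imply on the 16 GB instance (a sketch, assuming the 13 GB heap is set via -Xmx and that persisted segments are memory-mapped outside the heap as usual):

# 16 GB instance, roughly:
#   JVM heap (as described above)                   13 GB
#   druid.processing.buffer.sizeBytes (off-heap)    1073741824 bytes = 1 GiB per buffer
#   left for the OS, page cache, and memory-mapped
#   intermediate segments                           ~2 GB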

What version of Druid?
Are you sure handoff is working?

Any exceptions in logs?

Definitely not normal behavior.

I’m running Druid 0.8.0.

There are no exceptions in the logs, and handoff is working properly (I’m watching segment folders disappear from the intermediate persist directory after the window period).

I saw you mention in a previous thread that emitted metrics can help discover where most issues are: http://druid.io/docs/latest/operations/metrics.html.

At the top of that doc it says metrics can be sent over HTTP to another service like Kafka. Is this something that works out of the box with Druid, or do I have to route those logs myself?

Nicholas, try to reproduce the issue with 0.8.3. Druid can emit metrics over HTTP, but you’d have to set up the Kafka ingestion and data pipeline yourself.
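
Something along these lines in your runtime.properties would switch the emitter from logging to HTTP (a minimal sketch; metrics-collector.example.com is a hypothetical service you would run yourself and pipe into Kafka):

druid.monitoring.emissionPeriod=PT1M
druid.emitter=http
# hypothetical collector endpoint -- you run this service and forward its output into Kafka yourself
druid.emitter.http.recipientBaseUrl=http://metrics-collector.example.com:8080/druid/metrics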