Realtime Nodes Memory keeps increasing

Hey,

I have realtime nodes running on dedicated EC2 instances inside Docker containers, and I'm using Datadog to monitor them. Datadog shows that memory keeps increasing on the realtime nodes and never recovers. Sometimes it plateaus, but it never goes down. I noticed that in my basePersistDirectory, folders from persisted segments are not getting deleted (there are segment folders from 6 days back). I see logs on the realtime nodes saying index.zip is being removed, so handoff looks like it's performing normally and the data is in S3. Could those lingering folders be an issue for memory usage? Are realtime nodes even using these folders after handoff?

I'm ingesting with Kafka, so I have enough realtime nodes to balance each partition in my topic. Each realtime node also has 13GB of heap.

What else could be causing the memory growth?

Thanks,

Hey Nicholas,

When you say “memory is increasing” are you referring to the amount of the JVM heap that is used? Or the amount of used memory as reported by the OS? If it’s the latter, it’s actually pretty normal for Linux (used memory slowly growing until it’s almost all your memory, and staying there, as it’s being used for page cache).

If you suspect a problem with handoff of specific segments, you could check the items under “My realtime node is not handing segments off” here: http://druid.io/docs/latest/ingestion/faq.html

Gian!

So I checked out my coordinator node and it was throwing warnings about not being able to load segments on historical nodes. This guy here:

2015-11-06T06:27:58,359 WARN [Coordinator-Exec--0] io.druid.server.coordinator.rules.LoadRule - Not enough [_default_tier] servers or node capacity to assign segment[wpUserInteraction_2015-11-05T13:00:00.000Z_2015-11-05T13:15:00.000Z_2015-11-05T13:00:00.000Z_2572]! Expected Replicants[2]

I checked my rules and I was loading everything, forever. I changed that to load everything from the past month and drop everything else. I checked the coordinator console and it looks like segments are being dropped/loaded, and I don't see this warning anymore. The realtime nodes have a lot of segment folders, so it's hard to tell yet whether they're being deleted. Memory is also still going down. I will wait a while until the coordinator finishes.
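For reference, the rules I ended up with look roughly like this (a sketch of default datasource rules; the exact period and replicant count are whatever you set in the coordinator console):

[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "_default_tier": 2 }
  },
  { "type": "dropForever" }
]

Rules are evaluated top to bottom, so anything older than a month falls through to the dropForever rule.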

However, I do have one question about dropping segments: if the coordinator drops a segment because of a rule, would I still be able to query that segment? I know it's not deleted from S3, but is it just removed from memory or completely removed from the Druid cluster?

Thanks again,

You can’t query segments when they’re dropped; they remain in deep storage, but they’re removed from the historicals and the historicals don’t pull down segments on demand. If you do end up wanting to query them again, you could re-load the segments from deep storage.
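One way to do that, if you ever need it, is to put a loadByInterval rule ahead of your drop rule covering just the interval you want back; the coordinator will then tell the historicals to pull those segments down from deep storage again. A rough sketch (the interval here is a placeholder):

[
  {
    "type": "loadByInterval",
    "interval": "2015-10-01/2015-11-01",
    "tieredReplicants": { "_default_tier": 2 }
  },
  { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]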

Hmmm,

So everything loaded on historical nodes is kept in the segmentCache location, right? Does a historical ever move some of this data to disk?

I'm trying to find out how I can load everything on historical nodes without them blowing up because the segmentCache is not large enough.

What I'm trying to get at is whether the segmentCache actually behaves like a cache. I'm noticing that this location fills up entirely and only really moves data when I have a certain coordinator rule in place, but once data is dropped from this location I can't query it until I specify that it should be reloaded, which won't happen on the fly.

How do you query for something past the rule date if the segment cache is limited? Can Druid automatically reload these segments for querying?

Hello,

Segments dropped because of a drop rule are not deleted from deep storage, and they are not queryable until a historical loads them again. If you want to completely delete a segment from deep storage, see the Kill Task here - http://druid.io/docs/latest/misc/tasks.html
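A kill task spec looks roughly like this (the interval is a placeholder; dataSource is your datasource name):

{
  "type": "kill",
  "id": "kill_wpUserInteraction_example",
  "dataSource": "wpUserInteraction",
  "interval": "2015-01-01/2015-06-01"
}

Note that it permanently deletes the unused segments in that interval from deep storage and the metadata store, so only run it for data you are sure you will never want back.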

Just saw the new messages… I guess your question has already been answered. For your last question - you cannot query segments which are not loaded by a historical node. You have to create rules such that the required segments are in the Druid cluster. I have seen projects use a separate historical tier for keeping old data that they don't expect to be queried frequently. Generally the historicals in this tier are less powerful machines in terms of memory and can probably have spinning disks instead of SSDs. Another optimization is to increase the query granularity for data older than a particular time to decrease the segment size. See historical tier configuration here - http://druid.io/docs/latest/configuration/historical.html
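Tiering is set up in two places: each historical declares its tier with the druid.server.tier property in its runtime.properties, and the retention rules then say how many replicants of which data go to which tier. A rough sketch of such rules, with made-up tier names "hot" and "cold":

[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "hot": 2 }
  },
  {
    "type": "loadForever",
    "tieredReplicants": { "cold": 1 }
  }
]

Recent data stays on the beefier tier while older data lives on the cheaper machines.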

So yeah, I think I'll have to go with the former suggestion you made. I have just tiered my historicals, and that might be the best route to go: since they won't be queried as often, they won't need to be as beefy in terms of memory.

Thanks for your responses guys.

Hey Nicholas,

No, the historical segmentCache is not populated on demand. It’s only populated when the coordinator tells a historical node in advance to load a segment. That does mean that you need to have enough disk space on your historicals to download all segments that you want to query, in advance.