Druid segment loss (how to tell/track)

As I'm POCing a Druid cluster, there are a couple of things where the documentation lacks clarity or that I'm just not understanding. Hopefully someone can help me fill in the gaps.

Segment loss on ingestion. I believe this happened to me because my target segment sizes were probably too large. In any case, my peon tasks OOMed and the segment I was tracking kind of disappeared from the segment table. I don't believe the rows were spread out to other segments, because those segments were large in terms of row count. I could be mistaken, but I'm not sure how to get insight into it. Is there a way to see this, and how does Druid handle situations where a segment that is still in memory and hasn't been flushed to disk is lost because a task crashes for whatever reason?

In terms of peon tasks, do they run in the JVM of the MiddleManagers, the Historicals, or their own? I'm not too sure how to scale up my peons if I want to avoid the OOMs I encountered in my ingestion tasks.

Around the time of the OOM, I was running very large groupBy queries, which could also be a cause. In terms of query execution, could running large queries affect ingestion-related tasks like peons? I didn't think so, but the timing seemed to indicate there might be some correlation.

Thanks

Hey - glad to hear you’re giving Druid a try :smiley:

Individual ingestion subtasks generate their own logs - you can see those from within the console or on the host machine. That's always the best place to go to see exactly what happened. The short answer is yes - Druid handles that kind of situation in streaming ingestion because it tracks the topic offset that was last successfully ingested, so if you have to restart ingestion, lose boxes, etc., Druid will try to pick up where it left off.
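If it helps, here's a rough sketch of pulling a failed task's status and raw log straight from the Overlord API - the address and task id are placeholders for your own cluster, but the endpoints are standard:

```python
# Rough sketch: fetch a task's status and raw log from the Overlord API.
# OVERLORD and TASK_ID are placeholders - point them at your own cluster/task.
import requests

OVERLORD = "http://localhost:8888"            # router or Overlord address (assumption)
TASK_ID = "index_kafka_my_datasource_abc123"  # the failed task's id (assumption)

# Task status - recent versions include an errorMsg summary when a task fails.
status = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/status").json()
print(status["status"]["status"], status["status"].get("errorMsg"))

# The full task log - this is where the OOM stack trace will actually be.
log_text = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/log").text
print(log_text[-5000:])  # just the tail
```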

It’s worth looking at this doc that describes the indexing and handoff process that occurs - it can be really useful for deeeeeeeep troubleshooting because you can see what step a task got to and work back.
https://druid.apache.org/docs/latest/design/architecture.html#indexing-and-handoff
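You can also watch segments move through that lifecycle via the sys.segments system table - something like this (datasource name and router address are placeholders):

```python
# Rough sketch: check where segments are in the publish/handoff lifecycle
# via Druid SQL's sys.segments table. Adjust the datasource and address.
import requests

ROUTER = "http://localhost:8888"  # router address (assumption)

sql = """
SELECT "segment_id", "num_rows", "size", "is_realtime", "is_published", "is_available"
FROM sys.segments
WHERE "datasource" = 'my_datasource'
ORDER BY "start" DESC
LIMIT 20
"""
for row in requests.post(f"{ROUTER}/druid/v2/sql", json={"query": sql}).json():
    print(row)
```

Rows with is_realtime = 1 are segments still held by ingestion tasks; once handoff completes you should see them show up as published and available. A segment that only ever existed in a peon's memory and never got published simply drops out of that view, which matches what you saw in the segment table.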

Peons run in their own JVMs, forked by the MiddleManager, and they have their own JVM properties: you can see there's a setting called 'druid.indexer.runner.javaOptsArray': https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-configuration
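As an illustration (the values here are made up - size them to your hardware), the relevant bits of the MiddleManager's runtime.properties look something like:

```properties
# middleManager/runtime.properties - illustrative values, not recommendations
# JVM options for each peon the MiddleManager forks:
druid.indexer.runner.javaOptsArray=["-server","-Xms1g","-Xmx1g","-XX:MaxDirectMemorySize=3g","-XX:+ExitOnOutOfMemoryError"]
# How many peons (task slots) this MiddleManager may run at once:
druid.worker.capacity=4
```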

Take a look here for some detailed info:
https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#middlemanager
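Roughly speaking (per that guide), each peon wants direct memory of about druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) on top of its heap - e.g. a 500 MiB buffer with 2 processing threads and 2 merge buffers comes to around 2.5 GiB of direct memory per peon, multiplied by druid.worker.capacity peons on the box.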

How is your box looking? Did the box itself start to run out of memory?