We are using the Kafka Indexing Service to ingest data into Druid. While we don’t see any errors in the Coordinator ingestion tasks, we noticed significant discrepancies compared with the expected numbers. We are still investigating this, but I was hoping you could guide us.
The druid_pendingSegments table contains a large number of records (12775), most of them with an old creation date. The druid_segments table contains 16527 records (14380 used).
From our understanding, the pendingSegments records are related only to a running job, but it looks like they are never deleted in our case. Moreover, while every record in the segments table has a match in the pendingSegments table, there are 32 “orphan” records that are visible only in the pendingSegments table.
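To make the comparison concrete, here is a minimal sketch of how such orphan records could be listed with an anti-join. It assumes the two tables are matched on their id column and uses a simplified toy schema in an in-memory SQLite database; the helper names (find_orphan_pending, make_toy_db) and the sample rows are hypothetical, and a real metadata store would normally be MySQL or PostgreSQL with more columns.

```python
import sqlite3

# Assumed, simplified schema: only the columns needed for the comparison.
SCHEMA = """
CREATE TABLE druid_segments (id TEXT PRIMARY KEY, used INTEGER);
CREATE TABLE druid_pendingSegments (id TEXT PRIMARY KEY, created_date TEXT);
"""

# Anti-join: pending rows that have no row with the same id in druid_segments.
ORPHANS_SQL = """
SELECT p.id, p.created_date
FROM druid_pendingSegments AS p
LEFT JOIN druid_segments AS s ON s.id = p.id
WHERE s.id IS NULL
ORDER BY p.created_date
"""


def find_orphan_pending(conn):
    """Return (id, created_date) for pending segments absent from druid_segments."""
    return conn.execute(ORPHANS_SQL).fetchall()


def make_toy_db():
    """Build an in-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO druid_segments VALUES (?, ?)",
        [("seg_a", 1), ("seg_b", 0)],
    )
    conn.executemany(
        "INSERT INTO druid_pendingSegments VALUES (?, ?)",
        [
            ("seg_a", "2018-01-01T00:00:00Z"),
            ("seg_b", "2018-01-02T00:00:00Z"),
            ("seg_c", "2018-01-03T00:00:00Z"),  # no match in druid_segments
        ],
    )
    return conn


if __name__ == "__main__":
    for row in find_orphan_pending(make_toy_db()):
        print(row)
```

Sorting by created_date puts the oldest orphans first, which may help relate them to specific past tasks.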
Can you please clarify the workflow and, if possible, suggest an investigation approach?
Thanks for your time!
Excerpt from the ingestion task