[druid-user] Data available in an active segment but not queryable


I'm wondering whether anyone has seen a similar issue before and has suggestions on next steps for debugging or resolving it.

The high-level problem is that we have ingested data (via Kafka) into a data source, but it doesn't get returned when running SQL queries via the Druid router console. We are using monthly segments, so in this case I have a single segment (and partition) for Jan 2021. However, a query shows no rows for November. For example:

SELECT FLOOR(__time TO MONTH), COUNT(*)
FROM my_datasource
WHERE __time > TIMESTAMP '2020-10-01 00:00:00'
GROUP BY 1

We are using version 0.20.0 and according to the Druid metadata, the segment in question has 1M rows, is published and is available. The historical node logs record the segment being pulled from deep storage, unzipped and announced.
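
For anyone wanting to repeat the metadata check described above, here is a minimal sketch of a query against Druid's `sys.segments` system table, run from the same router console. It assumes the data source name `my_datasource` used in the query above. The `is_overshadowed` column is included only as one more thing that may be worth ruling out, not as a known cause of this problem.

```sql
-- List every segment Druid knows about for the data source, with its state flags.
-- "start", "end" and "size" are quoted because they clash with SQL keywords.
SELECT
  segment_id,
  "start",
  "end",
  "size",
  num_rows,
  num_replicas,
  is_published,     -- 1 if the segment is recorded as used in the metadata store
  is_available,     -- 1 if some server is currently announcing the segment
  is_realtime,      -- 1 if the segment is only served by ingestion tasks
  is_overshadowed   -- 1 if a newer version fully covers this segment
FROM sys.segments
WHERE datasource = 'my_datasource'
ORDER BY "start"
```

A segment that is published and available but also overshadowed would not be used to answer queries, so that flag is a quick thing to compare against the row count seen in the metadata.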

I can read the segment in the segmentCache directly using the dump-segment command line tool. The data is there, and the line count returned matches the number of rows in the Druid metadata.

I have also tried stopping the historical node, manually purging the segment from the cache and restarting the historical node, but the query still returns no data.

Does anyone have any ideas what to try next?


Just adding a ping back to https://the-asf.slack.com/archives/CJ8D1JTB8/p1610731906065500 in case anyone finds this post in future :)

Ahh thanks Peter, I had meant to go back and do that!