Druid query returns no results for existing segments

Relates to Apache Druid 0.20.1

Segments from three days are missing from our query results. The missing segments are shown in the web console and in the sys.segments table, and their ‘used’ flags are set to true in the metadata store. The segments were also pulled from deep storage (S3) into the local segmentCache on the historical nodes. However, Druid queries return no data from these segments, and a ‘segmentMetadata’ query returns an empty result for them. We didn’t find any meaningful logs related to the issue. Does anyone know what’s going on here?
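For anyone trying to reproduce the check against sys.segments, here is a minimal sketch of a Druid SQL request that lists the state flags for the affected segments. The Router address (localhost:8888) and the date range are assumptions, not values from this thread, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a Druid Router is reachable on localhost:8888 (the default Router port).
SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_segment_state_request(datasource, start, end):
    """Build (without sending) a Druid SQL request listing segment state flags."""
    sql = (
        'SELECT "segment_id", "version", "num_rows", "num_replicas", '
        '"is_published", "is_available", "is_realtime", "is_overshadowed" '
        "FROM sys.segments "
        'WHERE "datasource" = ? AND "start" >= ? AND "start" < ? '
        'ORDER BY "start", "version"'
    )
    payload = {
        "query": sql,
        # Dynamic parameters keep the values out of the SQL string.
        "parameters": [
            {"type": "VARCHAR", "value": datasource},
            {"type": "VARCHAR", "value": start},
            {"type": "VARCHAR", "value": end},
        ],
    }
    return urllib.request.Request(
        SQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed date range, based on the dates used in the queries in this thread.
request = build_segment_state_request("gps", "2021-09-06", "2021-09-09")
print(request.data.decode("utf-8"))
# To run it against a live cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

A segment with is_published = 1 but is_available = 0, or with is_overshadowed = 1, would be visible in the table without contributing rows to queries, so those columns are worth comparing first.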

More background on this issue:

  1. We were using Kafka to ingest the data into Druid.
  2. We found that incorrect data had been ingested for 3 days, so we decided to reindex those days. To do that, we marked all segments for those 3 days as unused and then ran a kill job to permanently remove them.
  3. We then re-ingested the correct data through the Kafka topic.
  4. After the re-ingestion, the new segments were missing from query results.
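To make step 2 concrete, the mark-unused call and the kill task could look roughly like the sketch below. The interval is an assumption inferred from the dates used later in this thread (the exact three days are not stated), and the payloads are only printed, not submitted, because a kill task permanently deletes unused segments from the metadata store and deep storage.

```python
import json

DATASOURCE = "gps"
# Assumption: the thread does not state the exact three days; this interval is
# inferred from the dates used in the queries.
INTERVAL = "2021-09-06/2021-09-09"

# Step 1: Coordinator API call that marks every segment in the interval as unused.
mark_unused_path = f"/druid/coordinator/v1/datasources/{DATASOURCE}/markUnused"
mark_unused_body = {"interval": INTERVAL}

# Step 2: kill task spec, submitted to the Overlord at POST /druid/indexer/v1/task.
kill_task = {"type": "kill", "dataSource": DATASOURCE, "interval": INTERVAL}

print("POST", mark_unused_path, json.dumps(mark_unused_body))
print("POST /druid/indexer/v1/task", json.dumps(kill_task))
```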

Things I’ve tried

  1. Queried for rows in a given date-time range:
SELECT * FROM gps WHERE __time BETWEEN '2021-09-06' AND '2021-09-07'

Response: the query returned no data.

  2. Sent a segmentMetadata query directly to the historical node:
{
"queryType":"segmentMetadata",
"dataSource":"gps",
"intervals":["2021-09-06/2021-09-08"]
}

It returned empty results, so the historical node itself did not report the segments.

  3. Applied a drop retention rule for those 3 days, then removed the drop rule and reloaded the data. Querying still returns no results, even though the segments are loaded.

  4. Manually compacted the segments for those 3 days. Querying still returns no results.
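For reference, the segmentMetadata check from the list above can be sent to a historical directly with a small script like this one. It is a sketch only: the historical address (localhost:8083, the default Historical port) is an assumption, and the request is built but not sent.

```python
import json
import urllib.request

# Assumption: the historical is reachable on localhost:8083 (the default Historical port).
HISTORICAL_URL = "http://localhost:8083/druid/v2/"


def build_segment_metadata_request(datasource, interval):
    """Build (without sending) a native segmentMetadata query for one interval."""
    query = {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
    }
    return urllib.request.Request(
        HISTORICAL_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_segment_metadata_request("gps", "2021-09-06/2021-09-08")
print(request.full_url)
print(request.data.decode("utf-8"))
# To run it against a live historical:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Sending the same body to the Broker and to each historical in turn can help narrow down whether the segments are missing on the historicals themselves or only from the Broker's view of them.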

Assumptions

  • Could it be caused by a segment version conflict?
  • Could it be a ZooKeeper/historical segment discovery issue?
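One rough way to test the version-conflict idea is to export the rows from sys.segments and look for intervals that carry more than one version. The sketch below is a simplification and not Druid's actual overshadowing logic: it only compares segments with identical intervals, relies on version strings sorting lexicographically, and uses made-up sample rows.

```python
def find_shadowed(segments):
    """Return ids of segments that have a lower version than another segment
    covering exactly the same interval."""
    newest = {}
    for seg in segments:
        key = (seg["start"], seg["end"])
        # Version strings are ISO timestamps, so string comparison orders them.
        if key not in newest or seg["version"] > newest[key]:
            newest[key] = seg["version"]
    return [
        seg["segment_id"]
        for seg in segments
        if seg["version"] < newest[(seg["start"], seg["end"])]
    ]


# Hypothetical rows, shaped like the output of a sys.segments query.
rows = [
    {"segment_id": "a", "start": "2021-09-06T00:00:00.000Z",
     "end": "2021-09-06T01:00:00.000Z", "version": "2021-09-06T00:00:05.000Z"},
    {"segment_id": "b", "start": "2021-09-06T00:00:00.000Z",
     "end": "2021-09-06T01:00:00.000Z", "version": "2021-09-10T12:00:00.000Z"},
    {"segment_id": "c", "start": "2021-09-06T01:00:00.000Z",
     "end": "2021-09-06T02:00:00.000Z", "version": "2021-09-06T01:00:05.000Z"},
]
print(find_shadowed(rows))  # -> ['a']
```

If the re-ingested segments show up in the returned list, a newer version for the same interval is taking precedence over them.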

Please help us here; this is a critical production issue for us.

Thank you in advance!

Just to check… did you restart zookeeper or the coordinator process at any point?

Hi @Rachel_Pedreschi ,

Thank you so much for the quick response.

No, I have not restarted ZooKeeper or the coordinator process at any point. Do you think restarting the Druid cluster will help here?

Thank you!

In situations like this, I have sometimes seen that restarting the coordinator and/or ZooKeeper realigns everything. I can’t make any guarantees, since I am not clear about the root cause, but it is the next thing I would try.

Hi @Rachel_Pedreschi ,

I have restarted the coordinator and ZooKeeper, but it still didn’t work. I have also restarted the whole Druid cluster, and the result is the same. Do we need to enable verbose logging to see what’s happening during query execution? Is this issue related to the query node or to the historicals?

Thank you!

Hey Vikram! When you set the Drop rule, did you also mark those segments as used again afterwards?

Drop rules will mark segments as unused, so they will no longer be picked up by the coordinator process and sent to the historicals. You need to manually mark them as used again afterwards.

(There are also a tonne of interesting APIs on that page that you might use to dig in a bit…)
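As a sketch of what marking the segments as used again could look like through the Coordinator API: the Coordinator address (localhost:8081, the default port) and the interval are assumptions, and the request is built here without being sent.

```python
import json
import urllib.request

# Assumption: the Coordinator is reachable on localhost:8081 (the default Coordinator port).
COORDINATOR = "http://localhost:8081"


def build_mark_used_request(datasource, interval):
    """Build (without sending) a request that marks all segments in an interval as used."""
    url = f"{COORDINATOR}/druid/coordinator/v1/datasources/{datasource}/markUsed"
    body = {"interval": interval}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed interval, based on the dates used in the queries in this thread.
request = build_mark_used_request("gps", "2021-09-06/2021-09-09")
print(request.full_url)
print(request.data.decode("utf-8"))
# To run it against a live Coordinator:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same endpoint also accepts a "segmentIds" list instead of "interval" when only specific segments should be re-enabled.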