Empty query results for some segments


We’ve been running a Druid cluster in production for a few years and recently started re-ingesting the same data into new datasources with minor pre-aggregation adjustments compared to the original datasource.
The issue is that, from time to time, the ingested data (which by this point has become segments) returns empty results when queried. The interesting part is that if we re-ingest that data (after marking it unused and issuing a kill job), we usually see the expected output. However, for some date ranges re-running the task doesn’t help.

I’ve checked the logs on both the Historicals and the Coordinator, and everything seems to be correct. For example, this is what I see on a Historical for the segment in question (which still returns no data):

2022-12-21T16:25:53,309 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment my_data_source.v1_2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z_2022-12-21T15:55:26.116Z_121
2022-12-21T16:25:53,309 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[CloudObjectLocation{bucket='deep-storage-bucket', path='druid/segments/my_data_source.v1/2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z/2022-12-21T15:55:26.116Z/121/index.zip'}] to outDir[/druid/data-1/segments/my_data_source.v1/2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z/2022-12-21T15:55:26.116Z/121]
2022-12-21T16:25:54,743 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Loaded 172295182 bytes from [CloudObjectLocation{bucket='deep-storage-bucket', path='druid/segments/my_data_source.v1/2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z/2022-12-21T15:55:26.116Z/121/index.zip'}] to [/druid/data-1/segments/my_data_source.v1/2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z/2022-12-21T15:55:26.116Z/121]
2022-12-21T16:25:54,755 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[my_data_source.v1_2021-10-02T05:00:00.000Z_2021-10-03T05:00:00.000Z_2022-12-21T15:55:26.116Z_121] at existing path[/druid/segments/

Querying segment metadata for this interval returns the expected data.
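
For reference, here is a minimal sketch of how such a segment metadata check can be expressed as a native Druid query. It is an illustration rather than the exact query that was run: the datasource and interval are copied from the log lines above, and the helper name `segment_metadata_query` is made up for this example.

```python
import json

# Values copied from the Historical log above; adjust to the interval under test.
DATASOURCE = "my_data_source.v1"
INTERVAL = "2021-10-02T05:00:00.000Z/2021-10-03T05:00:00.000Z"


def segment_metadata_query(datasource, interval):
    """Build a native segmentMetadata query for a single interval.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
        # Report each segment separately instead of merging the results.
        "merge": False,
    }


if __name__ == "__main__":
    # Print the JSON body that would be POSTed to the Broker's /druid/v2 endpoint.
    print(json.dumps(segment_metadata_query(DATASOURCE, INTERVAL), indent=2))
```

Each entry in the response includes a `numRows` field, which is a quick way to confirm that the loaded segment actually contains rows.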

With that in mind, I have two questions:

  1. Why would a seemingly successful ingestion task sometimes produce empty query results, when re-ingesting the same interval usually fixes the problem?
  2. In theory, why would we see properly ingested segments (based on segment metadata, their presence in deep storage, and normal operational logs) but get no data back?

Thank you.

Hello @mellk,
Welcome to the Druid Forum. I’m not sure what would cause that; it seems very odd to me.
Could you share the query you are using to test with? Perhaps that will shed some light.
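
If it helps to have a baseline, below is a minimal sketch of a row-count check against the affected interval. It rests on assumptions: the Broker URL (default port 8082 on localhost) is a guess about your setup, the datasource and interval are taken from your log lines, and the helper names `row_count_query` and `post_query` are made up for this example.

```python
import json
import urllib.request

# Assumption: Broker reachable on its default port; change to match the cluster.
BROKER_URL = "http://localhost:8082/druid/v2"
DATASOURCE = "my_data_source.v1"
INTERVAL = "2021-10-02T05:00:00.000Z/2021-10-03T05:00:00.000Z"


def row_count_query(datasource, interval):
    """Build a native timeseries query that counts stored rows in one interval."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        # "count" counts Druid rows, i.e. rows after any rollup.
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def post_query(query, broker_url=BROKER_URL):
    """POST a native query to the Broker and return the decoded JSON response."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a reachable Broker; prints whatever the cluster returns.
    print(post_query(row_count_query(DATASOURCE, INTERVAL)))
```

If this comes back empty or with a zero count while segment metadata reports rows for the same interval, that would suggest looking at the query path (filters, intervals, Broker configuration) rather than at ingestion.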


Can you attach the Broker’s runtime.properties?