Druid batch ingestion wrongly overwriting previous segments

Hi Druid community,

I’m having an issue where, when Airflow sends an ingestion request to Druid, seemingly random previous segments are affected. The number of rows in each affected segment drops sharply (from a few thousand to just 1 or 2 rows).
Another clue is that the dimensions of the affected segments are incorrect: they are missing 1 or 2 dimensions, even though the ingestion payload includes them.

Any suggestions would be really appreciated!

Thanks,
Nguyenh

Are you saying that ingestion is affecting segments that should not be impacted? I am not sure how that can happen. Are you sure your ingestion is not overwriting these segments?

Thanks, Vijeth_Sagar. Yes, I’m pretty sure the previous segments are being affected. Here is my test; please help review whether my test and reasoning are correct.

  1. I sent an ingestion request for day 27 (2022-05-27):
"ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "uris": null,
        "prefixes": [
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=01",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=09",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=10",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=11",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=12",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=13",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=15",
          "s3://datamart-prd/dm/ad_fact/year=2022/month=05/day=27/hour=22"
        ],
        "objects": null,
        "properties": null
      },

The granularitySpec had no intervals defined:

  "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "HOUR",
        "rollup": true,
        "intervals": null
      },

Then I checked a randomly chosen partial_index_generic_merge task and found:

"granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "HOUR",
        "rollup": true,
        "intervals": [
          "2022-05-27T22:00:00.000Z/2022-05-27T23:00:00.000Z",
          "2022-05-27T13:00:00.000Z/2022-05-27T14:00:00.000Z",
          "2022-05-27T11:00:00.000Z/2022-05-27T12:00:00.000Z",
          "2022-05-26T04:00:00.000Z/2022-05-26T05:00:00.000Z",
          "2022-05-27T05:00:00.000Z/2022-05-27T06:00:00.000Z",
          "2022-05-27T03:00:00.000Z/2022-05-27T04:00:00.000Z",
          "2022-05-26T16:00:00.000Z/2022-05-26T17:00:00.000Z",
          "2022-05-27T00:00:00.000Z/2022-05-27T01:00:00.000Z",
          "2022-05-27T14:00:00.000Z/2022-05-27T15:00:00.000Z",
          "2022-05-27T08:00:00.000Z/2022-05-27T09:00:00.000Z",
          "2022-05-27T15:00:00.000Z/2022-05-27T16:00:00.000Z",
          "2022-05-27T01:00:00.000Z/2022-05-27T02:00:00.000Z",
          "2022-05-27T09:00:00.000Z/2022-05-27T10:00:00.000Z",
          "2022-05-27T07:00:00.000Z/2022-05-27T08:00:00.000Z",
          "2022-05-27T10:00:00.000Z/2022-05-27T11:00:00.000Z",
          "2022-05-27T12:00:00.000Z/2022-05-27T13:00:00.000Z"
        ]
      },

This means some of the intervals fall outside the ingestion input list:

"2022-05-26T04:00:00.000Z/2022-05-26T05:00:00.000Z",
"2022-05-27T05:00:00.000Z/2022-05-27T06:00:00.000Z",
"2022-05-27T03:00:00.000Z/2022-05-27T04:00:00.000Z",
"2022-05-26T16:00:00.000Z/2022-05-26T17:00:00.000Z",
"2022-05-27T00:00:00.000Z/2022-05-27T01:00:00.000Z",
"2022-05-27T14:00:00.000Z/2022-05-27T15:00:00.000Z",
"2022-05-27T08:00:00.000Z/2022-05-27T09:00:00.000Z",
"2022-05-27T07:00:00.000Z/2022-05-27T08:00:00.000Z"
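
The comparison above can be reproduced mechanically. Here is a minimal Python sketch, assuming each hour=HH prefix is expected to hold only rows for that UTC hour. The helper names are hypothetical and not part of Druid or Airflow:

```python
import re
from datetime import datetime, timedelta

# Matches the partition layout used in the S3 prefixes above.
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})/hour=(\d{2})")


def format_ts(ts):
    """Format a datetime the way the task payload prints interval bounds."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def prefix_to_interval(prefix):
    """Map one S3 prefix to the hourly interval it is assumed to cover."""
    match = PARTITION_RE.search(prefix)
    if match is None:
        raise ValueError(f"no year/month/day/hour partition in {prefix!r}")
    year, month, day, hour = (int(part) for part in match.groups())
    start = datetime(year, month, day, hour)
    return f"{format_ts(start)}/{format_ts(start + timedelta(hours=1))}"


def unexpected_intervals(prefixes, task_intervals):
    """Return the task intervals that no input prefix accounts for, sorted."""
    expected = {prefix_to_interval(p) for p in prefixes}
    return sorted(i for i in task_intervals if i not in expected)
```

Passing the prefixes from the ioConfig and the intervals from the merge task payload to unexpected_intervals gives the list of intervals that the input paths alone do not explain.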

My question is: are these generated intervals, which fall outside the ingestion input, expected, and could they be creating the wrong segments?
Thank you so much.
Nguyenh

We seem to have fixed the issue by defining intervals in the granularitySpec.
Ref: https://druid.apache.org/docs/latest/tutorials/tutorial-ingestion-spec.html#define-an-interval-batch-only
We are still monitoring the data to make sure the fix works, but it looks good so far.
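
For anyone generating the spec from Airflow, here is a rough Python sketch of a granularitySpec with explicit intervals. It assumes the whole UTC day is the intended scope of the ingestion, and the function name is hypothetical. As far as I understand the Druid docs, rows with timestamps outside the listed intervals are discarded, so segments for other time chunks should no longer be replaced.

```python
from datetime import date, timedelta


def granularity_spec_for_day(day):
    """Build a granularitySpec whose intervals cover exactly one UTC day."""
    next_day = day + timedelta(days=1)
    return {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "HOUR",
        "rollup": True,
        # Assumption: the whole day is the intended scope of this ingestion.
        "intervals": [f"{day.isoformat()}/{next_day.isoformat()}"],
    }


spec = granularity_spec_for_day(date(2022, 5, 27))
print(spec["intervals"])
```

The returned dict can be serialized with json.dumps and placed under dataSchema in the ingestion payload in place of the spec that had "intervals": null.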