Native ingestion task fails with 'Failed to add row with a timestamp' error when query granularity is changed

Hi,

  1. I have a datasource A with a few weeks of existing ingested data, using 'segment granularity' as HOUR and 'query granularity' as HOUR.

  2. Next, when I create an ingestion task with 'segment granularity' as DAY and 'query granularity' as HOUR, it succeeds as well.

  3. Next, when I try to create a new ingestion task with 'segment granularity' as DAY and 'query granularity' as DAY, the ingestion task fails with the error below.

org.apache.druid.java.util.common.ISE: Failed to add a row with timestamp[2019-09-08T00:16:23.000Z]
	at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:1050) ~[druid-indexing-service-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.indexing.common.task.IndexTask.run(IndexTask.java:473) [druid-indexing-service-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.14.2-incubating.jar:0.14.2-incubating]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
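For reference, the granularitySpec from step 3 would look roughly like the sketch below. The field values are assumptions reconstructed from the description above (the actual spec was not posted); the interval is chosen to cover the failing timestamp:

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "DAY",
    "intervals": ["2019-09-08/2019-09-09"]
  }
}
```

The earlier successful runs (steps 1 and 2) would have used the same shape, with HOUR in place of DAY for one or both granularity fields.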

Is this expected behavior? Does Druid not allow changing the query granularity for newly ingested data once the datasource already contains data with a different query granularity? In other words, is it possible to ingest data with different query granularities into the same datasource?

Regards,

Vinay

Hi Vinay, yes, it is possible to ingest data of different query granularities into the same datasource, BUT only in different time intervals. In your case, it looks like you were trying to add a row with timestamp[2019-09-08T00:16:23.000Z] into an interval that probably already had segments with a different granularity. That is not allowed.

Hope this helps.

Thanks Ming. I agree it should fail when segments are already present in the interval. But in my case, the interval I am trying to ingest into does not have any segments, and I cannot think of a reason why this should fail when there are none. Is there any other scenario where this error can occur?

Regards,

Vinay

Hi Vinay:

This error usually points to a conflict in the metadata DB. If this is a brand-new datasource/cluster, could you start with a fresh metadata DB? Or can you try ingesting into a datasource with a new name?

Hi Ming,

If I try ingesting into a new datasource, it works. But we have an existing datasource that we need to ingest into.

Is there a way to resolve the conflicts in the metadata DB? Removing the segments for a day should normally clear out all the metadata in the DB for that day, right?
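For what it's worth, the usual way to delete segments for an interval (and clean up their metadata records) is a kill task. A sketch, assuming the datasource is named A (as in the thread) and using the day from the error above:

```json
{
  "type": "kill",
  "dataSource": "A",
  "interval": "2019-09-08/2019-09-09"
}
```

Note that a kill task only deletes segments that have already been marked unused (for example via the coordinator's datasource APIs), so the interval's segments need to be disabled first.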

Regards,

Vinay Patil
