Indexing tasks stuck waiting for lock

Hi all,

We have a hybrid batch/streaming architecture in our pipeline. The batch ingestion tasks are scheduled to run every 8 hours. The datasource is used only by the batch and realtime ingestion tasks. The realtime ingestion has an HOURLY segment granularity, while the batch ingestion has DAY segment granularity.

The problem we’re facing is that the batch ingestion seems to be stuck waiting for a lock. Say the realtime ingestion task is running for the interval 2018-06-04T21:00:00.000Z/2018-06-04T22:00:00.000Z and the batch ingestion wants to run for the interval 2018-06-04T00:00:00.000Z/2018-06-04T21:00:00.000Z. The two intervals don’t appear to overlap. Is the batch ingestion trying to lock the segment for the whole day, and failing because the realtime task is writing to the 21:00 to 22:00 hour within that day?

We seem to be able to work around this by changing the batch ingestion segment granularity to HOURLY, but we would like to keep it at DAY because of the recommendation that segment sizes should be between 300 and 700 MB.

Any help would be much appreciated. Thanks!

Hi Blesson,

Yes, it’s because the batch job is trying to lock the whole day (based on its DAY segment granularity). It does this because, by design, it would overwrite data for the entire day. I think in your case the best option is to wait until the day is over before you run your batch task, if that’s feasible. If not, then it may make sense to run both at HOUR level.
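
To make the overlap concrete, here is a rough Python sketch of the idea. It is not Druid's actual locking code, just a simplified model that assumes the lock interval is the requested interval widened to whole UTC days. The helper names (day_bucket, overlaps) are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def day_bucket(start, end):
    """Widen [start, end) to whole UTC days (simplified model of DAY granularity)."""
    floor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    ceil = end.replace(hour=0, minute=0, second=0, microsecond=0)
    if ceil < end:
        # end falls inside a day, so the bucket runs to the next midnight
        ceil += timedelta(days=1)
    return floor, ceil

def overlaps(a, b):
    """True if two half-open intervals [start, end) share any instant."""
    return a[0] < b[1] and b[0] < a[1]

utc = timezone.utc
# Intervals taken from the question above.
batch = (datetime(2018, 6, 4, 0, tzinfo=utc), datetime(2018, 6, 4, 21, tzinfo=utc))
realtime = (datetime(2018, 6, 4, 21, tzinfo=utc), datetime(2018, 6, 4, 22, tzinfo=utc))

# Assumption: the batch task locks the day-aligned interval, not the raw one.
batch_lock = day_bucket(*batch)

print(overlaps(batch, realtime))       # False: the raw intervals only touch at 21:00
print(overlaps(batch_lock, realtime))  # True: the day-wide lock covers 21:00 to 22:00
```

The raw intervals only touch at 21:00, but the day-aligned interval runs from 2018-06-04T00:00Z to 2018-06-05T00:00Z, which contains the hour the realtime task is writing to. That is why the batch task ends up waiting.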