Batch Ingestion Locking Error

Hi all,

When many batch ingestion tasks are triggered, the error 'Cannot acquire a lock for interval[…' occurs. Just before that exception, the task logs 'Encountered exception in DETERMINE_PARTITIONS'.

Configuring options like druid.server.http.numThreads or druid.broker.http.numConnections has not helped so far. From the error output, it is not clear what is happening (timeout …).

Here is the error log of such a failed task:

2019-08-20T06:56:05,490 WARN [main] org.apache.curator.retry.ExponentialBackoffRetry - maxRetries too large (30). Pinning to 29

2019-08-20T06:56:05,618 WARN [main] org.apache.druid.server.metrics.MonitorsConfig - Deprecated Monitor class name[com.metamx.metrics.JvmMonitor] found, please use name[org.apache.druid.java.util.metrics.JvmMonitor] instead.

2019-08-20T06:56:10,928 WARN [main] org.apache.druid.query.lookup.LookupReferencesManager - No lookups found for tier [__default], response [org.apache.druid.java.util.http.client.response.FullResponseHolder@766b6d02]

2019-08-20T06:56:12,934 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:

WARNING: A HTTP GET method, public void org.apache.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.

2019-08-20T06:56:14,331 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in DETERMINE_PARTITIONS.

java.lang.NullPointerException: Cannot acquire a lock for interval[2019-02-26T00:00:00.000Z/2019-02-28T00:00:00.000Z]

at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253) ~[guava-16.0.1.jar:?]

at org.apache.druid.indexing.common.task.Tasks.tryAcquireExclusiveLocks(Tasks.java:56) ~[druid-indexing-service-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.indexing.common.task.IndexTask.run(IndexTask.java:443) [druid-indexing-service-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.15.1-incubating.jar:0.15.1-incubating]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

Finished peon task

Any hints on how to analyse the issue, or on what could be causing it, would be much appreciated.

Thanks in advance,

Thomas

Hi Thomas:

Errors like “Cannot acquire a lock for interval[2019-02-26T00:00:00.000Z/2019-02-28T00:00:00.000Z]” usually indicate that another ingestion task covering all or part of that time interval is already running or has not yet finished, so the interval is still locked for writing by that other task.
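
To make the overlap idea concrete, here is a minimal sketch in Python of how two task intervals can be compared. It is only an illustration, not Druid's actual locking code: the function names are made up for this example, and it assumes intervals are written as start/end with the end treated as exclusive, in the same timestamp format as the log above.

```python
from datetime import datetime

# Timestamp format as it appears in the task log above.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_interval(interval):
    """Split 'start/end' into a pair of datetimes (hypothetical helper)."""
    start, end = interval.split("/")
    return datetime.strptime(start, TS_FORMAT), datetime.strptime(end, TS_FORMAT)


def intervals_overlap(a, b):
    """True if the two intervals share any time, treating the end as exclusive."""
    a_start, a_end = parse_interval(a)
    b_start, b_end = parse_interval(b)
    return a_start < b_end and b_start < a_end


if __name__ == "__main__":
    failed = "2019-02-26T00:00:00.000Z/2019-02-28T00:00:00.000Z"
    other = "2019-02-27T00:00:00.000Z/2019-03-01T00:00:00.000Z"
    # These two share 2019-02-27, so they would compete for the same lock.
    print(intervals_overlap(failed, other))
```

If two tasks submitted around the same time give a true result here, they are candidates for the conflict described above. Intervals that only touch at a boundary (one ends exactly where the other starts) are not counted as overlapping in this sketch.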

How many ingestion tasks are running at the time this error happens?
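
One way to check is to ask the Overlord for its running tasks and count them per datasource. The following is a rough sketch, assuming the Overlord's /druid/indexer/v1/runningTasks endpoint returns a JSON list whose entries carry a dataSource field; the host and port are placeholders and the function names are made up for this example.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_running_tasks(overlord_url):
    """Fetch the running-task list from the Overlord (response shape is assumed)."""
    url = overlord_url.rstrip("/") + "/druid/indexer/v1/runningTasks"
    with urlopen(url) as resp:
        return json.load(resp)


def count_by_datasource(tasks):
    """Count tasks per datasource; entries without the field go under 'unknown'."""
    return Counter(task.get("dataSource", "unknown") for task in tasks)


if __name__ == "__main__":
    # Placeholder address: replace with your own Overlord host and port.
    tasks = fetch_running_tasks("http://overlord.example.com:8090")
    for datasource, count in count_by_datasource(tasks).items():
        print(datasource, count)
```

Several running tasks against the same datasource at the moment of the failure would be consistent with the lock conflict.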

Thanks