Druid Task Failed

I have pasted the Overlord logs for an indexing task below.

Recently we have been seeing only 2 out of 10 tasks succeed.
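
For reference, I tallied the success rate with something like the sketch below, using the Overlord's completed-task listing. The Overlord host and port are placeholders, and the response field names may vary slightly by Druid version.

# Minimal sketch: count completed task statuses for this datasource
# via the Overlord API. Host/port below are placeholders.
import json
import urllib.request

OVERLORD = "http://overlord-host:8090"  # placeholder

with urllib.request.urlopen(OVERLORD + "/druid/indexer/v1/completeTasks") as resp:
    tasks = json.load(resp)

counts = {}
for task in tasks:
    # "id" and "statusCode" are the field names in the Overlord's
    # completed-task listing; they may differ across Druid versions.
    if task["id"].startswith("index_kafka_gitlfs-requests"):
        counts[task["statusCode"]] = counts.get(task["statusCode"], 0) + 1

print(counts)  # e.g. {'FAILED': 8, 'SUCCESS': 2}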

Overlord logs:

2019-07-18T16:00:24,594 INFO [KafkaSupervisor-gitlfs-requests] io.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp', groupId='index_kafka_gitlfs-requests', taskResource=TaskResource{availabilityGroup='index_kafka_gitlfs-requests_cfaf871ab1cc12e', requiredCapacity=1}, dataSource='gitlfs-requests', context={checkpoints={"0":{"0":19585793,"1":19585795,"2":19585798}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}}
2019-07-18T16:00:24,596 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: MM-worker-host:8083, status 200 OK, response: {"task":"index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp"}
2019-07-18T16:00:24,605 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.RemoteTaskRunner - Worker[MM-worker-host:8083] wrote FAILED status for task [index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] on [TaskLocation{host='MM-worker-host', port=8100, tlsPort=-1}]
2019-07-18T16:00:24,605 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.RemoteTaskRunner - Worker[MM-worker-host:8083] completed task[index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] with status[FAILED]
2019-07-18T16:00:24,605 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp
2019-07-18T16:00:24,605 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] on worker[MM-worker-host:8083]
2019-07-18T16:00:24,612 WARN [Curator-PathChildrenCache-5] io.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp
2019-07-18T16:00:24,612 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp', groupId='index_kafka_gitlfs-requests', taskResource=TaskResource{availabilityGroup='index_kafka_gitlfs-requests_cfaf871ab1cc12e', requiredCapacity=1}, dataSource='gitlfs-requests', context={checkpoints={"0":{"0":19585793,"1":19585795,"2":19585798}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}} (8199 run duration)
2019-07-18T16:00:24,612 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] status changed to [FAILED].
2019-07-18T16:00:24,612 INFO [Curator-PathChildrenCache-5] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] went bye bye.
io.druid.java.util.common.ISE: Unable to grant lock to inactive Task [index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp]

Below are the last lines of the MiddleManager logs:

2019-07-18T16:00:22,259 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2019-07-18T16:00:22,262 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2019-07-18T16:00:22,286 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.
2019-07-18T16:00:22,300 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@6ed18d80{/,null,AVAILABLE}
2019-07-18T16:00:22,311 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@230232b0{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2019-07-18T16:00:22,311 INFO [main] org.eclipse.jetty.server.Server - Started @5909ms
2019-07-18T16:00:22,312 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@5b742bc8].
2019-07-18T16:00:22,316 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/http:MM-worker-host:8100]
2019-07-18T16:00:23,496 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.BaseAppenderatorDriver - New segment[gitlfs-requests_2019-07-18T15:00:00.000Z_2019-07-18T16:00:00.000Z_2019-07-18T05:00:13.662Z_10] for row[MapBasedInputRow{timestamp=2019-07-18T15:45:04.000Z, event={@version=1, timestamp_object=2019-07-18T15:45:04.000Z, timestamp=1563464704000, repo=sb-137-UNRSTD-1, timestamp_local=20190718154504, message=20190718154504|5|REQUEST|173.39.56.81|susubbia|GET|/sb-137-UNRSTD-1/objects/41/f5/41f5cbf9a934fcc3a771ce1a7861c66a6ded4dc4147537e37e2524ae60f7ca9d|HTTP/1.1|200|585845, protocol=HTTP/1.1, time=1563464704069069867, @timestamp=2019-07-18T15:45:05.328Z, type=gitlfs-request, env=prod, resource=/sb-137-UNRSTD-1/objects/41/f5/41f5cbf9a934fcc3a771ce1a7861c66a6ded4dc4147537e37e2524ae60f7ca9d, username=susubbia, resource_path=objects/41/f5/41f5cbf9a934fcc3a771ce1a7861c66a6ded4dc4147537e37e2524ae60f7ca9d, requesttype=REQUEST, clientip=173.39.56.81, bytes=585845, duration=5, statuscode=200, resource_name=41f5cbf9a934fcc3a771ce1a7861c66a6ded4dc4147537e37e2524ae60f7ca9d, site=BGL, method=GET, resource_type=objects/41/f5/41f5cbf9a934fcc3a771ce1a7861c66a6ded4dc4147537e37e2524ae60f7ca9d, host=bgl-gitlfs-prd2.cisco.com}, dimensions=[site, env, host, method, statuscode, bytes, duration, resource_type, repo, clientip, timestamp, username]}] sequenceName[index_kafka_gitlfs-requests_cfaf871ab1cc12e_0].
2019-07-18T16:00:23,575 INFO [task-runner-0-priority-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[gitlfs-requests_2019-07-18T15:00:00.000Z_2019-07-18T16:00:00.000Z_2019-07-18T05:00:13.662Z_10] at new path[/druid/segments/MM-worker-host:8100/MM-worker-host:8100_indexer-executor__default_tier_2019-07-18T16:00:23.573Z_b8f4714443ae41668cbd5e230fee70a40]
2019-07-18T16:00:23,598 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp]: SegmentAllocateAction{dataSource='gitlfs-requests', timestamp=2019-07-18T17:45:04.000Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_gitlfs-requests_cfaf871ab1cc12e_0', previousSegmentId='gitlfs-requests_2019-07-18T15:00:00.000Z_2019-07-18T16:00:00.000Z_2019-07-18T05:00:13.662Z_10', skipSegmentLineageCheck='true'}
2019-07-18T16:00:23,599 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_gitlfs-requests_cfaf871ab1cc12e_klockfdp] to overlord: [SegmentAllocateAction{dataSource='gitlfs-requests', timestamp=2019-07-18T17:45:04.000Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_gitlfs-requests_cfaf871ab1cc12e_0', previousSegmentId='gitlfs-requests_2019-07-18T15:00:00.000Z_2019-07-18T16:00:00.000Z_2019-07-18T05:00:13.662Z_10', skipSegmentLineageCheck='true'}].


Datasource Rules
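
The rules can also be pulled straight from the Coordinator, e.g. with the sketch below (the Coordinator host and port are placeholders; the rules endpoint is the standard Coordinator API):

# Minimal sketch: fetch the retention/load rules for the datasource
# from the Coordinator. Host/port are placeholders.
import json
import urllib.request

COORDINATOR = "http://coordinator-host:8081"  # placeholder

url = COORDINATOR + "/druid/coordinator/v1/rules/gitlfs-requests"
with urllib.request.urlopen(url) as resp:
    print(json.dumps(json.load(resp), indent=2))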



Hi Chitra:

I do not see the indexing task log here. Can you re-attach it?

Thanks

Hi Ming,

Please find attached the task log from one of the failed indexer tasks.

Chitra

arti-failed.log (335 KB)

Hi Chitra,

The ingestion task log is spammed with the error "Still waiting for Handoff for Segments", which indicates the Peon is not able to finish handing off the segments: either publishing them to deep storage, or getting them loaded from deep storage onto the Historicals. The Coordinator and Historical logs should have information on why the handoff is failing.
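
To see where the handoff is stuck, you can also query the Coordinator directly, for example with the sketch below (a minimal sketch; the Coordinator host and port are placeholders, and the endpoints are the standard Coordinator APIs):

# Minimal sketch: check segment load progress and per-server load queues
# on the Coordinator. Host/port are placeholders.
import json
import urllib.request

COORDINATOR = "http://coordinator-host:8081"  # placeholder

# Percent of published segments loaded onto Historicals, per datasource.
with urllib.request.urlopen(COORDINATOR + "/druid/coordinator/v1/loadstatus") as resp:
    print(json.load(resp).get("gitlfs-requests"))

# Segments still queued to load/drop on each server; segments stuck here
# usually point at Historical capacity or deep-storage access problems.
with urllib.request.urlopen(COORDINATOR + "/druid/coordinator/v1/loadqueue") as resp:
    print(json.dumps(json.load(resp), indent=2))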

Thanks