Still waiting for Handoff for Segments - TASK FAILED

Hello,

We are using druid 0.12.3 version with Kafka supervisor function.

I regularly observe that ingestion tasks fail without any explicit error; the logs end with:

  1. io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments

  2. io.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment (the segments end up in the metadata storage, but the task still fails)
    2019-01-22T20:54:13,253 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.BaseAppenderatorDriver - New segment[test-requests_2019-01-23T01:00:00.000Z_2019-01-23T02:00:00.000Z_2019-01-22T20:10:03.681Z_2] for row[MapBasedInputRow{timestamp=2019-01-23T01:43:58.000Z, event={@version=1, method=GET, statuscode=200, type=test-request, resource=/api/docker/, repo=api, host=test-0, timestamp_local=20190122204358, requesttype=REQUEST, duration=152, resource_type=0-SNAPSHOT, resource_name=0-SNAPSHOT, clientip=00.000.000, resource_path=docker, username=gen, timestamp=1548207838000, message=20190122, site=DMZ, bytes=2626, timestamp_object=2019-01-23T01:43:58.000Z, @timestamp=2019-01-22T20:54:12.967Z, env=prod, protocol=HTTP/1.1}, dimensions=[site, env, host, method, statuscode, bytes, duration, resource_type, repo, clientip, timestamp, username]}] sequenceName[index_kafka_test-requests_5480313f5326cd2_0].
    2019-01-22T20:54:13,291 INFO [task-runner-0-priority-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[test-requests_2019-01-23T01:00:00.000Z_2019-01-23T02:00:00.000Z_2019-01-22T20:10:03.681Z_2] at new path[/druid/segments/localhost:8101_indexer-executor__default_tier_2019-01-22T20:54:13.290Z_db7578459b8f493ea90d97b02e19af1f0]

END of LOGS*********************************************************

Any idea why this would happen?

Thanks !

Sounds like the Coordinator failed to load the segments from deep storage and assign them to a historical, so the handoff signal was never sent. Can you check your coordinator log and find out which historical node is acting up?

Ming

Our historicals have enough storage capacity and I did not find any errors in the coordinator logs. How else can we track down this issue?

Not having enough storage might be one reason the handoff could not finish. For details, it's still best to check your coordinator and historical logs to find out.

Thanks

It looks like the announced segment is in the future (it starts at 2019-01-23T01:00:00.000Z, while the log timestamp is 2019-01-22T20:54:13,291). I'm not an expert in realtime ingestion (I've only used the Kafka Indexing Service), but with KIS there can be a problem when Druid creates segments in the future that are not configured to be loaded by historicals.
See https://github.com/apache/incubator-druid/issues/5868 and https://github.com/apache/incubator-druid/issues/5869

The underlying issue will be fixed in the next version of Druid (see https://github.com/apache/incubator-druid/pull/6676)

Until then, you will want to make sure that the rejection periods of your ingestion are configured to be a subset of your load rules.
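For reference, the rejection windows live in the Kafka supervisor spec's ioConfig, via the `lateMessageRejectionPeriod` and `earlyMessageRejectionPeriod` fields. A minimal sketch (the topic name and period values are illustrative, not recommendations):

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "test-requests",
    "lateMessageRejectionPeriod": "PT24H",
    "earlyMessageRejectionPeriod": "PT2H"
  }
}
```

With `earlyMessageRejectionPeriod` set, events with timestamps too far in the future are dropped rather than written into future segments that your load rules may never cover.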

–dave

We have not configured load rules; they are set to the default. What do you mean by announced segments in the future?

Thanks !

If you receive data from the future, by default the ingestion task will just create a segment in the future. If you've configured your historicals not to load future data, that will cause problems (though this will be fixed in the next release). I'm not positive what the default load rules are for data sources; can you share a screenshot of what the "datasources" tab in your coordinator UI looks like?
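For context, the cluster-wide default rule is a `loadForever` rule replicated into `_default_tier`, which covers segments in any interval, including future ones. The issue arises if the rules are period-bounded, as in the sketch below (values illustrative): a `loadByPeriod` window extends backwards from now, so a segment whose interval lies entirely in the future may match no load rule and never be assigned to a historical, leaving the task waiting on handoff.

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "_default_tier": 2 }
  },
  {
    "type": "dropForever"
  }
]
```

If your rules really are the untouched default (`loadForever`), the future-segment interval itself shouldn't prevent loading, so the coordinator and historical logs around that segment's ID are still the place to look.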