0.12.3 - Still waiting for Handoff for Segments

Hello,

We recently upgraded to 0.12.3 and tested data ingestion using the Kafka indexing service. We created two data sources (both loading from the same Kafka topic), and strangely, segments are created for only one of them. We see segments created in the metadata storage (MySQL), but they are automatically disabled for hand-off (used=0) while the task status is reported as SUCCESS. Both data sources have indexing specs, but after looking at the ingestion task logs, we see a bunch of different errors.

  • Tasks are successful, but the data source was never created.

2018-12-06T19:13:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:13:36,872 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:14:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:14:36,873 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:15:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:15:36,873 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:16:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:16:36,874 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:17:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:17:36,873 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:18:36,864 INFO [coordinator_handoff_scheduled_0] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://stage1:8081

2018-12-06T19:18:36,872 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2018-12-06T18:00:00.000Z/2018-12-06T19:00:00.000Z, version='2018-12-06T18:00:03.308Z', partitionNumber=2}]]

2018-12-06T19:18:50,620 INFO [qtp644399750-97] io.druid.indexing.kafka.KafkaIndexTask - Stopping gracefully (status: [PUBLISHING])

2018-12-06T19:18:50,626 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down immediately...

2018-12-06T19:18:50,629 INFO [task-runner-0-priority-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[test-events_2018-12-06T16:00:00.000Z_2018-12-06T17:00:00.000Z_2018-12-06T16:00:02.286Z_2] at path[/druid/segments/stage2:8101/stage2:8101_indexer-executor__default_tier_2018-12-06T16:33:40.503Z_08cabd499057488e809195ad97a9f3440]

2018-12-06T19:18:50,629 INFO [task-runner-0-priority-0] io.druid.curator.announcement.Announcer - unannouncing [/druid/segments/stage2:8101/stage2:8101_indexer-executor__default_tier_2018-12-06T16:33:40.503Z_08cabd499057488e809195ad97a9f3440]

2018-12-06T19:18:50,645 INFO [task-runner-0-priority-0] io.druid.indexing.kafka.KafkaIndexTask - The task was asked to stop before completing

2018-12-06T19:18:50,646 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_kafka_test-events_6531cb9f81ff4b1_odcdcopo]

  • We have never seen this authorization error before:

Dec 06 15:50:02 stage1 druid[41084]: 2018-12-06T15:50:02,776 ERROR [qtp1010311355-127] io.druid.server.security.PreResponseAuthorizationCheckFilter - Request did not have an authorization check performed.: {class=io.druid.server.security.PreResponseAuthorizationCheckFilter, uri=/, method=GET, remoteAddr=00.000.000.00, remoteHost=00.000.000.00}

Dec 06 15:50:02 stage1 druid[41084]: 2018-12-06T15:50:02,777 WARN [qtp1010311355-127] org.eclipse.jetty.server.HttpChannel - //stage1:8081/

Dec 06 15:50:02 stage1 druid[41084]: io.druid.java.util.common.ISE: Request did not have an authorization check performed.

Dec 06 15:50:02 stage1 druid[41084]: at io.druid.server.security.PreResponseAuthorizationCheckFilter.handleAuthorizationCheckError(PreResponseAuthorizationCheckFilter.java:158) ~[druid-server-0.12.3.jar:0.12.3]

  • The only difference I see between the two task logs is that the task which could not hand off its segments is missing these lines:
    2018-12-06T19:25:34,449 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_jhnbhacp]: SegmentAllocateAction{dataSource='test-abc-supervisor-manual', timestamp=2018-12-06T19:25:34.418Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_0', previousSegmentId='null', skipSegmentLineageCheck='true'}
    2018-12-06T19:25:34,449 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_jhnbhacp]: SegmentAllocateAction{dataSource=‘test-abc-supervisor-manual’, timestamp=2018-12-06T19:25:34.418Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName=‘index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_0’, previousSegmentId=‘null’, skipSegmentLineageCheck=‘true’}

2018-12-06T19:25:34,458 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_jhnbhacp] to overlord: [SegmentAllocateAction{dataSource='test-abc-supervisor-manual', timestamp=2018-12-06T19:25:34.418Z, queryGranularity=NoneGranularity, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_test-abc-supervisor-manual_8d148cfab1b48e6_0', previousSegmentId='null', skipSegmentLineageCheck='true'}].

Can you point me to where I should start debugging this issue?
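For reference, the used flag in the metadata store can be checked directly with something like the following. This is only a minimal sketch: it assumes the default druid_segments table name and the PyMySQL client, the connection details are placeholders, and the data source name is just the one from the logs above.

# Sketch: inspect the 'used' flag for recently published segments in the
# Druid metadata store. Assumes the default 'druid_segments' table name;
# host, credentials, and the data source name are placeholders/examples.
import pymysql

conn = pymysql.connect(host="mysql-host", user="druid", password="***", database="druid")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, dataSource, version, used, created_date "
            "FROM druid_segments "
            "WHERE dataSource = %s "
            "ORDER BY created_date DESC LIMIT 20",
            ("test-abc-supervisor-manual",),
        )
        for seg_id, datasource, version, used, created in cur.fetchall():
            # used=0 means the segment has been disabled and will not be handed off.
            print(used, created, seg_id)
finally:
    conn.close()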

Spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "test-supervisor-manual",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            "site",
            "category",
            "name",
            "env",
            "type",
            "userid",
            "duration",
            "folder",
            "jobName",
            "buildId",
            "waitTime",
            "execNode",
            "causes",
            "startTime",
            "team",
            "status",
            "uri",
            "timestamp"
          ],
          "dimensionExclusions": [],
          "spatialDimensions": []
        }
      }
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "count"
      },
      {
        "type": "longSum",
        "name": "durationSum",
        "fieldName": "duration"
      },
      {
        "type": "longSum",
        "name": "waitTimeSum",
        "fieldName": "waitTime"
      },
      {
        "type": "hyperUnique",
        "name": "uniqueJobs",
        "fieldName": "jobName"
      },
      {
        "type": "hyperUnique",
        "name": "uniqueUsers",
        "fieldName": "userid"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 500000,
    "intermediatePersistPeriod": "PT15m"
  },
  "ioConfig": {
    "topic": "*****",
    "consumerProperties": {
      "bootstrap.servers": "*****"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT10M"
  }
}
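For completeness, the spec above is submitted to the Overlord and the supervisor is checked roughly like this. This is a minimal sketch against the standard /druid/indexer/v1/supervisor endpoints; the Overlord host name and the spec file path are placeholders.

# Sketch: submit the supervisor spec above to the Overlord and poll the
# supervisor status. Assumes the standard /druid/indexer/v1/supervisor
# endpoints; 'overlord-host' and the spec file path are placeholders.
import json
import requests

OVERLORD = "http://overlord-host:8090"

with open("kafka-supervisor-spec.json") as f:
    spec = json.load(f)

# Create (or update) the supervisor; the response contains its id.
resp = requests.post(OVERLORD + "/druid/indexer/v1/supervisor", json=spec)
resp.raise_for_status()
supervisor_id = resp.json()["id"]

# Report the supervisor's current state, including recently failed tasks.
status = requests.get(OVERLORD + "/druid/indexer/v1/supervisor/" + supervisor_id + "/status")
print(json.dumps(status.json(), indent=2))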

Hi,

Is there any chance that future data (or data that is too old) is being ingested into Kafka? If the generated segments do not match any loadRule (i.e., they match a dropRule), they never get loaded on Historicals. As a result, the Kafka tasks wait for handoff forever, and the supervisor eventually kills them.
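You can check which retention rules currently apply to the data source through the Coordinator's rules API, something like the rough sketch below. It uses the standard /druid/coordinator/v1/rules endpoints; stage1:8081 is the Coordinator from your logs, and the data source name is just an example.

# Sketch: check the retention rules that apply to the data source.
# Uses the standard /druid/coordinator/v1/rules endpoints; stage1:8081 is
# the Coordinator from the logs and the data source name is an example.
import json
import requests

COORDINATOR = "http://stage1:8081"
DATASOURCE = "test-abc-supervisor-manual"

# Rules defined specifically for this data source (empty if it only
# inherits the cluster defaults).
specific = requests.get(COORDINATOR + "/druid/coordinator/v1/rules/" + DATASOURCE)
print("datasource rules:", json.dumps(specific.json(), indent=2))

# Cluster-wide default rules, stored under the special name '_default' and
# applied when no datasource-specific rule matches.
defaults = requests.get(COORDINATOR + "/druid/coordinator/v1/rules/_default")
print("default rules:", json.dumps(defaults.json(), indent=2))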

https://github.com/apache/incubator-druid/issues/5868 tracks this problem, and the fix will be included in 0.13.1.

Jihoon

I am seeing the same issue. Does anyone know what causes it?

naveen kavuri naveenrt23@gmail.com wrote on Friday, December 7, 2018 at 6:27 AM:

Hey,

The data shouldn't be too old or in the future, since the second data source is fully functional with the same data. It is literally the same data, the same topic, and the same spec, just with a different data source name. I have verified all the properties and specs, and, as I mentioned above, there is only one difference in the logs.

It looks like the segments are created but the Coordinator is never notified about them, and hence they are dropped without ever being loaded onto Historicals.
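Below is roughly how the Coordinator's view of the two data sources can be compared (a rough sketch against the standard Coordinator APIs; stage1:8081 is our Coordinator from the logs).

# Sketch: compare what the Coordinator reports for the two data sources.
# Uses the standard Coordinator APIs; stage1:8081 is our Coordinator.
import requests

COORDINATOR = "http://stage1:8081"

# Data sources currently being served from Historicals.
served = requests.get(COORDINATOR + "/druid/coordinator/v1/datasources").json()
print("served data sources:", served)

# Data sources that have at least one used segment in the metadata store.
in_metadata = requests.get(COORDINATOR + "/druid/coordinator/v1/metadata/datasources").json()
print("data sources with used segments:", in_metadata)

# Percentage of used segments actually loaded, per data source.
loadstatus = requests.get(COORDINATOR + "/druid/coordinator/v1/loadstatus").json()
print("load status:", loadstatus)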

Thanks!