KafkaIngestion Pausing Problem

Hi,

One of my dataSource’s kafka lag in druid will suddenly go high and then goes down. It seems due to the ingestion task paused for 10 mins, not sure why.

the following is the log from peon


2019-07-11T23:21:27,190 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_test-data_f56c406f4cf7b7c_jlbkjpdf]: CheckPointDataSourceMetadataAction{supervisorId='test-data', baseSequenceName='index_kafka_test-data_f56c406f4cf7b7c', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}}}}

2019-07-11T23:21:27,190 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_test-data_f56c406f4cf7b7c_jlbkjpdf] to overlord: [CheckPointDataSourceMetadataAction{supervisorId='test-data', baseSequenceName='index_kafka_test-data_f56c406f4cf7b7c', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}}}}].

2019-07-11T23:21:27,241 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed

2019-07-11T23:33:30,153 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed

2019-07-11T23:33:30,161 INFO [qtp1457899771-86] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Updating endOffsets from [{0=9223372036854775807, 1=9223372036854775807, 2=9223372036854775807, 3=9223372036854775807, 4=9223372036854775807, 5=9223372036854775807, 6=9223372036854775807, 7=9223372036854775807, 8=9223372036854775807, 9=9223372036854775807, 10=9223372036854775807, 11=9223372036854775807, 12=9223372036854775807, 13=9223372036854775807, 14=9223372036854775807, 15=9223372036854775807, 16=9223372036854775807, 17=9223372036854775807, 18=9223372036854775807, 19=9223372036854775807, 20=9223372036854775807, 21=9223372036854775807, 22=9223372036854775807, 23=9223372036854775807}] to [{0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}]

2019-07-11T23:33:30,161 INFO [qtp1457899771-86] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Persisting Sequences Metadata [[SequenceMetadata{sequenceName='index_kafka_test-data_f56c406f4cf7b7c_1', sequenceId=1, startOffsets={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}, endOffsets={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}, assignments=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], sentinel=false, checkpointed=true}]]

2019-07-11T23:33:30,161 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Ingestion loop resumed

```.

the following is the log from overlord

[KafkaSupervisor-test-data] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[test-data] failed to handle notice: {class=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor, exceptionType=class org.apache.druid.java.util.common.ISE, exceptionMessage=No such previous checkpoint [KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic=‘flow-feed-test-data-Xrvl’, partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}] found, noticeClass=CheckpointNotice}


Best

Hi, what version of Druid are you using?

Hi,

it is 0.13.1

Best

Li

Is this an expected problem of 0.13.0 version?