Hi,
The Kafka lag for one of my dataSources in Druid suddenly spikes and then drops back down. It seems to be because the ingestion task was paused for about 12 minutes (from 23:21:27 to 23:33:30 in the peon log below), but I'm not sure why.
The following is the log from the peon:
```
2019-07-11T23:21:27,190 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kafka_test-data_f56c406f4cf7b7c_jlbkjpdf]: CheckPointDataSourceMetadataAction{supervisorId='test-data', baseSequenceName='index_kafka_test-data_f56c406f4cf7b7c', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}}}}
2019-07-11T23:21:27,190 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_test-data_f56c406f4cf7b7c_jlbkjpdf] to overlord: [CheckPointDataSourceMetadataAction{supervisorId='test-data', baseSequenceName='index_kafka_test-data_f56c406f4cf7b7c', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}, currentCheckPoint=KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}}}}].
2019-07-11T23:21:27,241 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed
2019-07-11T23:33:30,153 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Pausing ingestion until resumed
2019-07-11T23:33:30,161 INFO [qtp1457899771-86] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Updating endOffsets from [{0=9223372036854775807, 1=9223372036854775807, 2=9223372036854775807, 3=9223372036854775807, 4=9223372036854775807, 5=9223372036854775807, 6=9223372036854775807, 7=9223372036854775807, 8=9223372036854775807, 9=9223372036854775807, 10=9223372036854775807, 11=9223372036854775807, 12=9223372036854775807, 13=9223372036854775807, 14=9223372036854775807, 15=9223372036854775807, 16=9223372036854775807, 17=9223372036854775807, 18=9223372036854775807, 19=9223372036854775807, 20=9223372036854775807, 21=9223372036854775807, 22=9223372036854775807, 23=9223372036854775807}] to [{0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}]
2019-07-11T23:33:30,161 INFO [qtp1457899771-86] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Persisting Sequences Metadata [[SequenceMetadata{sequenceName='index_kafka_test-data_f56c406f4cf7b7c_1', sequenceId=1, startOffsets={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}, endOffsets={0=5462451, 1=10360911, 2=18082707, 3=3062844, 4=10254489, 5=14499272, 6=11896632, 7=13760201, 8=12311302, 9=12258729, 10=13831300, 11=5806819, 12=10051901, 13=8380105, 14=5215236, 15=14438739, 16=7217805, 17=10020374, 18=9769071, 19=11887306, 20=7942729, 21=7708916, 22=7674621, 23=13316842}, assignments=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], sentinel=false, checkpointed=true}]]
2019-07-11T23:33:30,161 INFO [task-runner-0-priority-0] org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner - Ingestion loop resumed
```
The following is the log from the overlord:
```
[KafkaSupervisor-test-data] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[test-data] failed to handle notice: {class=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor, exceptionType=class org.apache.druid.java.util.common.ISE, exceptionMessage=No such previous checkpoint [KafkaDataSourceMetadata{kafkaPartitions=KafkaPartitions{topic='flow-feed-test-data-Xrvl', partitionOffsetMap={0=5457176, 1=10349590, 2=18065601, 3=3059731, 4=10244223, 5=14485522, 6=11884382, 7=13746156, 8=12298452, 9=12247495, 10=13818037, 11=5801670, 12=10041976, 13=8371497, 14=5210036, 15=14424694, 16=7210935, 17=10010442, 18=9759696, 19=11875782, 20=7935150, 21=7700864, 22=7666986, 23=13304020}}}] found, noticeClass=CheckpointNotice}
```
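
In case it is useful, this is how I plan to inspect the task the next time it pauses. This is only a minimal sketch: the peon host and port are placeholders, the chat endpoint paths are my assumption based on how the supervisor talks to the task, and the task id is the one from the peon log above.

```python
# Minimal sketch (my assumption): poll the paused task through the peon's chat
# endpoint. PEON_HOST/PEON_PORT are placeholders; the task id is taken from the
# peon log above.
import requests

base = ("http://PEON_HOST:PEON_PORT/druid/worker/v1/chat/"
        "index_kafka_test-data_f56c406f4cf7b7c_jlbkjpdf")

# Runner status -- I expect this to report PAUSED while the task is stuck.
print(requests.get(base + "/status").json())

# Current Kafka offsets per partition, to compare against the checkpoint offsets in the log.
print(requests.get(base + "/offsets/current").json())
```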
Best