Kafka Indexing Service task successes for supervisors started under 0.12.3 vs 0.13.0

We’re seeing a difference in the Completed Task listings in the Overlord UI for Kafka Indexing Service tasks between our QA and production clusters.

In QA, the KIS supervisors were started under 0.12.3 and the cluster has recently been upgraded to 0.13.0.

In production, the KIS supervisors were started under 0.13.0.

In QA, ioConfig.taskDuration = PT1H.

In Production, ioConfig.taskDuration = PT12H and tuningConfig.maxRowsPerSegment is also set.

In QA we are seeing new Successful Completed Tasks every hour.

In Production we aren’t seeing any Successful Completed Tasks (no failures either), BUT when I check the datasource segment shards, I see that segment shards have been closing off and persisting to storage.

I also see from the metadata on the running tasks that the current task started 12H after the supervisor was first brought online. This leads me to think that everything is working in production, but I don’t know why we’re not seeing any entries in the Completed Tasks list.

Am I missing a difference in behaviour when maxRowsPerSegment is set?



Hey Dyana,

You should be seeing tasks complete every 12 hours in production, based on your configuration.

I wonder if the most recent completed tasks are too old to show up? There’s a config druid.indexer.storage.recentlyFinishedThreshold, default PT24H, and any tasks created more than that amount of time ago won’t show up when completed. (This is to prevent the completed tasks list from getting too long.)
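
To make the arithmetic concrete, here's a rough Python sketch of that rule as I've described it. It is not Druid's actual code; the helper name and the timestamps are made up for illustration, and it ignores the time a task spends publishing after its taskDuration is up.

```python
from datetime import datetime, timedelta

# Default for druid.indexer.storage.recentlyFinishedThreshold, as stated above (PT24H).
RECENTLY_FINISHED_THRESHOLD = timedelta(hours=24)


def shows_in_completed_list(created, now, threshold=RECENTLY_FINISHED_THRESHOLD):
    """Hypothetical helper: True if a finished task created at `created`
    is still inside the threshold window at time `now`."""
    return now - created <= threshold


# Arbitrary example timestamps, not taken from any real cluster.
created = datetime(2019, 1, 1, 0, 0)
finished = created + timedelta(hours=12)  # taskDuration = PT12H, publish time ignored

# One hour after the task finishes, it was created 13 hours ago.
visible_right_after = shows_in_completed_list(created, finished + timedelta(hours=1))

# 25 hours after creation, the task has aged out of the window.
visible_next_day = shows_in_completed_list(created, created + timedelta(hours=25))
```

Under those assumptions, a PT12H task would stay listed for roughly 12 hours after it finishes and then drop off, so checking the list more than a day after the task was created would show nothing for it.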

One other investigation option is to write down the task IDs that are running now, and then after they finish, check the task status/log APIs for that particular task to see if they show up: /druid/indexer/v1/task/TASKID/status and /druid/indexer/v1/task/TASKID/log. These APIs don’t fall under the recentlyFinishedThreshold and they will always show relevant info.
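
If it helps, here's a minimal Python sketch for that check. The Overlord address is a placeholder for your own, the function names are made up, and I'm not assuming anything about the status response beyond it being JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder Overlord address (assumption): replace with your own host and port.
OVERLORD = "http://overlord.example.com:8090"


def task_url(task_id, endpoint, base=OVERLORD):
    """Build /druid/indexer/v1/task/TASKID/status or .../log for one task."""
    if endpoint not in ("status", "log"):
        raise ValueError("endpoint must be 'status' or 'log'")
    # Percent-encode the task ID so it is safe to place in the URL path.
    return f"{base}/druid/indexer/v1/task/{quote(task_id, safe='')}/{endpoint}"


def fetch_task_status(task_id, base=OVERLORD):
    """GET the status endpoint and return the parsed JSON body."""
    with urlopen(task_url(task_id, "status", base), timeout=10) as resp:
        return json.load(resp)
```

Once a task you wrote down has finished, calling fetch_task_status with its ID and printing the result should tell you how it ended, even if it never appears in the Completed Tasks list.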