[druid-user] Druid-0.10.1 upgrade: Lag metric: Kafka partitions [......] do not match task partitions [......]

Hey Jan,

The lag measurement is a new feature in 0.10.1 and that log message is new too.

I bet the WARNing message is happening because of the timeouts fetching current offsets from the tasks. And that in turn could be happening because the tasks’ http servers, for some reason, are not being responsive. Do the tasks seem to be running ok and answering queries ok?

It could be the 5% of tasks that are failing are also running for a while before they fail and gumming up the lag measurement works. Or it could be that the 95% of tasks that succeed are still, for some reason, often unresponsive to queries. Thread dumping them might shed some light (jstack -l on the peon pid).