Replicant create queue still has 10 segments and gets stuck after 15+ runs

Hey, I added a third historical node and kept the replication factor at 2. Rebalancing finished successfully, but I’m still getting these errors:

2017-11-01T15:01:42,552 ERROR [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue stuck after 15+ runs!: {class=io.druid.server.coordinator.ReplicationThrottler, segments=[gwiq-p_2017-10-28T14:00:00.000Z_2017-10-28T15:00:00.000Z_2017-10-28T17:36:32.581Z ON, gwiq-p_2017-10-28T09:00:00.000Z_2017-10-28T10:00:00.000Z_2017-10-28T12:33:44.415Z ON, gwiq-p_2017-10-28T12:00:00.000Z_2017-10-28T13:00:00.000Z_2017-10-28T15:35:45.717Z ON, gwiq-p_2017-10-28T06:00:00.000Z_2017-10-28T07:00:00.000Z_2017-10-28T09:34:44.547Z ON, gwiq-p_2017-10-28T05:00:00.000Z_2017-10-28T06:00:00.000Z_2017-10-28T08:30:09.589Z ON, gwiq-p_2017-10-28T07:00:00.000Z_2017-10-28T08:00:00.000Z_2017-10-28T10:34:23.012Z ON, gwiq-p_2017-10-28T02:00:00.000Z_2017-10-28T03:00:00.000Z_2017-10-28T05:27:34.225Z ON, gwiq-p_2017-10-28T04:00:00.000Z_2017-10-28T05:00:00.000Z_2017-10-28T07:28:56.744Z ON, gwiq-p_2017-10-28T13:00:00.000Z_2017-10-28T14:00:00.000Z_2017-10-28T16:36:35.438Z ON, gwiq-p_2017-10-28T01:00:00.000Z_2017-10-28T02:00:00.000Z_2017-10-28T04:28:19.409Z ON]}
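The alert text suggests a per-tier throttle: only a fixed number of replica loads can be in flight at once (10 in the title), and every coordinator run that finds the queue still non-empty counts down a lifetime until an alert fires after 15 runs. The Python sketch below is a simplified, hypothetical model of that behavior, inferred from the log messages rather than taken from Druid's source; the class and method names are invented for illustration. It shows why the same 10 segments keep showing up: if their loads never complete, the queue stays full and the alert repeats on every run.

```python
class TierReplicationQueue:
    """Hypothetical, simplified model of a per-tier replica-creation throttle.

    Not Druid's implementation: names and exact semantics are assumptions
    inferred from the "queue still has N segments" / "stuck after 15+ runs" logs.
    """

    def __init__(self, max_replicants=10, max_lifetime=15):
        self.max_replicants = max_replicants  # cap on in-flight replica loads
        self.max_lifetime = max_lifetime      # runs allowed before alerting
        self.lifetime = max_lifetime
        self.in_flight = set()

    def try_enqueue(self, segment_id):
        """Start a replica load unless the tier is already at its cap."""
        if len(self.in_flight) >= self.max_replicants:
            return False
        self.in_flight.add(segment_id)
        return True

    def complete(self, segment_id):
        """Mark a replica load as finished by a historical."""
        self.in_flight.discard(segment_id)

    def end_of_run(self):
        """Called once per coordinator run; returns True when an alert should fire."""
        if not self.in_flight:
            self.lifetime = self.max_lifetime  # empty queue resets the countdown
            return False
        self.lifetime -= 1
        return self.lifetime < 0


# Simulate 12 segments needing replicas whose loads never finish.
queue = TierReplicationQueue()
accepted = [queue.try_enqueue(f"seg-{i}") for i in range(12)]
first_alert = None
for run in range(1, 21):
    if queue.end_of_run() and first_alert is None:
        first_alert = run
print(accepted.count(True), first_alert)  # prints: 10 16
```

In this model, a longer lifetime only delays the alert; the countdown resets only once the in-flight loads actually finish.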

Segments seem to be dropped and reassigned over and over, like a livelock. We have 25,000 segments, which is too many; I’m currently trying to reduce that number with druid.coordinator.merge.on.

Until then, though, I need to solve this. Is there a coordinator configuration that would prevent this livelock caused by too many segments?
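On the configuration question: the numbers in the alert appear to line up with two coordinator dynamic-config fields whose documented defaults are 10 and 15, replicationThrottleLimit (how many segments may be replicated at once) and replicantLifetime (how many coordinator runs a replica may take before the coordinator alerts). Below is a minimal Python sketch of changing them through the coordinator's dynamic-config endpoint, assuming it is served at /druid/coordinator/v1/config on the coordinator port; the host name and the values are placeholders. Whether adjusting either field breaks the drop/assign loop is not established in this thread, and raising replicantLifetime may only postpone the alert.

```python
import json
import urllib.request

# Assumed coordinator address; replace with your own host and port.
COORDINATOR_URL = "http://coordinator.example.com:8081"
CONFIG_PATH = "/druid/coordinator/v1/config"


def merge_dynamic_config(current, overrides):
    """Return a new config dict with overrides applied on top of current."""
    merged = dict(current or {})
    merged.update(overrides)
    return merged


def update_dynamic_config(base_url, overrides, timeout=10):
    """GET the dynamic config, apply overrides, and POST the full object back.

    Posting the merged object keeps fields you did not override at their
    current values instead of sending only the changed keys.
    """
    url = base_url + CONFIG_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
    current = json.loads(raw) if raw.strip() else {}
    body = json.dumps(merge_dynamic_config(current, overrides)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


# Illustrative values only, not a recommendation:
# update_dynamic_config(COORDINATOR_URL, {"replicationThrottleLimit": 10, "replicantLifetime": 30})
```

The sketch reads the current config before posting, on the assumption that the endpoint replaces the whole dynamic config object; check the documentation for your Druid version before relying on that.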

Which version are you on? Later versions include many improvements for handling large numbers of segments, so if you aren’t on the latest, I would try upgrading and see if that helps.

We’re on 0.10.1, but this is the first time we have had more historicals than the replication factor, so the problem is probably there.