We are running a fairly large cluster: about 150 peons with 75 partitions. In general, Druid ingestion works well. But after a couple of days, handoff suddenly slows down and tasks start piling up. This slows ingestion down, and eventually Tranquility throws a “NoBrokerException”, i.e., none of the brokers are discoverable. After this we lose ingestion entirely.
We know that no system in the world is ideal.
My question is:
How can we detect that handoff is slowing down, and how can we stop the hung tasks, either by shutting them down or by removing the affected middleManager node? In other words, what best practices do you suggest in this scenario? The overall goal is to keep ingestion from stopping completely after a couple of hours.
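
To make the question concrete, here is a rough sketch of the kind of check we have in mind. It assumes the Overlord task API (GET /druid/indexer/v1/runningTasks and POST /druid/indexer/v1/task/{taskId}/shutdown) and that each running task entry carries an "id" and an ISO 8601 "createdTime". The Overlord address and the age threshold are placeholders, not values from our cluster. The idea is that a task running much longer than segmentGranularity + windowPeriod plus a normal handoff time is probably stuck waiting on handoff.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: placeholder Overlord address, replace with your own.
OVERLORD = "http://overlord.example.com:8090"

# Assumption: hypothetical threshold. Pick something comfortably above
# segmentGranularity + windowPeriod + the usual handoff time.
MAX_TASK_AGE = timedelta(hours=2)


def parse_time(value):
    """Parse an ISO 8601 timestamp such as 2016-06-27T14:39:23.215Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stale_tasks(tasks, now, max_age):
    """Return the ids of tasks whose createdTime is older than max_age."""
    return [t["id"] for t in tasks if now - parse_time(t["createdTime"]) > max_age]


def running_tasks():
    """Fetch the list of currently running tasks from the Overlord."""
    url = OVERLORD + "/druid/indexer/v1/runningTasks"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def shutdown(task_id):
    """Ask the Overlord to shut down a single task; returns the HTTP status."""
    url = (OVERLORD + "/druid/indexer/v1/task/"
           + urllib.parse.quote(task_id, safe="") + "/shutdown")
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def report_stale():
    """Print tasks that have outlived the threshold, without killing them."""
    now = datetime.now(timezone.utc)
    for task_id in find_stale_tasks(running_tasks(), now, MAX_TASK_AGE):
        print("stale task:", task_id)
```

Running report_stale() periodically and alerting when the count grows would be one way to notice the slowdown early. Calling shutdown() on the reported ids is the part we are unsure about, since a task killed before handoff may lose the data it has not yet published.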