Restarting indexing tasks on a middle manager after the host restarts

Hi,

I am trying to cut the cost of running Druid at my company. We run a 150-node Druid cluster, with 80 of those nodes dedicated to middle managers. Each middle manager handles 5 tasks.

I am wondering whether any type of indexing task has some kind of checkpoint, or the ability to resume after its host restarts. Indexing tasks that can resume would let us use AWS spot instances, and would let us test the resilience of our infrastructure more often, since losing an instance would become a more common occurrence.

The data comes from Kafka, is transformed by Spark Streaming, and is then sent to Druid using the Tranquility framework. We are open to changing this flow if there is a way to make the middle manager tasks restart when the host restarts.

Let me know if such a feature exists or if you have an idea of how to implement this.

Thank you!

Hi Karim,

We use the same setup for our Druid cluster (AWS spot instances + Kafka).
We use the Kafka indexing service provided by Druid.

http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html

You get offset management out of the box with this. Hope this helps.
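As a rough illustration of what switching to the Kafka indexing service involves, here is a minimal Python sketch that builds a supervisor spec and submits it to the Overlord. The datasource name, topic, broker address, timestamp column, and the helper function names are all placeholders made up for this example, and the exact spec layout differs between Druid versions, so check it against the documentation linked above. The `replicas` setting is the part most relevant to spot instances: with more than one replica, losing a single host does not interrupt ingestion.

```python
import json
import urllib.request


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers,
                                task_count=2, replicas=2):
    """Build a minimal Kafka supervisor spec as a plain dict.

    All values are illustrative; the field layout may differ between
    Druid versions, so verify it against the docs for your release.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event has an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                },
            },
            "ioConfig": {
                "topic": topic,
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                # Number of reading tasks per replica set.
                "taskCount": task_count,
                # Replica task sets; >1 tolerates the loss of one host.
                "replicas": replicas,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submit_supervisor(overlord_url, spec):
    """POST the spec to the Overlord's supervisor endpoint."""
    request = urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder names; print the spec instead of submitting it.
    spec = build_kafka_supervisor_spec("events", "events-topic", "kafka:9092")
    print(json.dumps(spec, indent=2))
```

Once a supervisor is running, it manages the indexing tasks itself, so tasks lost with a host are replaced and continue from the offsets Druid has recorded.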

Best wishes,

Stephan

One issue we’ve run into is that large spot machines like r5d.2xlarge sometimes get taken away dozens of times per day, and sometimes a replacement can’t even come up for 30 minutes because there is no spot availability. Just a heads up.

Hey Michael,

Are you using spot for your entire cluster, or only for a portion of the historicals, middle managers, and brokers?

We were using it for our whole staging environment cluster, but it was causing so many headaches that we’ve switched back to on-demand.