I am evaluating Druid.
I'm looking at periodically indexing data from a SQL table into Druid.
I see that the Druid indexers use files as inputs (native, Hadoop, or Spark). So I'm guessing I first need to schedule a periodic data export from my table into files in, say, HDFS, and then submit an indexing task to Druid pointing at those files.
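For concreteness, the hourly "point Druid at the exported files" step would look something like the native batch (index_parallel) ingestion spec below. This is just a minimal sketch: the HDFS path, datasource name, column names, and interval are placeholders I made up for illustration.

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my_table",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user_id", "action"] },
      "granularitySpec": {
        "segmentGranularity": "hour",
        "queryGranularity": "none",
        "intervals": ["2023-01-01T00:00:00Z/2023-01-01T01:00:00Z"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "hdfs", "paths": "/exports/my_table/2023-01-01T00" },
      "inputFormat": { "type": "csv", "findColumnsFromHeader": true }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

As I understand it, the cron job would POST a spec like this to the Overlord's task endpoint (/druid/indexer/v1/task) each hour, after the export for that hour lands in HDFS.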
This, however, can easily turn into a quite complicated workflow.
If I just have a simple cron job that schedules a data export and a Druid indexing task every hour, covering the past hour's data, what happens when a particular indexing task fails? Does Druid automatically retry until that task succeeds?
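In case retries aren't automatic, I imagine my cron-driven script would need its own retry loop around task submission. A minimal sketch of what I have in mind, where submit_and_wait is a hypothetical callable standing in for "POST the spec to the Overlord, then poll the task status until it's terminal":

```python
import time

def run_with_retries(submit_and_wait, max_attempts=3, backoff_s=60):
    """Submit an indexing task and retry on failure.

    submit_and_wait: a callable returning "SUCCESS" or "FAILED"
    (a stand-in here for submitting the spec and polling the
    task's status until it finishes).
    Returns True if any attempt succeeded, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        status = submit_and_wait()
        if status == "SUCCESS":
            return True
        if attempt < max_attempts:
            # back off before resubmitting the same interval
            time.sleep(backoff_s)
    return False
```

The point being: the retry/backoff bookkeeping lives in my script, not in Druid, unless Druid already handles this for batch tasks.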
What do people use for orchestrating workflows with Druid? Or does Druid do some of that work for us?