Schedule batch loading for new data from S3


We are evaluating Druid to load data from S3 and provide analytic querying capability on this data. We have years' worth of data, totalling several terabytes.
Our real-time pipeline processes the data and stores the results in S3. Can Druid continuously pull both the historical data and the new data as it lands in S3? I believe this could be done in Druid without having real-time nodes stream the data.

Best Regards


I’m also using Druid with frequent batch ingestion as a form of “real-time”.

To achieve that, we use a NiFi cluster that builds and posts an ingestion spec every 2 minutes (we are certain that no ingestion task runs for more than 2 minutes).
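For illustration, here is a minimal Python sketch of the kind of spec such a flow might build on each 2-minute cycle. It assumes Druid's native parallel batch ingestion (`index_parallel`) with the `s3` input source, and a hypothetical layout in which the real-time pipeline writes JSON files into one S3 folder per minute. The bucket, path layout, datasource name and `timestamp` column are placeholders, not details from this thread.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical values: replace with your own bucket, path layout and datasource.
BUCKET = "example-bucket"
DATASOURCE = "events"
WINDOW = timedelta(minutes=2)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_complete_window(now: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) of the most recent fully elapsed 2-minute window."""
    end = EPOCH + ((now - EPOCH) // WINDOW) * WINDOW
    return end - WINDOW, end


def build_spec(start: datetime, end: datetime) -> dict:
    """Build a native batch (index_parallel) spec for the files in [start, end)."""
    # Assumed layout: one S3 folder per minute, processed/YYYY/MM/DD/HH/mm/
    prefixes = []
    t = start
    while t < end:
        prefixes.append(f"s3://{BUCKET}/processed/{t:%Y/%m/%d/%H/%M}/")
        t += timedelta(minutes=1)
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": DATASOURCE,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # An empty list makes Druid treat every other column as a string dimension.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "prefixes": prefixes},
                "inputFormat": {"type": "json"},
                # Add rows to existing segments instead of overwriting the time chunk.
                "appendToExisting": True,
            },
            "tuningConfig": {
                "type": "index_parallel",
                # Appending requires dynamic partitioning.
                "partitionsSpec": {"type": "dynamic"},
            },
        },
    }


if __name__ == "__main__":
    start, end = last_complete_window(datetime.now(timezone.utc))
    print(json.dumps(build_spec(start, end), indent=2))
```

Setting `appendToExisting` matters here: without it, each small batch would replace the data already indexed for that hour instead of adding to it.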

You could use any tool that is able to POST a spec over HTTP.
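As a sketch of how little is needed, the following uses only the Python standard library to POST a spec to the Overlord's task endpoint (`/druid/indexer/v1/task`) and return the task ID from its JSON response. The host and port are assumptions (the Overlord listens on 8090 by default, and a Router on 8888 can proxy the same path); adjust them to your cluster.

```python
import json
import urllib.request

# Hypothetical address: the Overlord, or a Router that proxies to it.
OVERLORD_URL = "http://localhost:8090"


def build_task_request(base_url: str, spec: dict) -> urllib.request.Request:
    """Build a POST request that submits an ingestion spec as a new task."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(base_url: str, spec: dict, timeout: float = 30.0) -> str:
    """Send the spec and return the task ID from the Overlord's JSON response."""
    request = build_task_request(base_url, spec)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["task"]
```

Calling `submit_task(OVERLORD_URL, spec)` from cron, a systemd timer or a simple loop every 2 minutes gives roughly the same behaviour as a NiFi flow, as long as each task finishes before the next one starts.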

I don’t think Druid offers another way to do it.