Automate Hadoop Batch Ingestion

Hello Everyone,

I’m setting up Druid on an EMR cluster and am successfully ingesting Parquet files from an S3 bucket using Hadoop batch ingestion.

Now I want to automate this ingestion so it runs every day for a different file (in the same S3 bucket) with the same configuration. Is this possible, or would I need to do it manually every day?



Hi Darshan:

You can submit tasks periodically by POSTing the ingestion spec to the Overlord's task API, for example from a cron job.
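A minimal sketch of that cron-driven submission, assuming daily files land under date-partitioned S3 keys (the bucket name, key layout, and Overlord hostname below are hypothetical; the endpoint `/druid/indexer/v1/task` is Druid's standard task-submission API). Reuse your existing working spec for the omitted `dataSchema` and `tuningConfig` sections:

```python
import json
from datetime import date, timedelta

# Assumption: Overlord reachable at this host/port
OVERLORD_URL = "http://overlord-host:8090/druid/indexer/v1/task"

def build_spec(day):
    """Fill a static Hadoop ingestion spec with one day's S3 path."""
    # Hypothetical key layout: one folder per day
    path = f"s3://my-bucket/data/{day.isoformat()}/"
    return {
        "type": "index_hadoop",
        "spec": {
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": path},
            },
            # dataSchema and tuningConfig omitted; copy them from the
            # spec you already use for manual ingestion
        },
    }

if __name__ == "__main__":
    import urllib.request
    # Ingest yesterday's folder, so the day's file is complete
    spec = build_spec(date.today() - timedelta(days=1))
    req = urllib.request.Request(
        OVERLORD_URL,
        data=json.dumps(spec).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())  # Overlord responds with the task ID
```

A crontab entry like `0 2 * * * python3 /opt/druid/submit_daily.py` would then run it once a night; the POST happens only under the `__main__` guard, so the spec-building logic can be tested on its own.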

The only catch is that you need a way to tell Druid which files in the folder are new, since you probably don't want to re-ingest the old files over and over again.
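One way to handle that, sketched here under assumptions: keep a local record of keys already ingested and only submit the ones not seen yet. The listing step (e.g. `boto3`'s `list_objects_v2`) is left out so the filtering logic stands alone; the state-file name and key names are hypothetical.

```python
import json
import os

# Hypothetical local record of already-ingested S3 keys
STATE_FILE = "ingested_keys.json"

def load_seen(path=STATE_FILE):
    """Load the set of keys that have already been ingested."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def pick_new(keys, seen):
    """Return only the keys not yet ingested, sorted by name."""
    return sorted(k for k in keys if k not in seen)

def mark_done(keys, seen, path=STATE_FILE):
    """Record the newly ingested keys so the next run skips them."""
    seen.update(keys)
    with open(path, "w") as f:
        json.dump(sorted(seen), f)
```

If your files already carry the date in their names or prefixes, an even simpler approach is to skip the bookkeeping entirely and have the cron job ingest only "yesterday's" path, as in the submission sketch above.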