batch ingestion pipeline


I am trying to implement a Hadoop batch ingestion pipeline on Druid in which Druid creates an ingestion task for any files added to the HDFS path given in the spec file. Could anyone point out whether Druid has a configuration parameter indicating that the Hadoop batch ingestion is recurring rather than a one-time batch ingestion? If no such option is available, I assume I need to create a job that updates the spec with the paths to the new files and submits it with curl. Does this seem correct?

Thank you.


Hey Manasa,

There is currently no way to set up a recurring batch ingestion job within Druid, so you’ll need to manage the periodic submission of jobs externally. Yes, having a script that modifies the ingestion spec and then submits it using curl is a good way to handle this. I’ve had success using Oozie as the scheduler, although there are simpler solutions.
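
For illustration, here is a minimal Python sketch of that kind of script, doing the same thing curl would do. It assumes the spec is an `index_hadoop` task with a static `inputSpec`, and that the Overlord accepts tasks at `http://localhost:8090/druid/indexer/v1/task`; the host, port, and the helper names `with_input_paths` and `submit_task` are placeholders for this example, not anything Druid provides.

```python
import copy
import json
import urllib.request

# Assumed Overlord address; adjust host and port for your cluster.
OVERLORD_TASK_URL = "http://localhost:8090/druid/indexer/v1/task"


def with_input_paths(spec_template, paths):
    """Return a copy of the task spec whose static inputSpec points at `paths`.

    Assumes the usual Hadoop task layout: spec -> ioConfig -> inputSpec.
    The template itself is left unmodified so it can be reused on every run.
    """
    spec = copy.deepcopy(spec_template)
    input_spec = spec["spec"]["ioConfig"]["inputSpec"]
    input_spec["type"] = "static"
    # The static inputSpec takes a single comma-separated string of paths.
    input_spec["paths"] = ",".join(paths)
    return spec


def submit_task(spec, url=OVERLORD_TASK_URL):
    """POST the task spec as JSON to the Overlord and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduler such as cron or Oozie would then run something along the lines of `submit_task(with_input_paths(template, new_files))` on each cycle, where `template` is your spec loaded with `json.load` and `new_files` is whatever list of new HDFS paths your job has discovered. How you detect new files, and how you avoid submitting the same file twice, is up to the external job.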