Batch Ingestion in AWS


I am very new to Druid and I have a general question about how Druid batch ingestion will be run in AWS.

Recently I was able to use the Hortonworks Tech Preview version of Druid to run an ingestion job in our dev Hadoop cluster. The job takes an HDFS file as input and runs MapReduce jobs to create segments. After that, we were able to run queries on the datasource. If we set up a standalone Druid cluster in AWS, however, we will use S3 as deep storage and stand up instances running ZooKeeper and MySQL, but we would not have YARN or MapReduce to create segments. In that case, would the Peon be responsible for creating the segments? If not, what is the equivalent of the MapReduce job that creates the segments?

Thanks. I appreciate the info.



From the docs: