Batch ingestion - Hadoop indexer vs native index task


I have been using Druid for a few weeks now, and I have set up a batch ingestion job that uses the following config to read files from S3 and ingest the data into Druid. Now I am planning to move to production.

One question I have is - is the below way of ingesting recommended for production? I remember reading somewhere in the docs that this method is only for smaller files, and that the Hadoop index task should be used for large files and for production. Our file sizes would be anywhere from 500 MB to 4-5 GB.

Can someone please let me know if this is the right way to do batch ingestion in a production environment?

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "static-s3",
        "uris" : [
          "<S3 file 1>",
          "<S3 file 2>",
          "<S3 file 3>"
        ],
        "prefixes" : []
      }
    }
  }
}
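For comparison, the Hadoop-based alternative mentioned above is submitted as a task of type "index_hadoop", whose ioConfig points at the input files through an inputSpec rather than a firehose. A minimal sketch follows; the bucket and file names are placeholders, and the exact fields (e.g. the "s3n" scheme and the tuningConfig options, which I have omitted) can vary with the Druid version and Hadoop setup, so please check it against the docs for your release:

```
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "s3n://<bucket>/<S3 file 1>,s3n://<bucket>/<S3 file 2>"
      }
    }
  }
}
```

The dataSchema section is the same in both task types; the main structural difference is that the Hadoop task hands the read and index work to MapReduce jobs on a Hadoop cluster, while the native index task runs on the Druid middle managers themselves.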

Hi Sameer,
You can see a good example here



Thanks for the response.

I have tried the way you suggested, but I am interested to know whether there is a recommended way to do batch ingestion. Is there any advantage to using one approach over the other?