I'm trying to clarify the recommended way to do batch ingestion for Druid: the Hadoop indexer vs. the indexing service. We need to run a batch load whenever we have to backfill a failed realtime indexing job (run through Tranquility) or ingest late-arriving messages. Our data volume is about 60GB per hour.
What is the recommended way of doing batch ingestion? I recall that an earlier version of the documentation recommended the indexing service only for small data sizes (around 1GB) and the Hadoop indexer for larger batches. The current document at http://druid.io/docs/latest/tutorials/tutorial-loading-batch-data.html now says “the HadoopDruidIndexer still remains a valid option for batch ingestion, however, we recommend using the indexing service as the preferred method of getting batch data into Druid”.
I’d like to check whether the indexing service really is preferred over Hadoop regardless of data size. That would make sense to me: if the indexing service can handle realtime indexing, batch indexing should work on the same set of nodes.