This tutorial describes two ways of loading batch data: the indexing service and the HadoopDruidIndexer (which I think should be CliInternalHadoopIndexer in Druid 0.8.2).
It also describes the advantages of each method.
In my case, I only need to ingest batch data, and I am currently running CliInternalHadoopIndexer on our Hadoop cluster.
Configuring the indexing service could be a challenge for us because we sometimes need to ingest hundreds of segments at once.
However, a later tutorial says: “The HadoopDruidIndexer still remains a valid option for batch ingestion, however, we recommend using the indexing service as the preferred method of getting batch data into Druid.”
My questions are:
Why does Druid recommend using the indexing service (which we have never used before)?
Do you plan to disable indexing on a Hadoop cluster in later releases?