What is the difference between index_hadoop and the HadoopDruidIndexer job?

I seem to have the index_hadoop job working: it reads input from an HDFS file, a MapReduce YARN application is launched, and the appropriate segments are created.

But then what is the purpose of the HadoopDruidIndexer job vis-a-vis the index_hadoop job?

Thanks,

-Vinay

http://druid.io/docs/latest/ingestion/batch-ingestion.html

Yep, saw that. But when I fired an “index_hadoop” job, I got a YARN application launched with input from an HDFS file.

So I guess the question is: why else would I want HadoopDruidIndexer, other than to “avoid having to setup indexing service”?

Is there any other benefit? Are there pros/cons to HadoopDruidIndexer that I am missing?

Thanks,

-Vinay

The two methods share mostly the same underlying code. Running the Hadoop index task through the indexing service means the indexing service can lock datasources and intervals, which helps if you are ingesting both batch and realtime data.
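
To make the indexing-service path concrete, here is a rough Python sketch of how an index_hadoop task might be wrapped and prepared for submission to the overlord. Treat the details as assumptions to check against your Druid version: the host and port in OVERLORD_URL are placeholders, the "spec" wrapper key and the /druid/indexer/v1/task path are what I would expect on recent releases, and example_spec is a hypothetical stub rather than a complete ingestion spec. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: overlord host and port are placeholders for your deployment.
OVERLORD_URL = "http://localhost:8090/druid/indexer/v1/task"


def build_index_hadoop_task(spec):
    """Wrap a Hadoop ingestion spec in an index_hadoop task payload.

    Assumption: the wrapper key is "spec"; check the docs for your version.
    """
    return {"type": "index_hadoop", "spec": spec}


def build_submit_request(task, url=OVERLORD_URL):
    """Build, but do not send, the POST request that submits the task."""
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical stub: a real spec also needs the input paths, parser,
# granularity, and tuning settings described in the batch ingestion docs.
example_spec = {"dataSchema": {"dataSource": "example_datasource"}}

task = build_index_hadoop_task(example_spec)
request = build_submit_request(task)

# To actually submit, uncomment the next line (requires a running overlord):
# urllib.request.urlopen(request)
```

With the standalone HadoopDruidIndexer there is no such submission step: you run the job directly against a spec file, so nothing takes a lock on the datasource and interval for you.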