Basic Questions for Batch Ingestion using Hadoop

Hi everyone

We’re starting to move from the native Druid batch ingester to Hadoop-based batch ingestion (either AWS EMR or our own cluster).

I’ve read this http://druid.io/docs/latest/ingestion/hadoop.html and this https://github.com/apache/incubator-druid/blob/master/docs/content/operations/other-hadoop.md but in neither of them can I find the basic information on how Druid knows about the Hadoop cluster we are using. I was looking for an IP address or something similar, so that Hadoop and Druid know about each other, but unfortunately I haven’t seen anything like that.

Have I missed it? Any help is appreciated.

Thanks and Regards

John

Hi, you need to add your Hadoop config files to the Druid classpath.
See the “Configure Druid to use Hadoop” section of this tutorial for details: http://druid.io/docs/latest/tutorials/tutorial-batch-hadoop.html
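
To make that step a bit more concrete, here is a minimal Python sketch of what "adding the Hadoop config files to the classpath" can look like. It is only an illustration, not the procedure from the tutorial: the function names are made up for this example, the list of file names is the usual set of Hadoop client config files, and the directories you pass in depend entirely on your own install. The second helper shows where the cluster address normally lives, since properties such as fs.defaultFS are defined inside those XML files rather than in a Druid setting.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# The usual Hadoop client config files (assumption: your cluster uses these names).
HADOOP_XML_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml")


def copy_hadoop_configs(hadoop_conf_dir, druid_common_dir):
    """Copy the Hadoop XML files that exist into a directory on Druid's classpath.

    Both directories are supplied by the caller; nothing here is Druid-specific.
    Returns the names of the files that were copied.
    """
    src = Path(hadoop_conf_dir)
    dst = Path(druid_common_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in HADOOP_XML_FILES:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied


def read_hadoop_property(xml_path, prop_name):
    """Return the value of one property from a Hadoop *-site.xml file, or None."""
    root = ET.parse(str(xml_path)).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == prop_name:
            return prop.findtext("value")
    return None
```

A call might look like copy_hadoop_configs("/etc/hadoop/conf", "/opt/druid/conf/druid/_common"), where both paths are placeholders for your own layout. Afterwards, read_hadoop_property on the copied core-site.xml with "fs.defaultFS" shows which HDFS address the Hadoop client libraries will use.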