Is it possible to ingest from multiple Hadoop clusters?

Is it possible to run a batch ingest on a Hadoop cluster different from the one configured by default (via the *-site.xml files) for ingestion and deep storage? The Kerberos principal and keytab for the default cluster are in the same realm used by this second cluster, so authentication shouldn't be an issue. It's just not clear to me which knobs I need to turn in the ingestion spec to override the proper configs so that Druid knows it is shelling this batch job out to a different cluster than it would by default.

Thanks!

Lucas

Hey Lucas,

You can use the "classpathPrefix" config (http://druid.io/docs/latest/ingestion/hadoop.html) to control, on a per-task basis, which *-site.xml files are used. This is probably the best way to do it.
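For illustration, a trimmed Hadoop index task might look like the sketch below. The conf directory path and cluster hostnames are hypothetical, and the dataSchema/ioConfig bodies are elided. "classpathPrefix" prepends the second cluster's conf directory to the peon classpath so its *-site.xml files win, and "jobProperties" in the tuningConfig can additionally override individual Hadoop settings for just this task:

    {
      "type": "index_hadoop",
      "classpathPrefix": "/etc/hadoop-cluster2/conf",
      "spec": {
        "dataSchema": { },
        "ioConfig": { "type": "hadoop" },
        "tuningConfig": {
          "type": "hadoop",
          "jobProperties": {
            "fs.defaultFS": "hdfs://cluster2-nn:8020",
            "yarn.resourcemanager.address": "cluster2-rm:8032"
          }
        }
      }
    }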

Hi Gian,

Does batch ingestion work such that the MapReduce job writes the index files to deep storage as part of the job? If that is the case, we'd essentially be creating a second deep storage, correct? Is it possible to configure the rest of the cluster to discern between the two and load segments from each, or does this all seem like a bad idea?
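For reference, deep storage is configured cluster-wide in common.runtime.properties rather than per task, which is why pushing segments from a second Hadoop cluster effectively creates a second deep storage. A minimal sketch, with hypothetical hostname and path:

    # common.runtime.properties (applies to the whole Druid cluster, not per task)
    # Hostname and path below are hypothetical.
    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://cluster1-nn:8020/druid/segments

Each segment's metadata records the full deep storage path it was pushed to, so in principle Historicals can load segments from either cluster as long as their Hadoop configs can resolve both filesystems.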

My org is in the process of migrating from a Hadoop 2.x cluster to a Hadoop 3.x cluster, and we were hoping that not everything would need to migrate at the exact same time; we could then index on both clusters simultaneously as the upstream source data migrated from one cluster to the other. However, the multiple deep storages we'd be creating seem like more of a headache than they're worth (if I am understanding this all correctly).

Thanks,

Lucas

Does Druid run stably on Hadoop 3.x?

On Thursday, April 25, 2019 at 5:11:25 AM UTC+8, capistr…@gmail.com wrote: