Can't do indexing on remote Hadoop server

Hi,
I’m trying to follow this example, http://druid.io/docs/latest/tutorials/tutorial-batch-hadoop.html, but with a remote Hadoop cluster instead of the dockerized one. Unfortunately, I’ve had no luck. I’ve changed plenty of settings, but nothing seems to work. Please find attached all the properties files, the log, and the JSON file. Any help would be really appreciated.

broker_runtime.properties (410 Bytes)

common.runtime.properties (4.06 KB)

coordinator_runtime.properties (113 Bytes)

historical_runtime.properties (350 Bytes)

log.txt (605 KB)

middleManager_runtime.properties (151 Bytes)

wikipedia-index-hadoop.json (1.75 KB)

P.S. The remote Hadoop is a Hortonworks distribution, and Druid is version 0.12.2.

On Tuesday, September 18, 2018 at 11:25:59 UTC+2, Julius Rachmanas wrote:

Hi Julius,

It seems the job is running with LocalJobRunner instead of on the actual Hadoop cluster.

To make Druid work with a remote Hadoop cluster, you need to add your MapReduce (MR) config files to the MiddleManager classpath.
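
As a rough way to check this, the sketch below inspects a directory that is assumed to be on the MiddleManager classpath and reports which Hadoop config files are missing and which MapReduce framework is configured. Hadoop falls back to "local" (that is, LocalJobRunner) when mapreduce.framework.name is not set. The function names and the list of expected files are illustrative assumptions, not part of Druid or Hadoop.

```python
import os
import xml.etree.ElementTree as ET

# Files a Hadoop MapReduce client usually reads from its classpath
# (assumed list; a given cluster may need more or fewer).
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml")


def read_hadoop_property(xml_path, name):
    """Return the value of property `name` in a Hadoop *-site.xml file, or None."""
    root = ET.parse(xml_path).getroot()  # root element is <configuration>
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_hadoop_conf(conf_dir):
    """Report missing config files and the MapReduce framework that would be used."""
    missing = [f for f in EXPECTED_FILES
               if not os.path.isfile(os.path.join(conf_dir, f))]
    framework = None
    mapred = os.path.join(conf_dir, "mapred-site.xml")
    if os.path.isfile(mapred):
        framework = read_hadoop_property(mapred, "mapreduce.framework.name")
    # Hadoop's default is "local", which runs the job with LocalJobRunner.
    return {"missing": missing, "framework": framework or "local"}
```

Calling check_hadoop_conf on the directory that holds the copied XML files (for example conf/druid/_common, if that is where your setup keeps them) should report "yarn" as the framework for a job that is meant to run on the remote cluster; a result of "local" would be consistent with the LocalJobRunner behaviour seen in the log.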

Hi,
OK. Then I misunderstood something. To put it another way, what I’m trying to achieve is to read data files from HDFS (*.csv.gz) but do the indexing internally in Druid (I don’t want to put any files on the Hadoop cluster, and I want to keep it isolated from Druid). In that case, which settings should I use?

Cheers,

Julius

On Tuesday, September 18, 2018 at 15:21:09 UTC+2, Nishant Bangarwa wrote: