How to submit an ingestion job to another YARN cluster?

Hi

Below is my situation.

I’m on team A, and we manage a Druid cluster plus HDFS (deep storage).

Another team, B, has their own YARN + HDFS cluster.

Team A’s cluster is relatively small compared to B’s.

Team B wants to ingest their data into team A’s Druid cluster.

But the most important data lives on team B’s HDFS cluster, and it is huge.

It would be very expensive to import data from B’s HDFS into A’s deep storage every day,

and it’s hard to expand A’s HDFS cluster right now.

So I’m looking for a way to submit the job to the other YARN cluster and let it do the ingestion work,

then, after ingestion finishes, import only the segment data into A’s HDFS cluster (deep storage).

How can I do this?

Can I submit an ingestion job to another YARN cluster?

I also wonder: is there any way to signal the coordinator to load segments after they have been copied over from another HDFS cluster?

e.g. hadoop distcp another_cluster /user/druid/segment/some_data_source

After the distcp, signal the coordinator to load them.
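
To illustrate what I mean, something like this (the cluster addresses below are just placeholders):

"""
# Copy the already-built segment files from B's HDFS into A's deep storage.
hadoop distcp \
  hdfs://b-namenode:8020/user/druid/segment/some_data_source \
  hdfs://a-namenode:8020/user/druid/segment/some_data_source
"""

As far as I understand, the coordinator notices new segments by polling the metadata store's segments table rather than by scanning deep storage, so I guess the segment metadata would also have to be published into A's metadata store somehow.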

Thanks! Have a nice day

Hi,
https://github.com/druid-io/druid/pull/4626 might be helpful for you. I think it will be included in 0.11.

Jihoon

On Tue, Aug 22, 2017 at 7:06 PM, 기준 <0ctopus13prime@gmail.com> wrote:

Thanks!!! That’s cool!

A couple of follow-up questions.

Q1.

But it seems the ingestion job fails before publishing the segment.

The conversation in the link you posted says:

but it fails on step to publish segment ...

Any ideas?

Q2.

I just added the property "yarn.resourcemanager.address".

With your patch applied, do any other properties need to be added?

Thanks a lot.

As always, it’s nice talking to you, Jihoon!

On Tuesday, August 22, 2017 at 7:06:41 PM UTC+9, 기준 wrote:

For more information:

The YARN cluster that I added to the ingestion configuration has nothing to do with Druid.

It does not have any Druid-related jar files; it is just a pure YARN + HDFS cluster.

On Tuesday, August 22, 2017 at 7:06:41 PM UTC+9, 기준 wrote:

Nice to talk with you too. :)

Q1) The current problem is that indexing tasks fail while publishing segments if they are submitted to a Hadoop cluster other than the configured one. The patch linked above fixes that problem.

Q2) I think that’s the only property needed to submit tasks to multiple Hadoop clusters. I was also able to submit indexing tasks to different clusters by adding only that one.
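
For reference, that property goes into the jobProperties map of the Hadoop task’s tuningConfig. A minimal sketch, with a placeholder resource manager address:

"""
"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "yarn.resourcemanager.address" : "b-cluster-rm.example.com:8032"
  }
}
"""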

Jihoon

On Tue, Aug 22, 2017 at 11:18 PM, 기준 <0ctopus13prime@gmail.com> wrote:

I added "yarn.resourcemanager.address", but an exception is thrown even before indexing starts (though the job is accepted by YARN).

I have not applied the patch yet.

According to the conversation in your link, indexing went well and only segment publishing failed.

But in my case the job fails even before indexing.

Here is my log. I could not find any particular reason.

"""

2017-08-24T01:19:45,456 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1502083341961_0894 running in uber mode : false

2017-08-24T01:19:45,458 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - map 0% reduce 0%

2017-08-24T01:19:45,469 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1502083341961_0894 failed with state FAILED due to: Application application_1502083341961_0894 failed 2 times due to AM Container for appattempt_1502083341961_0894_000002 exited with exitCode: 1

For more detailed output, check application tracking page:http://abcdef:12345/proxy/application_1502083341961_0894/Then, click on links to logs of each attempt.

Diagnostics: Exception from container-launch.

Container id: container_1502083341961_0894_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)

at org.apache.hadoop.util.Shell.run(Shell.java:504)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

Failing this attempt. Failing the application.

2017-08-24T01:19:45,522 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 0

2017-08-24T01:19:45,524 ERROR [task-runner-0-priority-0] io.druid.indexer.DeterminePartitionsJob - Job failed: job_1502083341961_0894

2017-08-24T01:19:45,524 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/druid-test/2017-08-24T011926.960Z_f0d623261ac9446eac14d9eec3fabaf0]

2017-08-24T01:19:45,548 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=druid-test_2017-08-24T01:19:26.974Z, type=index_hadoop, dataSource=druid-test}]

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:211) ~[druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:176) ~[druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]

at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]

… 7 more

Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.DeterminePartitionsJob] failed!

at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]

at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]

at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:306) ~[druid-indexing-service-0.10.0.jar:0.10.0]

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]

at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]

… 7 more

2017-08-24T01:19:45,557 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [druid-test_2017-08-24T01:19:26.974Z] status changed to [FAILED].

2017-08-24T01:19:45,560 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "druid-test_2017-08-24T01:19:26.974Z",

"status" : "FAILED",

“duration” : 12463

}

"""

On Tuesday, August 22, 2017 at 7:06:41 PM UTC+9, 기준 wrote:

Hmm, can you find anything weird in the YARN application logs?
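
If log aggregation is enabled on that cluster, something like this should pull the full container logs for the failed attempt (the application id is taken from your log above):

"""
yarn logs -applicationId application_1502083341961_0894
"""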

On Thu, Aug 24, 2017 at 10:29 AM, 기준 <0ctopus13prime@gmail.com> wrote:

No…

Here is the YARN log.

"""

Diagnostics: Exception from container-launch.

Container id: container_1502083341961_0898_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)

at org.apache.hadoop.util.Shell.run(Shell.java:504)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

Failing this attempt. Failing the application.

"""

On Tuesday, August 22, 2017 at 7:06:41 PM UTC+9, 기준 wrote: