Loading native snappy in middleManager/peon

Hi all!

I’m trying to set up Druid with Hadoop 2.6.0 (CDH 5.5.2). After wading through and working around various dependency issues, I’ve hit a wall. Our default compression codec is snappy, and we force the Hadoop java processes to load the native snappy libraries by setting LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native. Earlier this week I got an error about not being able to load snappy in the middleManager, but got around it by setting LD_LIBRARY_PATH in the middleManager’s environment. Now I’ve gotten to the point where an indexing task successfully completes a MapReduce job and writes out snappy files in /tmp/druid-indexing. The MapReduce job finishes fine; after it does, it looks like a Peon process attempts to read what was written, and I get a snappy loading error:

2016-05-20T20:35:27,591 INFO io.druid.indexer.DetermineHashedPartitionsJob: Job completed, loading up partitions for intervals[Optional.of([2015-09-01T00:00:00.000Z/2015-09-02T00:00:00.000Z])].
2016-05-20T20:35:27,643 ERROR io.druid.indexing.overlord.ThreadPoolTaskRunner: Exception while running task[HadoopIndexTask{id=index_hadoop_pageviews_2016-05-20T20:33:38.361Z, type=index_hadoop, dataSource=pageviews}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0.jar:0.9.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_101]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_101]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    ... 7 more
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193) ~[?:?]
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178) ~[?:?]
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createInputStreamWithCodecPool(CompressionCodec.java:157) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.createInputStream(SnappyCodec.java:163) ~[?:?]
    at io.druid.indexer.Utils.openInputStream(Utils.java:101) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.Utils.openInputStream(Utils.java:77) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:161) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:323) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    ... 7 more
2016-05-20T20:35:27,763 INFO io.druid.indexing.worker.executor.ExecutorLifecycle: Task completed with status: {
  "id" : "index_hadoop_pageviews_2016-05-20T20:33:38.361Z",
  "status" : "FAILED",
  "duration" : 82683
}

I’ve been trying variations to keep it from failing, but nothing yet has worked. I would think that setting -Djava.library.path=/usr/lib/hadoop/lib/native on druid.indexer.runner.javaOpts would help, but it doesn’t. Is there a way I can pass LD_LIBRARY_PATH down to the Peon’s environment? Has anyone else run into this?

Thanks!

-Andrew

Quick update this morning. I’m pretty sure setting just LD_LIBRARY_PATH in the middleManager’s env does propagate down to the Peon. In the logs I see:

2016-05-23T15:35:07,702 INFO io.druid.cli.CliPeon: * java.library.path: /usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
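That makes sense if the middleManager forks Peons with something like ProcessBuilder (presumably in its ForkingTaskRunner), since a child process gets a copy of the parent’s environment by default. A minimal demo of that inheritance (a sketch, not Druid’s actual forking code):

import java.io.IOException;

// Minimal demo: a child process forked via ProcessBuilder inherits the
// parent's environment (including LD_LIBRARY_PATH) unless it is modified.
public class EnvInheritDemo {
  public static void main(String[] args) throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder("sh", "-c", "echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH");
    // pb.environment() starts out as a copy of System.getenv(), so whatever
    // env the parent JVM was started with flows down to its children.
    pb.inheritIO();
    pb.start().waitFor();
  }
}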

OK, I’m getting close to stumped. As far as I can tell, both the Hadoop and Snappy native libs are loaded properly when I set LD_LIBRARY_PATH, which gets prepended to java.library.path.

I prepped some code to help me make sure I wasn’t doing something dumb:

https://gist.github.com/ottomata/6caf158d3b787a1c3439d936a1e28916#file-snappynativetest-java

I am able to load native hadoop and snappy using the same classpath and java.library.path that druid uses.
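The core of that test boils down to something like this (a sketch, not the gist verbatim; run it with the same classpath and -Djava.library.path the Peon uses):

import org.apache.hadoop.util.NativeCodeLoader;

// NativeCodeLoader's static initializer attempts System.loadLibrary("hadoop")
// the first time the class is touched; these two checks report the result.
public class SnappyNativeCheck {
  public static void main(String[] args) {
    System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    boolean nativeLoaded = NativeCodeLoader.isNativeCodeLoaded();
    System.out.println("native hadoop loaded: " + nativeLoaded);
    if (nativeLoaded) {
      // Only safe to call once libhadoop.so is actually loaded; true only if
      // that libhadoop was compiled with snappy support.
      System.out.println("libhadoop built with snappy: " + NativeCodeLoader.buildSupportsSnappy());
    }
  }
}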

At the bottom of this email is a bit more middleManager logging detail that leads up to this error. In summary I see:

  • middleManager starts, uses /usr/lib/hadoop/lib/native (zookeeper too?)

  • Peon indexing job starts, uses /usr/lib/hadoop/lib/native (zookeeper too?), but prints out ‘Unable to load native-hadoop library for your platform… using builtin-java classes where applicable’

  • YARN Hadoop indexing job is submitted and completes. I believe this writes a .snappy file somewhere into hdfs:///tmp/hadoop-indexing/…

  • middleManager (or Peon task?) attempts to read the previously written snappy file and errors out with ‘native snappy library not available: this version of libhadoop was built without snappy support’.

So yeah, something is fishy with the Peon’s java.library.path. Even though java.library.path is clearly set properly when the Peon starts up, the native shared libraries are evidently not loaded, as indicated by the ‘Unable to load native-hadoop library…’ message.

I guess if I don’t hear from someone by tomorrow, I’ll file an issue on GitHub.

Actual logs below. I’ve removed stuff that looked uninteresting. I see classpaths, extensions, and hadoop-dependencies all loading as expected.

2016-05-23T19:18:31,500 INFO io.druid.cli.CliMiddleManager: * java.library.path:/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2016-05-23T19:18:32,700 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2016-05-23T19:18:35,840 INFO org.eclipse.jetty.server.ServerConnector: Started ServerConnector@6685f71a{HTTP/1.1}{0.0.0.0:8091}
2016-05-23T19:18:35,844 INFO org.eclipse.jetty.server.Server: Started @25796ms
2016-05-23T19:19:40,744 INFO io.druid.cli.CliPeon: * java.library.path:/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2016-05-23T19:19:42,894 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2016-05-23T19:19:43,745 INFO io.druid.indexing.worker.executor.ExecutorLifecycle: Running with task: {
  "type" : "index_hadoop",
  "id" : "index_hadoop_pageviews_2016-05-23T19:19:22.575Z",
  ...
2016-05-23T19:19:48,880 INFO org.eclipse.jetty.server.ServerConnector: Started ServerConnector@1371e566{HTTP/1.1}{0.0.0.0:8100}
2016-05-23T19:19:48,881 INFO org.eclipse.jetty.server.Server: Started @25487ms
2016-05-23T19:20:20,066 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1463163743644_0030
2016-05-23T19:21:17,670 INFO io.druid.indexer.DetermineHashedPartitionsJob: Job completed, loading up partitions for intervals[Optional.of([2015-09-01T00:00:00.000Z/2015-09-02T00:00:00.000Z])].
2016-05-23T19:21:17,959 ERROR io.druid.indexing.overlord.ThreadPoolTaskRunner: Exception while running task[HadoopIndexTask{id=index_hadoop_pageviews_2016-05-23T19:19:22.575Z, type=index_hadoop, dataSource=pageviews}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193) ~[?:?]
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178) ~[?:?]
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createInputStreamWithCodecPool(CompressionCodec.java:157) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.createInputStream(SnappyCodec.java:163) ~[?:?]
    at io.druid.indexer.Utils.openInputStream(Utils.java:101) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.Utils.openInputStream(Utils.java:77) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
2016-05-23T19:21:18,084 INFO io.druid.indexing.worker.executor.ExecutorLifecycle: Task completed with status: {
  "id" : "index_hadoop_pageviews_2016-05-23T19:19:22.575Z",
  "status" : "FAILED",
  "duration" : 93849
}

I gave up on snappy and decided instead to force the Druid indexer jobs to output gzip files. I tried to do this in three different ways, in all cases setting the properties on both the middleManager JVM and the Peon JVM (the latter via druid.indexer.runner.javaOpts):

  • -Dmapreduce.output.fileoutputformat.compress=org.apache.hadoop.io.compress.GzipCodec

  • -Dmapred.child.java.opts=-Dmapreduce.output.fileoutputformat.compress=org.apache.hadoop.io.compress.GzipCodec

  • -Dmapreduce.map.java.opts=-Dmapreduce.output.fileoutputformat.compress=org.apache.hadoop.io.compress.GzipCodec -Dmapreduce.reduce.java.opts=-Dmapreduce.output.fileoutputformat.compress=org.apache.hadoop.io.compress.GzipCodec

I examined the job properties for the YARN job launched by the indexing task. None of these settings were passed down to the job. The SnappyCodec configured in mapred-site.xml was used.

Ah, but of course this won’t work. These are JVM options on the middleManager and Peon, and those processes won’t pass them down to the MapReduce job automatically.

Is there a way to provide Hadoop-related settings to the Peon before it submits the MapReduce indexing job?

AH, I finally was able to run an indexing job! The answer to my previous question is to set this in the indexing task specification:

  "jobProperties" : {"mapreduce.output.fileoutputformat.compress": "org.apache.hadoop.io.compress.GzipCodec"}

Yay!
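For anyone who finds this later: jobProperties goes inside the tuningConfig of the Hadoop ingestion spec. A trimmed example (dataSchema and ioConfig elided, since yours will differ):

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : { ... },
    "ioConfig" : { ... },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : {
        "mapreduce.output.fileoutputformat.compress" : "org.apache.hadoop.io.compress.GzipCodec"
      }
    }
  }
}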

I still think native libs should work. This is a bug. I will file one. :)

This is a fun one. Looking at SnappyCodec, it seems that it does some basic checking:

if (!NativeCodeLoader.buildSupportsSnappy()) {
  throw new RuntimeException("native snappy library not available: " +
      "this version of libhadoop was built without " +
      "snappy support.");
}

Now, the FUN thing to know at this point is that you are in a special classloader (see io.druid.indexing.common.task.HadoopTask.invokeForeignLoader).

This means that all the classes and jars need to be within the hadoop directory found by the coordinates specified per http://druid.io/docs/0.9.0/operations/other-hadoop.html

Now, what I DON’T know is how well native libraries play with isolated classloaders. So look at the directory where your isolated hadoop stuff is located and make sure the correct jars are there.
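One JVM rule that could matter here: a given native library can be loaded by at most one classloader per JVM. So if hadoop-common ends up defined both on the main classpath and inside the isolated loader, the second NativeCodeLoader to initialize gets an UnsatisfiedLinkError ("already loaded in another classloader") from System.loadLibrary("hadoop"), swallows it, and logs exactly the ‘Unable to load native-hadoop library for your platform…’ warning seen above. A standalone sketch of that restriction (jar path is a placeholder, and you would need hadoop-common’s own dependencies on the list too; run with -Djava.library.path pointing at the native dir):

import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Demo of the JNI restriction: one .so binds to at most ONE classloader per
// JVM. Two sibling loaders that each define NativeCodeLoader race for
// libhadoop.so; the loser silently falls back to builtin-java classes.
public class NativeLibOneLoaderDemo {
  public static void main(String[] args) throws Exception {
    URL[] jars = { new URL("file:///path/to/hadoop-common.jar") }; // plus its deps
    try (URLClassLoader first = new URLClassLoader(jars, null);
         URLClassLoader second = new URLClassLoader(jars, null)) {
      System.out.println("first loader sees native hadoop: " + nativeLoaded(first));
      // The second definition of the class cannot bind the already-loaded .so:
      System.out.println("second loader sees native hadoop: " + nativeLoaded(second));
    }
  }

  static boolean nativeLoaded(ClassLoader cl) throws Exception {
    Class<?> ncl = Class.forName("org.apache.hadoop.util.NativeCodeLoader", true, cl);
    Method m = ncl.getMethod("isNativeCodeLoaded");
    return (Boolean) m.invoke(null);
  }
}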

To kind of complete this, the thing hadoop is doing in the boolean check is pretty simple:

JNIEXPORT jboolean JNICALL Java_org_apache_hadoop_util_NativeCodeLoader_buildSupportsSnappy
  (JNIEnv *env, jclass clazz)
{
#ifdef HADOOP_SNAPPY_LIBRARY
  return JNI_TRUE;
#else
  return JNI_FALSE;
#endif
}

So as long as the hadoop-common jar is the one you intend it to be (with snappy support) then it shoooooouuuulllllddddd be ok
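A quick way to verify which hadoop-common jar actually got picked up, from inside whichever classloader you care about (a small helper sketch):

import org.apache.hadoop.util.NativeCodeLoader;

// Prints the jar that NativeCodeLoader (i.e. hadoop-common) was loaded from.
public class WhichHadoopCommon {
  public static void main(String[] args) {
    System.out.println(
        NativeCodeLoader.class.getProtectionDomain().getCodeSource().getLocation());
  }
}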


Hm, it should be! My test program (the gist above) uses the same jars that I have in hadoop-dependencies, loaded via hadoopDependencyCoordinates, and it all works fine from there.

I just created an issue for this: https://github.com/druid-io/druid/issues/3025

Hi,

I am also facing a similar kind of issue, but the task is failing randomly because the native lz4 lib is not available:

Error: java.lang.RuntimeException: native lz4 library not available

I am not able to understand why the task is failing randomly.