Hadoop batch ingestion failed.

Hi,

I am trying to switch batch ingestion from local to Hadoop, but I can't get it to run. The log says "Job[class io.druid.indexer.LegacyIndexGeneratorJob] failed!" as shown below, but I don't have a clue what it means or how to fix it. Can someone take a look? Thanks!

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

2015-06-24T23:32:56,599 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%

2015-06-24T23:32:57,611 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_2036 failed with state FAILED due to: Task failed task_1432969244945_2036_m_000000

Job failed as tasks failed. failedMaps:1 failedReduces:0

2015-06-24T23:32:57,691 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 12

    Job Counters

            Failed map tasks=4

            Launched map tasks=4

            Other local map tasks=3

            Rack-local map tasks=1

            Total time spent by all maps in occupied slots (ms)=12143

            Total time spent by all reduces in occupied slots (ms)=0

            Total time spent by all map tasks (ms)=12143

            Total vcore-seconds taken by all map tasks=12143

            Total megabyte-seconds taken by all map tasks=49737728

    Map-Reduce Framework

            CPU time spent (ms)=0

            Physical memory (bytes) snapshot=0

            Virtual memory (bytes) snapshot=0

2015-06-24T23:32:57,699 INFO [task-runner-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/hadoop_test/2015-06-24T233208.070Z]

2015-06-24T23:32:57,717 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_hadoop_test_2015-06-24T23:32:08.069Z, type=index_hadoop, dataSource=hadoop_test}]

java.lang.reflect.InvocationTargetException

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_25]

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_25]

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_25]

    at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]

    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:256) ~[druid-indexing-service-0.7.3.jar:0.7.3]

    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.7.3.jar:0.7.3]

    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.7.3.jar:0.7.3]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_25]

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_25]

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_25]

    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]

Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.LegacyIndexGeneratorJob] failed!

    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:155) ~[druid-indexing-hadoop-0.7.3.jar:0.7.3]

    at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96) ~[druid-indexing-hadoop-0.7.3.jar:0.7.3]

    at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:304) ~[druid-indexing-service-0.7.3.jar:0.7.3]

    ... 11 more

2015-06-24T23:32:57,725 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_hadoop_hadoop_test_2015-06-24T23:32:08.069Z",

"status" : "FAILED",

"duration" : 44106

}

I have run into this issue before. It is caused by Hadoop and Druid depending on conflicting versions of the FasterXML Jackson libraries. The solution for me was to build a custom jar that manually excludes the conflicting Jackson dependencies, and then to put this jar on the classpath of my Hadoop indexing task.
See Benjamin Schaff’s comment in this thread: https://groups.google.com/forum/#!msg/druid-development/jNxhMZpp-rc/XwAFP2xYe60J
Here is the build file I used:
libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.9.23" exclude("common-logging", "common-logging"),
  "org.joda" % "joda-convert" % "1.7",
  "joda-time" % "joda-time" % "2.7",
  "io.druid" % "druid" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-services" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-service" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-hadoop" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "mysql-metadata-storage" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-s3-extensions" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-histogram" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-core" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-guava" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-base" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-json-provider" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-smile-provider" % "2.3.0",
  "com.fasterxml.jackson.module" % "jackson-module-jaxb-annotations" % "2.3.0",
  "com.sun.jersey" % "jersey-servlet" % "1.17.1",
  "mysql" % "mysql-connector-java" % "5.1.34",
  "org.scalatest" %% "scalatest" % "2.2.3" % "test",
  "org.mockito" % "mockito-core" % "1.10.19" % "test"
)

assemblyMergeStrategy in assembly := {
  case path if path contains "pom." => MergeStrategy.first
  case path if path contains "javax.inject.Named" => MergeStrategy.first
  case path if path contains "mime.types" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog$1.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/NoOpLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogFactory.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogConfigurationException.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/Log.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
It would be nice not to need to package your own build, but for now this seems to be the work-around.
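To give a rough idea of how the fat jar is then used (the jar name, memory settings, and config directories below are placeholders, not the exact values from my setup): the JVM that runs the indexing task needs the assembly on its classpath in place of the stock lib/* jars, something like

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/overlord:/path/to/druid-assembly.jar io.druid.cli.Main server overlord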

Hi Qi,

Just wondering, did you end up getting this to work? And either way, which version of Hadoop are you using? I think Michael is right that it’s a dependency/packaging problem, and it would be good for us to know which Hadoop versions are causing problems with the current Druid builds.

Hi Gian,

I’m still trying to figure out how to recompile Druid, since I used the tarball before. The Hadoop version I’m using is 2.5.0-cdh5.3.3. I saw that a new version of Druid, 0.8.0, was released recently, and I’m wondering whether that version is compatible with our Hadoop system.

Thanks,

Qi

Hi Michael,

Thanks for the help! I’m trying to build the custom jar as you suggested, using sbt, but I’m getting this error. Have you seen it before?

Thanks!

assemblyMergeStrategy in assembly := {

^

[error] Type error in expression

Qi,
I have not seen this issue before. What version of sbt are you using? I did this with 0.13.8.

Gian,
I ran into this issue with Druid 0.7.1.1 (which depends on Jackson 2.4.0) and Hadoop 2.4 (which depends on Jackson 2.3.0).
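If you want to double-check which Jackson version a given Hadoop distribution ships, something like this will list the bundled jars (assuming HADOOP_HOME points at the install; the exact lib layout varies between distributions):

find $HADOOP_HOME -name 'jackson-*.jar'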

Hi Michael,

I’m doing it with 0.13.8 as well. I’m a newbie to sbt. This is what I did:

a) Install sbt

b) Download & unpack the source code of Druid (version 0.7.3)

c) Create a build.sbt file in the base directory of the Druid source code with the content you provided, and then change the version numbers from "0.7.1.1" to "0.7.3"

d) Create an assemble.sbt file in the same directory and put addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0") in it.

e) Run sbt

Does it look good?

Thanks,

Qi

I did things slightly differently:

  1. install sbt
  2. create new empty directory ‘druid_assembly_build’
  3. cd to this new directory
  4. create a build.sbt file with the contents described above
  5. create a directory ‘druid_assembly_build/project’
  6. create a file ‘druid_assembly_build/project/assembly.sbt’ with contents addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
  7. from inside druid_assembly_build run ‘sbt assembly’ (the resulting layout is sketched below)
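The end result (assuming the directory name from step 2) looks like this; after running sbt assembly the fat jar ends up under target/ (the exact scala-x.y subdirectory depends on which Scala version sbt uses):

druid_assembly_build/
    build.sbt              <- the libraryDependencies + assemblyMergeStrategy shown above
    project/
        assembly.sbt       <- addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")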

It worked. The map phase finally stopped complaining. Thanks Michael!
But now the reduce phase fails with the same issue, and I have no clue why. It seems Benjamin Schaff’s thread mentioned the same problem, but it didn’t really say how to solve it, so I’m wondering if you ran into this as well.

2015-06-30T21:23:29,085 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2041_r_000006_0, Status : FAILED

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

So the error message is the same; the only difference is that the map phase now finishes smoothly while the reduce phase hits the same issue. I’m using Druid 0.7.3, so maybe there are other dependencies I need to resolve manually?

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

2015-06-30T23:57:02,959 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 75%

2015-06-30T23:57:03,966 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2044_r_000020_2, Status : FAILED

Error: java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-21T00:00:00.000Z_2015-05-22T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///

at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)

at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:414)

at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:470)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:446)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:292)

at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)

at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-06-30T23:57:03,967 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2044_r_000021_2, Status : FAILED

Error: java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-22T00:00:00.000Z_2015-05-23T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///

at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)

at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:414)

at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:470)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:446)

at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:292)

at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)

at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-06-30T23:57:04,971 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 83%

2015-06-30T23:57:09,992 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%

2015-06-30T23:57:09,999 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_2044 failed with state FAILED due to: Task failed task_1432969244945_2044_r_000019

Job failed as tasks failed. failedMaps:0 failedReduces:1

I tried Druid 0.7.1.1 and got the same error. The interesting thing is that the segments actually were stored to the deep storage folder, but the MySQL metadata for those segments was not updated.

Hi Qi, are you actually getting both of these errors?

  1. class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

  2. java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-22T00:00:00.000Z_2015-05-23T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///

It’s strange that you’d get the first one on some machines but not others. If you’ve rebuilt specific Druid versions, you could try wiping out the Druid jars on HDFS to make sure that your new ones are actually getting uploaded. I think they’re in /tmp/druid-indexing/classpath by default.
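In case it helps, a quick way to wipe that cached classpath (assuming the default location above) is:

hdfs dfs -rm -r /tmp/druid-indexing/classpath

The next task run should then re-upload whatever jars are actually on its local classpath.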

For the second one, that seems like something wrong with your Druid hadoop indexing json spec or with your hadoop config xmls. Do you mind posting those?

Hi Gian,

I managed to solve the first problem. It turns out that I hadn’t deleted the old libraries in /tmp/druid-indexing/classpath/ and had forgotten to remove lib/* from the classpath. Thanks, you are awesome!

For the second problem, my Hadoop ingestion spec looks like this. It’s just a small test file.

{

"type" : "index_hadoop",

"spec" : {

"dataSchema" : {

  "dataSource" : "hadoop_test",

  "parser" : {

    "type" : "string",

    "parseSpec" : {

      "format" : "json",

      "timestampSpec" : {

        "column" : "ds",

        "format" : "auto"

      },

      "dimensionsSpec" : {

        "dimensions": [

            "dim_app_family",

            "dim_browser_family",

            "dim_destination_country",

            "dim_destination_market",

            "dim_device_type_best_guess",

            "dim_language",

            "dim_origin_country",

            "dim_origin_market",

            "dim_os_family",

            "ds",

            "subject_id",

            "treatment_name"

        ],

        "dimensionExclusions" : [],

        "spatialDimensions" : []

      }

    }

  },

  "metricsSpec" : [

    {

      "type" : "count",

      "name" : "count"

    }

  ],

  "granularitySpec" : {

    "type" : "uniform",

    "segmentGranularity" : "DAY",

    "queryGranularity" : "NONE",

    "intervals" : [ "2015-05-01/2015-05-25" ]

  }

},

"ioConfig" : {

  "type" : "hadoop",

  "inputSpec" : {

    "type" : "static",

    "paths" : "/user/qi_wang/hadoop_data.json"

  }

},

"tuningConfig" : {

  "type": "hadoop"

}

}

}

Hi Gian,

I managed to solve the second problem as well! The build.sbt Michael provided does not include the HDFS extension. Here is the new build.sbt file I used, in case someone else needs it in the future.

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.9.23" exclude("common-logging", "common-logging"),
  "org.joda" % "joda-convert" % "1.7",
  "joda-time" % "joda-time" % "2.7",
  "io.druid" % "druid" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-services" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-service" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-hadoop" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "mysql-metadata-storage" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-s3-extensions" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-histogram" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-hdfs-storage" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-core" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-guava" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-base" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-json-provider" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-smile-provider" % "2.3.0",
  "com.fasterxml.jackson.module" % "jackson-module-jaxb-annotations" % "2.3.0",
  "com.sun.jersey" % "jersey-servlet" % "1.17.1",
  "mysql" % "mysql-connector-java" % "5.1.34",
  "org.scalatest" %% "scalatest" % "2.2.3" % "test",
  "org.mockito" % "mockito-core" % "1.10.19" % "test"
)

assemblyMergeStrategy in assembly := {
  case path if path contains "pom." => MergeStrategy.first
  case path if path contains "javax.inject.Named" => MergeStrategy.first
  case path if path contains "mime.types" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog$1.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/NoOpLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogFactory.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogConfigurationException.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/Log.class" => MergeStrategy.first
  case path if path contains "META-INF/jersey-module-version" => MergeStrategy.first
  case path if path contains ".properties" => MergeStrategy.first
  case path if path contains ".class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
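For completeness, the deep storage settings this goes with in common.runtime.properties look roughly like the following (the storage directory is just an example based on the path in my error message; use your own nameservice and path):

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://gold-ha-nameservice/user/qi_wang/deepStorage

With the druid-hdfs-storage classes included in the fat jar, the reducer can write segments to HDFS instead of treating the hdfs:// path as a local file, which is what the "Wrong FS ... expected: file:///" error was complaining about.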

Hi Qi, would you like to contribute your findings to the Druid documentation? It should help others who face the same problems.

Yeah sure. How do I do that?

All of the Druid documentation is hosted in the Druid github repository.

This page in the documentation would be a good place to add your findings:

https://github.com/druid-io/druid/blob/master/docs/content/operations/other-hadoop.md

Hi Qi,
I tried building a standalone assembly using the build.sbt file that you recommended. I was able to build the jar successfully; however, I have been unable to run it.

Can you share with me the java command line that you use to run the index job?

I’m finding that when I include the extensions in common.runtime.properties:

# Extensions
druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage","io.druid.extensions:druid-indexing-hadoop"]

it causes this failure:

2015-07-07T17:18:59,273 ERROR [main] io.druid.initialization.Initialization - Unable to resolve artifacts for [io.druid.extensions:druid-indexing-hadoop:jar:0.7.1.1 (runtime) -> < [ (https://repo1.maven.org/maven2/, releases+snapshots), (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)]].

org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact io.druid.extensions:druid-indexing-hadoop:jar:0.7.1.1 in (https://repo1.maven.org/maven2/)

When I remove the druid extensions, it gives this error:

2015-07-07T17:32:09,063 INFO [main] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.columnCache.sizeBytes] on [io.druid.query.DruidProcessingConfig#columnCacheSizeBytes()]

2015-07-07T17:32:09,064 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()]

2015-07-07T17:32:09,130 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[interface io.druid.segment.data.BitmapSerdeFactory] from props[druid.processing.bitmap.] as [ConciseBitmapSerdeFactory{}]

2015-07-07T17:32:09,232 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!

java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_79]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_79]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_79]

at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_79]

at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:120) [DruidAssembly-SBT-assembly-1.0.jar:1.0]

at io.druid.cli.Main.main(Main.java:88) [DruidAssembly-SBT-assembly-1.0.jar:1.0]

Caused by: com.google.inject.CreationException: Guice creation errors:

  1. Binding to null instances is not allowed. Use toProvider(Providers.of(null)) if this is your intended behaviour.

at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:83)

I’m totally confused as to how you got this Hadoop Indexer to finally work.

Any help would be greatly appreciated.

Johnny Hom


OK! Will do that later!

Hi Johnny,

Try this:

  1. In the runtime configuration file of the overlord, remove all the extensions (druid.extensions.coordinates), because you have already included them in the fat jar you compiled with sbt.

  2. In the command line, remove lib/* from the classpath and include the path to your fat jar instead (a rough example follows).
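For example, if you are running the standalone Hadoop indexer the way your log suggests, the command would look something like this (the jar name, Hadoop config directory, and spec file are placeholders for your own paths):

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /path/to/druid-assembly.jar:/etc/hadoop/conf io.druid.cli.Main index hadoop my_hadoop_spec.json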

Hope that helps.

Qi