ORC - Hadoop ingestion - java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.DiskRangeList

Hi everyone. I'm running into an issue when ingesting data from ORC files with Hadoop.
The ingestion job fails with this exception:

Error: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.DiskRangeList
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:200)
	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:168)
	at org.apache.orc.OrcFile.createReader(OrcFile.java:385)
	at org.apache.orc.mapreduce.OrcInputFormat.createRecordReader(OrcInputFormat.java:68)
	at org.apache.hadoop.mapreduce.lib.input.DelegatingRecordReader.<init>(DelegatingRecordReader.java:57)
	at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.createRecordReader(DelegatingInputFormat.java:129)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

My ingestion spec is:

{
    "type": "index_hadoop",
    "spec": {
      "ioConfig": {
        "type": "hadoop",
        "inputSpec": {
          "type": "static",
          "inputFormat": "org.apache.orc.mapreduce.OrcInputFormat",
          "paths": "/path/to/data/*",
          "flattenSpec": {
            "fields": [
              ...
            ]
          }
        }
      },
      "dataSchema": {
        "dataSource": "view_element_daily_uniq_hll",
        "parser": {
          "type": "orc",
          "parseSpec": {
            "format": "timeAndDims",
            "timestampSpec": {
                "column": "_col4",
                "format": "posix"
              },
              "dimensionsSpec": {
                "dimensions": [
                  ...
                ]
              }
          }
        },
        "metricsSpec": [
            {
              "name": "count",
              "type": "count"
            },
            {
              "type": "HLLSketchBuild",
              "lgK": 8,
              "tgtHllType": "HLL_8",
              "name": "hll_state",
              "fieldName": "_col7"
            }
          ],
        "granularitySpec": {
            "queryGranularity": "day",
            "rollup": true,
            "segmentGranularity": "day"
        }
      },
      "tuningConfig" : {
        "type" : "hadoop",
        "partitionsSpec" : {
          "type" : "hashed",
          "targetPartitionSize" : 5000000
        },
        "forceExtendableShardSpecs" : true,
        "jobProperties" : {
          "dfs.client.use.datanode.hostname" : "true",
          "dfs.datanode.use.datanode.hostname" : "true",
          "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
          "mapreduce.job.user.classpath.first" : "true",
          "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
          "mapreduce.map.memory.mb" : 1024,
          "mapreduce.reduce.memory.mb" : 1024,
          "mapreduce.job.classloader": "true",
          "mapreduce.job.queuename": "queue"
        }
      }
    }
  }

I’m running Hadoop 2.7.1 on a cluster and Druid 0.22.1 on 4 machines: 1 Overlord, 1 Broker, and 2 Data Servers.

Notes:

  • Hadoop ingestion works with this same config when ingesting JSON files
  • I’m using Hadoop Client 2.8.5 to run the MapReduce job; could this version mismatch be the problem?

Thank you for helping 🙂

Fixed by adding the following property to "jobProperties":

"mapreduce.job.classloader.system.classes": "java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., -org.apache.hadoop.hive., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml",
