Batch reindexing failing with NoSuchMethodError when batch has more data

Hi,

We have been trying to reindex some of our old data. It runs fine for batches with 200K records, but batches with 2M records fail in the index-generator MR job with the exception below.

2017-03-28 19:41:32,352 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.google.common.io.Files.asByteSink(Ljava/io/File;[Lcom/google/common/io/FileWriteMode;)Lcom/google/common/io/ByteSink;
    at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:801)
    at io.druid.segment.IndexMerger.merge(IndexMerger.java:438)
    at io.druid.segment.IndexMerger.persist(IndexMerger.java:186)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.persist(IndexGeneratorJob.java:510)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:688)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:478)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I thought it was a classpath issue, but the same config works fine for smaller batches. Below is the config we are using.

"tuningConfig" : {
  "type" : "hadoop",
  "partitionsSpec" : {
    "type" : "hashed",
    "targetPartitionSize" : 5000000
  },
  "maxRowsInMemory" : 500000,
  "jobProperties" : {
    "mapreduce.job.user.classpath.first" : "true",
    "mapreduce.job.classloader" : "true",
    "mapreduce.map.java.opts" : "-Xmx2048m -Duser.timezone=UTC",
    "mapreduce.reduce.java.opts" : "-Xmx2048m -Duser.timezone=UTC",
    "mapreduce.map.memory.mb" : "8192",
    "mapreduce.reduce.memory.mb" : "8192"
  }
}

I have tried reducing targetPartitionSize and maxRowsInMemory, but it did not help. Is there any limit on the number of rows that the Hadoop indexer can reindex? Currently it’s not working for a day for us.

Thanks !

Siva

Hi Siva,
There is no such limit. This looks like a classpath issue.
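
One way to check a classpath conflict like this is to ask the JVM which jar a class was actually loaded from and whether the method named in the error exists there. Files.asByteSink was added in Guava 14, while Hadoop 2.x distributions commonly bundle Guava 11, so an older Guava jar winning on the task classpath would produce exactly this NoSuchMethodError. The sketch below is only an illustration: the class name GuavaCheck and its helper methods are made up for this example, and it assumes you can run it with the same classpath as the failing task.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Druid or Hadoop.
public class GuavaCheck {

    /** Returns the location a class was loaded from, or a note if it is unavailable. */
    public static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return src == null ? "(bootstrap class, no code source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    /** Returns true if the class is loadable and has a public method with this name. */
    public static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing or broken: treat the method as absent.
        }
        return false;
    }

    public static void main(String[] args) {
        String cls = "com.google.common.io.Files";
        System.out.println(cls + " loaded from: " + locate(cls));
        System.out.println("has asByteSink: " + hasMethod(cls, "asByteSink"));
    }
}
```

If the reported jar is an old Guava and asByteSink is missing, the job is picking up the wrong Guava despite the classpath settings in jobProperties.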

I know this is an old post, but for anyone who lands here from a Google search…

I had the same issue and noticed that I had multiple Druid versions under hdfs:///tmp/druid-indexing

This directory is used to distribute the YARN application to each node. The batch job ran fine after I removed the directory (at least the NoSuchMethodError was resolved).

hdfs dfs -rm -r /tmp/druid-indexing

-kenji