NoSuchMethodError in Hadoop

Hello,

I'm hitting the following stack trace in the mapper while running a HadoopDruidIndexer task on EMR. I suspect it's pulling in an old version of Guava, since EMR comes bundled with v11. Any thoughts on how to fix this?

2016-01-19 22:07:06,528 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.google.common.hash.Hasher.putUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/Hasher;
        at io.druid.query.aggregation.cardinality.CardinalityAggregator.hashRow(CardinalityAggregator.java:55)
        at io.druid.query.aggregation.cardinality.CardinalityAggregator.aggregate(CardinalityAggregator.java:103)
        at io.druid.indexer.InputRowSerde.toBytes(InputRowSerde.java:121)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:287)
        at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:95)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:152)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
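
As an aside on the Guava clash itself: a workaround people sometimes use for dependency conflicts in Hadoop-based Druid indexing is to ask YARN to prefer the job's own jars over the cluster's. This is a hedged sketch, not a verified fix for this exact error; the properties below are standard Hadoop settings that can be passed via `jobProperties` in the tuning config:

```json
"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.job.user.classpath.first": "true"
  }
}
```

Whether this helps depends on the Hadoop version and how EMR assembles the task classpath, so treat it as something to experiment with rather than a guaranteed solution.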

Hey Dan,

We generally try to be careful about which Guava methods we use on code paths that are typically called during Hadoop jobs. Normally people don’t use the “cardinality” aggregator at indexing time (it’s intended for query-time counting of string columns). If you want to create a hyperloglog column at indexing time, you could use the “hyperUnique” aggregator.
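
For reference, a hyperUnique metric in the ingestion spec's `metricsSpec` might look like the following; the `name` and `fieldName` values here are illustrative placeholders, not anything from this thread:

```json
"metricsSpec": [
  {
    "type": "hyperUnique",
    "name": "unique_users",
    "fieldName": "user_id"
  }
]
```

This builds the HLL column at ingestion time, which is what the hyperUnique aggregator can then query against.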

Thanks Gian, I’ll give it a try.
I was under the impression I was building up an HLL column. I think the documentation for the cardinality/hyperUnique aggregators is a bit misleading.

The cardinality aggregator works on string dimensions and builds an HLL at query time, not at ingestion time.
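
For contrast, a query-time cardinality aggregator looks roughly like this (field names are illustrative; depending on the Druid version, the key for the dimension list has been spelled `fieldNames` or `fields`, so check the docs for your release):

```json
{
  "type": "cardinality",
  "name": "distinct_users",
  "fieldNames": ["user_id"],
  "byRow": false
}
```

The key difference from hyperUnique is that this estimates cardinality over plain string dimensions at query time rather than reading a precomputed HLL column.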

Agree that docs should be better.

https://github.com/druid-io/druid/blob/master/docs/content/querying/aggregations.md

If you wanted to make a contribution to improve the docs, that would be super helpful to us.