Out of range for multi-value dimension - indexing issue?

Hello Druid community,

I am getting the error below when trying to ingest a multi-value dimension with a large number of numeric values using the Hadoop indexer.

Can anyone provide insight as to why this is occurring and how we can fix it?

Thanks,

Marc

2018-10-10T20:10:31,855 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1537474809758_0047_r_000047_3, Status : FAILED
Error: java.lang.IllegalArgumentException: Out of range: 2147822832
    at com.google.common.primitives.Ints.checkedCast(Ints.java:91)
    at io.druid.segment.data.GenericIndexedWriter.write(GenericIndexedWriter.java:225)
    at io.druid.segment.StringDimensionMergerV9.mergeBitmaps(StringDimensionMergerV9.java:376)
    at io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:305)
    at io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:219)
    at io.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:837)
    at io.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:710)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.mergeQueryableIndex(IndexGeneratorJob.java:541)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:717)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:500)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

Hi Marc,

This means the ingestion task is trying to create a segment that exceeds the 2GB limit. You will need to partition the data into smaller output segments.

Thanks,

Jon
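
For anyone who finds this thread later, here is a rough sketch of where the number in the error comes from. The top frame of the trace is Guava's Ints.checkedCast, which refuses to narrow a long that does not fit in an int. The checkedCast method below is a hand-written stand-in with the same observable behaviour, not Guava's source, and the class name CheckedCastDemo is made up for illustration.

```java
// Illustrative stand-in only: not Druid or Guava code.
final class CheckedCastDemo {

    // Narrow a long to an int, failing loudly if the value does not fit.
    static int checkedCast(long value) {
        int result = (int) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return result;
    }

    // How many units a value lies above the largest int (2147483647).
    static long overshoot(long value) {
        return value - Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long reported = 2147822832L; // the value from the error message above

        System.out.println("Largest int: " + Integer.MAX_VALUE);
        System.out.println("Overshoot:   " + overshoot(reported));

        try {
            checkedCast(reported);
        } catch (IllegalArgumentException e) {
            // Same message text as the first line of the stack trace.
            System.out.println(e.getMessage());
        }
    }
}
```

The reported value is 339185 above Integer.MAX_VALUE, so the data being written was only slightly over the limit. A modest reduction in rows per segment may be enough, though that depends on how the data is distributed.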

Thanks Jonathan! I appreciate the response.

Is this controlled by targetPartitionSize in partitionsSpec?

Marc

Yes, setting targetPartitionSize will affect the output segment sizes.

Thanks,

Jon
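
As a rough illustration, the setting sits under tuningConfig in the Hadoop ingestion spec. The fragment below assumes hashed partitioning, and the value 1000000 is only a placeholder, not a recommendation. Note that targetPartitionSize is a target number of rows per partition rather than a size in bytes, so lowering it yields more, smaller segments.

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "partitionsSpec": {
      "type": "hashed",
      "targetPartitionSize": 1000000
    }
  }
}
```

The right value depends on row width and on how many values the multi-value dimension carries per row, so it is worth checking the resulting segment sizes after a test run.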

Thanks again Jon!