Druid 0.10.0 - hadoop indexing fails when intervals not specified in granularitySpec

With 0.9.x, when intervals was left out of the granularitySpec in the ingestion spec, the indexing process would determine the proper intervals from the data.
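For reference, this is the shape of spec we are submitting (trimmed down; the dataSource name and granularities are just placeholders). Note the granularitySpec has no "intervals" field:

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "example",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE"
      }
    }
  }
}
```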

After the upgrade to 0.10.0, ingestion specs without intervals specified in the granularitySpec fail with an NPE:

Error: java.lang.NullPointerException
    at io.druid.segment.indexing.granularity.UniformGranularitySpec.bucketInterval(UniformGranularitySpec.java:100)
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.reduce(DetermineHashedPartitionsJob.java:325)
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.reduce(DetermineHashedPartitionsJob.java:299)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.run(DetermineHashedPartitionsJob.java:356)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Any suggestions?

This feature is useful to us because we generate ingestion specs programmatically; without it, we would have to inspect the data to determine the intervals ourselves anyway.

One workaround is to provide a wide interval, but doing so causes the creation of MANY unnecessary reducers.
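That workaround looks like the following (the ten-year interval here is purely illustrative of a deliberately over-wide range):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "NONE",
  "intervals": ["2010-01-01/2020-01-01"]
}
```

With DAY segment granularity, an interval that wide implies thousands of potential segment buckets, which is where the excess reducers come from.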

Thank you.

Paul

Hey Paul,

This was a bug in 0.10.0 (and 0.10.1); see these GitHub issues: https://github.com/druid-io/druid/issues/4647, https://github.com/druid-io/druid/pull/4686. I expect it will be fixed in the next release.