Hadoop Batch Indexing Fails with NullPointerException

Hi,

We upgraded to Druid 0.10.0 yesterday and have since started seeing failures in our Hadoop batch ingestion jobs.

Here is the stack trace from the failing reduce task. The task fails the same way on every retry attempt.

2017-08-09T20:10:31,370 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 0%
2017-08-09T20:11:38,986 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 68%
2017-08-09T20:11:39,998 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1502288600950_9776_r_000000_0, Status : FAILED
Error: java.lang.NullPointerException
	at io.druid.segment.indexing.granularity.UniformGranularitySpec.bucketInterval(UniformGranularitySpec.java:100)
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.reduce(DetermineHashedPartitionsJob.java:325)
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.reduce(DetermineHashedPartitionsJob.java:299)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityReducer.run(DetermineHashedPartitionsJob.java:356)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

2017-08-09T20:11:41,029 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 0%
2017-08-09T20:12:30,495 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1502288600950_9776_r_000000_1, Status : FAILED
Error: java.lang.NullPointerException (same stack trace as above)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2017-08-09T20:13:41,067 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 68%
2017-08-09T20:13:47,079 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 69%
2017-08-09T20:13:49,087 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1502288600950_9776_r_000000_2, Status : FAILED
Error: java.lang.NullPointerException (same stack trace as above)

Here is the ingestion spec used for this job:

{
  "type" : "index_hadoop",
  "id" : "index_hadoop_digital.small_ts_txt_2017-08-09T20:07:31.968Z",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "digital.small_ts_txt",
      "parser" : {
        "parseSpec" : {
          "timestampSpec" : {
            "column" : "servertimestamp",
            "format" : "yyyy-MM-dd' 'HH:mm:ss"
          },
          "columns" : [ "servertimestamp", "app_id", "eventtype", "visitorid" ],
          "dimensionsSpec" : {
            "dimensions" : [ "app_id", "eventtype", "visitorid" ]
          },
          "format" : "csv"
        },
        "type" : "string"
      },
      "metricsSpec" : [ {
        "type" : "count",
        "name" : "count"
      } ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "HOUR",
        "queryGranularity" : {
          "type" : "none"
        },
        "rollup" : true,
        "intervals" : null
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "paths" : "/apps/hive/warehouse/z0020kv.db/small_ts_txt/",
        "type" : "static"
      },
      "metadataUpdateSpec" : null,
      "segmentOutputPath" : null
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "workingPath" : null,
      "version" : "2017-08-09T20:07:31.968Z",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000,
        "maxPartitionSize" : 6000000,
        "assumeGrouped" : false,
        "numShards" : -1,
        "partitionDimensions" : [ ]
      },
      "shardSpecs" : { },
      "indexSpec" : {
        "bitmap" : {
          "type" : "concise"
        },
        "dimensionCompression" : "lz4",
        "metricCompression" : "lz4",
        "longEncoding" : "longs"
      },
      "maxRowsInMemory" : 50000,
      "leaveIntermediate" : false,
      "cleanupOnFailure" : true,
      "overwriteFiles" : false,
      "ignoreInvalidRows" : false,
      "jobProperties" : {
        "mapreduce.reduce.memory.mb" : "6656",
        "mapreduce.map.memory.mb" : "4096",
        "mapreduce.job.user.classpath.first" : "true",
        "mapreduce.job.queuename" : "SVGRNPAD",
        "mapreduce.job.jvm.numtasks" : "20",
        "mapreduce.map.output.compress" : "false",
        "mapreduce.reduce.java.opts" : "-Xmx5300m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.map.java.opts" : "-Xmx3800m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.input.fileinputformat.split.maxsize" : "67108864"
      },
      "combineText" : true,
      "useCombiner" : true,
      "buildV9Directly" : true,
      "numBackgroundPersistThreads" : 1,
      "forceExtendableShardSpecs" : false,
      "useExplicitVersion" : false
    },
    "uniqueId" : "a51097b7d1f044fc952a043ad58a98b9"
  },
  "hadoopDependencyCoordinates" : [ "org.apache.hadoop:hadoop-client:2.5.3.0-37" ],
  "classpathPrefix" : null,
  "context" : null,
  "groupId" : "index_hadoop_digital.small_ts_txt_2017-08-09T20:07:31.968Z",
  "dataSource" : "digital.small_ts_txt",
  "resource" : {
    "availabilityGroup" : "index_hadoop_digital.small_ts_txt_2017-08-09T20:07:31.968Z",
    "requiredCapacity" : 1
  }
}

We would appreciate any insights into this issue.

I work with Gurdeep.

We’ve noticed that this issue goes away when we specify the intervals option in the granularitySpec. However, we would prefer the behavior that worked before: when the intervals option was omitted, the indexing job would run a full scan of the data to determine the intervals present.
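For example, a granularitySpec like the following lets the job complete (the interval below is only a placeholder for our actual data range):

"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "HOUR",
  "queryGranularity" : { "type" : "none" },
  "rollup" : true,
  "intervals" : [ "2017-08-01/2017-08-10" ]
}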

I also noticed that even though we omit the intervals attribute, it shows up in the task log on the Overlord as present but null. So it seems the logic that determines whether intervals were provided may have a bug in this release. Or was this functionality deliberately discontinued?
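For what it’s worth, the "present but null" rendering could just be an artifact of how the submitted spec is re-serialized for logging, since Jackson writes null fields by default. A minimal sketch with a hypothetical stub class (not Druid’s actual code), assuming jackson-databind on the classpath:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

public class NullFieldDemo
{
  // Hypothetical stand-in for the granularitySpec section of the task JSON.
  public static class GranularitySpecStub
  {
    public String type = "uniform";
    public List<String> intervals; // absent in the submitted JSON, so left null
  }

  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    // Deserialize a spec that omits "intervals" entirely...
    GranularitySpecStub spec = mapper.readValue("{\"type\":\"uniform\"}", GranularitySpecStub.class);
    // ...and re-serialize it: this prints {"type":"uniform","intervals":null},
    // i.e. the omitted field shows up as present-but-null in the output.
    System.out.println(mapper.writeValueAsString(spec));
  }
}

So the null in the log is consistent with the attribute simply being omitted; the real question is why the determine-intervals pass no longer handles that case.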

See this post by Fangjin Yang describing how this is supposed to work; it looks like the determine-intervals step is being skipped when we want it to run:

https://groups.google.com/d/msg/druid-user/fN-AP-dEVQo/ImXQjOI_AQAJ

This was a bug introduced in 0.10.0 that I expect will be fixed in the next release. Sorry for the inconvenience!

The GitHub issue and fix, for reference: https://github.com/druid-io/druid/issues/4647 and https://github.com/druid-io/druid/pull/4686.

@Gian, this is a blocker for our development right now. Can you think of any tricks to get this working until the next release, and do you have any idea when the next release with the fix will be available? We need to decide whether to invest time in a workaround, wait for the release, or revert to the prior version.

Hey Jason,

You could apply the patch in https://github.com/druid-io/druid/pull/4686 to your version of Druid directly (if it’s 0.10.1, that’s the “druid-0.10.1” tag on GitHub), wait for the next release with this fix (should be 0.11.0), or roll back to 0.9.2. If I were you, I would probably try applying the patch and see if that helps; as an added bonus, it would help the community if you can confirm that the patch works for you.
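Roughly, applying the patch would look something like the following (an untested sketch; the .patch URL is GitHub’s standard patch export, and the patch may need manual fix-ups if it does not apply cleanly to your tag):

git clone https://github.com/druid-io/druid.git
cd druid
git checkout druid-0.10.1
# GitHub serves any pull request as an mbox-format patch at <PR URL>.patch
curl -L https://github.com/druid-io/druid/pull/4686.patch | git am
# Rebuild the distribution, skipping tests to save time
mvn clean package -DskipTests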