Error with Daily Granularity

Hi all,

I am getting the following error when ingesting CSVs with DAY segment granularity, but not with HOUR segment granularity.

My CSVs look like the following (the timestamp is the first column):

2015-01-01T00:00:00.000Z,US:MD,34273,0,3,186411,…

Here’s my granularity spec from my hadoop batch ingestion spec:

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": {
    "type": "duration",
    "duration": 86400000,
    "origin": "1970-01-01T00:00:00.000Z"
  },
  "rollup": true,
  "intervals": ["2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"]
}

and here’s the error:

2017-06-06 15:20:41,977 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : com.metamx.common.RE: Failure on row[2015-01-01T00:00:00.000Z,US:MD,2015-01-01,0,34273,0,3,186411,1,1,2,0,1,6,3,6,0,0,0,2,0,20032093,2842243,"en",2,93,1,0,US,MD,"Olney",1,0.000000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000010000,0.000011500,0,0.000000000,0.000010000,0.000011500,0.000014860,0.000000000,0.000000000,0.001000000,0.001014860,0.000014860,0.000049747,0.000000000,0.001000000,0.001064607,1,0,0,0,\N,0]
	at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.NullPointerException
	at io.druid.indexer.HadoopDruidIndexerConfig.getBucket(HadoopDruidIndexerConfig.java:414)
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:278)
	at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:87)
	... 8 more

My Druid version is 0.9.2


Solved the issue. The problem stemmed from my servers running in a different timezone (EDT) than the one Druid was configured for (UTC).
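For anyone curious why the mismatch produces an NPE in `getBucket()`, here is a rough sketch of the failure mode. This is not Druid's actual bucketing code, and the EDT offset (UTC-4) is assumed for illustration; the point is that truncating a UTC row timestamp to a DAY boundary in the JVM's local zone can produce a bucket that falls outside the configured UTC interval, so no segment bucket matches the row.

```python
from datetime import datetime, timedelta, timezone

# Rough sketch of the mismatch -- NOT Druid's actual bucketing code.
# The JVM's default zone is assumed to be EDT (UTC-4) for illustration.
EDT = timezone(timedelta(hours=-4))

row_ts = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)  # row timestamp (UTC)

# DAY truncation performed in the JVM's local zone:
local = row_ts.astimezone(EDT)                            # 2014-12-31T20:00-04:00
day_start_local = local.replace(hour=0, minute=0, second=0, microsecond=0)
bucket_start = day_start_local.astimezone(timezone.utc)   # 2014-12-31T04:00Z

# The configured ingestion interval, in UTC:
interval_start = datetime(2015, 1, 1, tzinfo=timezone.utc)
interval_end = datetime(2015, 1, 2, tzinfo=timezone.utc)

# The locally computed DAY bucket starts before the interval, so no
# segment bucket matches the row -- hence the failure on that row.
print(interval_start <= bucket_start < interval_end)      # False
```

With HOUR granularity the locally truncated bucket happens to line up with hour boundaries in any whole-hour-offset zone, which is why the same data ingested fine at HOUR segment granularity.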

For anyone else having the issue, I set the timezone to EDT for the following properties:

druid.broker.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.coordinator.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.historical.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.middlemanager.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.overlord.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.router.jvm.opts="-Duser.timezone=EDT -Dfile.encoding=UTF-8"

druid.indexer.runner.javaOpts="-server -Xmx2g -Duser.timezone=EDT -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dhdp.version={{stack_version}} -Dhadoop.mapreduce.job.classloader=true"

Note: Druid recommends using UTC.

I’d caution you against running outside of UTC – weird issues have been known to pop up in unexpected places. The period granularity features mean you can still use any time zone you want for indexing and querying data, even if the servers are running in UTC. If your Hadoop mappers and reducers are in EDT, you can get them to run in UTC too by editing mapreduce.map.java.opts and mapreduce.reduce.java.opts.
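For example, the Hadoop child JVMs can be pinned to UTC with something like the following (illustrative values; adjust the memory settings to your cluster, and these can also be passed per-job rather than cluster-wide):

```
mapreduce.map.java.opts=-Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8
mapreduce.reduce.java.opts=-Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8
```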

Do you have any recommendations on the following design?

  1. Data will come in as EDT, e.g.:

2017-02-01T00:00:00-0500,US:CA,2017-02-01,0,165078,0,7,610200,0,0,21,0,2,0,0,8,0,0,0,0,0,6274847245,3767…

  2. Druid, HDFS, and server timezones set to UTC

  3. Ingestion spec intervals in EDT

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": { "type": "all" },
  "rollup": true,
  "intervals": ["2017-02-01T00:00:00-0500/P1D"]
}
</granularitySpec>
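Building on the earlier point about period granularities: if the servers stay in UTC, one option (a sketch based on Druid's period segment granularity, which accepts a `timeZone`) is to express the day boundaries in the data's zone directly in the spec, rather than moving the JVMs off UTC:

```
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": {
    "type": "period",
    "period": "P1D",
    "timeZone": "America/New_York"
  },
  "queryGranularity": { "type": "all" },
  "rollup": true,
  "intervals": ["2017-02-01T00:00:00-05:00/P1D"]
}
```

Using a named zone like `America/New_York` rather than the abbreviation `EDT` also sidesteps daylight-saving ambiguity, since the fixed EDT offset is only correct for part of the year.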