First and Last timestamp value by group

I need to be able to query the first and last timestamp values from a given query.

If that query is grouping, then the first and last could be different for each group.

I thought I could use timeMin and timeMax, so I installed the time-min-max extension into Druid 0.10.0, but have not had any luck.

Here’s the aggregation I tried:

```
{
  "type": "timeMax",
  "name": "maxTimeStamp",
  "fieldName": "__time"
}
```
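
For context, here's roughly the shape of the full groupBy query I'm after; the dataSource, interval, and dimension names below are just placeholders (not my real setup), with timeMin/timeMax from the extension producing the first and last timestamp per group:

```
{
  "queryType": "groupBy",
  "dataSource": "my_datasource",
  "intervals": ["2017-08-01/2017-09-01"],
  "granularity": "all",
  "dimensions": ["some_dimension"],
  "aggregations": [
    { "type": "timeMin", "name": "minTimeStamp", "fieldName": "__time" },
    { "type": "timeMax", "name": "maxTimeStamp", "fieldName": "__time" }
  ]
}
```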

Small update: timeMax / timeMin seem to do the trick on a test cluster running 0.10.0, but on either of my larger clusters I'm getting an error like:

```
javax.servlet.ServletException: java.lang.IllegalAccessError: tried to access field io.druid.query.aggregation.LongMaxAggregator.COMPARATOR from class io.druid.query.aggregation.TimestampAggregator
```

Then the Timestamp aggregator is de-registered and all subsequent queries see:

```
javax.servlet.ServletException: java.lang.NoClassDefFoundError: Could not initialize class io.druid.query.aggregation.TimestampAggregator
```

Does this mean anything to someone here?

Another bit of info: I'm seeing errors like this on the historical nodes:

```
2017-09-12T19:12:22,193 ERROR [qtp1459669467-196[groupBy_[RPM.sales.rgtn]953f4b6c-7607-4706-9814-06d8bccb7ae7]] io.druid.server.QueryResource - Exception handling request: {class=io.druid.server.QueryResource, exceptionType=class java.lang.NullPointerException, exceptionMessage=null, exception=java.lang.NullPointerException, query=GroupByQuery{dataSource='RPM.sales.rgtn', querySegmentSpec=MultipleSpecificSegmentSpec{descriptors=[SegmentDescriptor{interval=2017-08-03T00:00:00.000Z/2017-08-04T00:00:00.000Z, version='2017-08-17T13:39:11.874Z', partitionNumber=0}, SegmentDescriptor{interval=2017-08-06T00:00:00.000Z/2017-08-07T00:00:00.000Z, version='2017-08-17T13:39:11.874Z', partitionNumber=0}, SegmentDescriptor{interval=2017-08-09T00:00:00.000Z/2017-08-10T00:00:00.000Z, version='2017-08-17T13:39:11.874Z', partitionNumber=4}, SegmentDescriptor{interval=2017-08-13T00:00:00.000Z/2017-08-14T00:00:00.000Z, version='2017-08-17T13:39:11.874Z', partitionNumber=0}, SegmentDescriptor{interval=2017-08-14T00:00:00.000Z/2017-08-15T00:00:00.000Z, version='2017-08-17T13:39:11.874Z', partitionNumber=0}, SegmentDescriptor{interval=2017-08-15T00:00:00.000Z/2017-08-16T00:00:00.000Z, version='2017-08-18T21:00:12.539Z', partitionNumber=2}, SegmentDescriptor{interval=2017-08-20T00:00:00.000Z/2017-08-21T00:00:00.000Z, version='2017-08-24T17:32:58.105Z', partitionNumber=1}, SegmentDescriptor{interval=2017-08-24T00:00:00.000Z/2017-08-25T00:00:00.000Z, version='2017-08-29T13:18:50.385Z', partitionNumber=0}, SegmentDescriptor{interval=2017-08-27T00:00:00.000Z/2017-08-28T00:00:00.000Z, version='2017-08-29T13:18:50.385Z', partitionNumber=3}]}, virtualColumns=[], limitSpec=NoopLimitSpec, dimFilter=null, granularity=AllGranularity, dimensions=[DefaultDimensionSpec{dimension='regn_n', outputName='regn_n', outputType='STRING'}], aggregatorSpecs=[TimestampMaxAggregatorFactory{fieldName='timestamp', name='maxTimeStamp'}, LongSumAggregatorFactory{fieldName='sum_value', expression='null', name='sum_value'}], postAggregatorSpecs=, havingSpec=null}, peer=10.66.226.211}
```

```
java.lang.NullPointerException
	at io.druid.query.aggregation.TimestampBufferAggregator.aggregate(TimestampBufferAggregator.java:56) ~[?:?]
	at io.druid.query.groupby.epinephelinae.BufferGrouper.aggregate(BufferGrouper.java:203) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.groupby.epinephelinae.BufferGrouper.aggregate(BufferGrouper.java:212) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:302) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.query.groupby.epinephelinae.GroupByQueryEngineV2$GroupByEngineIterator.next(GroupByQueryEngineV2.java:202) ~[druid-processing-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:46) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:49) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.ConcatSequence$1.accumulate(ConcatSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:42) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:43) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:42) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:46) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence$1.get(WrappingSequence.java:50) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[java-util-0.10.0.jar:0.10.0]
	at io.druid.java.util.common.guava.WrappingSequence.accumulate(WrappingSequence.java:45) ~[java-util-0.10.0.jar:0.10.0]
```

Also, for a single-segment dataset (wikiticker) it works and does the right thing.

I think the java.lang.IllegalAccessError is happening because the time-min-max extension's TimestampAggregator, which lives in an extension classloader, is trying to access LongMaxAggregator.COMPARATOR, a package-private field in a class loaded by the main application classloader. Even though the package name looks the same, package-private access isn't allowed across classloaders. The error may not have arisen on your test cluster because this access is only attempted for some query types.

It’s a bug in the time-min-max extension and I raised a patch here: https://github.com/druid-io/druid/pull/4788

Fwiw, you could also try using the simpler "longMax" aggregator. It will work on __time since __time is a long-typed column, and it should be faster too.
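
For example, something like this in the aggregations array should give you the earliest and latest timestamp per group (an untested sketch; the output names here are arbitrary):

```
[
  { "type": "longMin", "name": "minTimeStamp", "fieldName": "__time" },
  { "type": "longMax", "name": "maxTimeStamp", "fieldName": "__time" }
]
```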

I think this NPE would happen if the field you ask for (timestamp) doesn't exist in all segments. Ideally, the time-min-max extension should ignore the missing field in that case, but it looks like it throws an NPE instead.