TimeSeriesQuery with a granularity of a daily 15-second window starting at a particular time.

Hey,

I’ve been reading about granularities: http://druid.io/docs/latest/Granularities.html, but have had no luck with this particular query.

I want to do some aggregations based on a 15-second window at a specific time of day, every day for the interval.

e.g. what was the count between 16:00:00 and 16:00:15, every day?

I’ve been trying things like:

"granularity": {"type": "duration", "duration": 15000, "origin": "2014-01-01T16:00:00"},

"granularity": {"type": "period", "period": "P1D", "timeZone": "Europe/London", "origin": "2014-01-01T16:00:15"},

"granularity": {"type": "period", "period": "P1D", "timeZone": "Europe/London", "origin": "2014-01-01T16:00:00-00:00:15"},

Side question: In the example on the Granularities page there is a dash in the origin, what does this mean?

{"type": "period", "period": "P3M", "timeZone": "America/Los_Angeles", "origin": "2012-02-01T00:00:00-08:00"}

Thanks,

Richard

I want to do some aggregations based on a 15-second window at a specific time of day, every day for the interval.

I’m not sure I understand your question fully. You can do 15-second aggregation, but what should happen to the data outside of that particular window?

Druid aggregation granularities cover the entire timeline; each timestamp must be assigned to an aggregation interval.

Currently all aggregations are “regular” in the sense that they divvy up time into equal-sized periods (or semi-regular once months and daylight saving are taken into account).

I’ve been trying things like:

"granularity": {"type": "duration", "duration": 15000, "origin": "2014-01-01T16:00:00"},

This is probably closest to what you want: 15-second buckets. It will create a lot of them, though, one for every 15-second period covered by your query (0-15, 15-30, 30-45, and so on), so you will have to specify your query intervals manually to select only the particular times you care about.

"granularity": {"type": "period", "period": "P1D", "timeZone": "Europe/London", "origin": "2014-01-01T16:00:15"},

"granularity": {"type": "period", "period": "P1D", "timeZone": "Europe/London", "origin": "2014-01-01T16:00:00-00:00:15"},

Side question: In the example on the Granularities page there is a dash in the origin, what does this mean?

The origin simply defines how the interval buckets are lined up. If you define PT1H with an origin of 2015-01-01T00:15:00, then your aggregation intervals are 01:15:00-02:15:00, 02:15:00-03:15:00, and so on.
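
For example, a spec along these lines (the PT1H period, UTC time zone, and origin here are purely illustrative) should give hourly buckets anchored at 15 minutes past the hour:

"granularity": {"type": "period", "period": "PT1H", "timeZone": "UTC", "origin": "2015-01-01T00:15:00"}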

The “-08:00” at the end of the origin timestamp is the timezone offset (UTC-8 for America/Los_Angeles). All our timestamps are defined as ISO 8601 (http://en.wikipedia.org/wiki/ISO_8601).

It sounds like you want to filter down to specific 15-second windows. Like Xavier said, the “granularity” is meant to be a way of dividing up the input intervals, but it doesn’t really help you filter things. I think the simplest way to do what you want is to use a 15-second query granularity, but also break your input intervals down into those windows. For a two-day query that’d be something like:

"intervals": ["2014-01-01T16:00:00/PT15S", "2014-01-02T16:00:00/PT15S"],

"granularity": {"type": "duration", "duration": 15000}
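
Put together as a full timeseries query, that would look roughly like this (the dataSource name and the count aggregator are placeholders for whatever you are actually querying):

{
  "queryType": "timeseries",
  "dataSource": "your_datasource",
  "intervals": ["2014-01-01T16:00:00/PT15S", "2014-01-02T16:00:00/PT15S"],
  "granularity": {"type": "duration", "duration": 15000},
  "aggregations": [{"type": "count", "name": "count"}]
}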

Thanks guys!

Gian’s solution works perfectly.

Thanks Xavier: It’s good to understand the timezone notation.

I’ve been using this slightly more extensively and have run into some problems.

This approach doesn’t work if the duration doesn’t divide into 24 hours without a remainder.

e.g.

"intervals": ["2014-12-31T07:00:00Z/PT10S", "2014-12-30T07:00:00Z/PT10S"],

"granularity": {"type": "duration", "duration": 10000}

Gives 2 results, as expected:

[ {
  "timestamp" : "2014-12-30T07:00:00.000Z",
  "result" : {
    "count" : 5
  }
}, {
  "timestamp" : "2014-12-31T07:00:00.000Z",
  "result" : {
    "count" : 4
  }
} ]

"intervals": ["2014-12-31T07:00:00Z/PT11S", "2014-12-30T07:00:00Z/PT11S"],

"granularity": {"type": "duration", "duration": 11000}

Gives 4 results:

[ {
  "timestamp" : "2014-12-30T06:59:50.000Z",
  "result" : {
    "count" : 0
  }
}, {
  "timestamp" : "2014-12-30T07:00:01.000Z",
  "result" : {
    "count" : 5
  }
}, {
  "timestamp" : "2014-12-31T06:59:55.000Z",
  "result" : {
    "count" : 0
  }
}, {
  "timestamp" : "2014-12-31T07:00:06.000Z",
  "result" : {
    "count" : 4
  }
} ]

"intervals": ["2014-12-31T07:00:00Z/PT11S", "2014-12-30T07:00:00Z/PT11S"],

"granularity": {"type": "duration", "duration": 11000, "origin": "2014-12-30T07:00:00Z"}

Gives 3 results:

[ {
  "timestamp" : "2014-12-30T07:00:00.000Z",
  "result" : {
    "count" : 5
  }
}, {
  "timestamp" : "2014-12-31T06:59:54.000Z",
  "result" : {
    "count" : 0
  }
}, {
  "timestamp" : "2014-12-31T07:00:05.000Z",
  "result" : {
    "count" : 4
  }
} ]

The only solution I can see is to do one query per day and include the origin.
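
i.e. roughly one of these per day (dataSource and the count aggregator are placeholders here):

{
  "queryType": "timeseries",
  "dataSource": "your_datasource",
  "intervals": ["2014-12-31T07:00:00Z/PT11S"],
  "granularity": {"type": "duration", "duration": 11000, "origin": "2014-12-31T07:00:00Z"},
  "aggregations": [{"type": "count", "name": "count"}]
}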

The simple granularity settings seem to act in a way I wasn’t expecting.

Setting a granularity of "all" returns one result no matter how many intervals I specify.

What would be awesome is being able to say something like "granularity": "interval", which would do the aggregation per interval.

Any ideas / help appreciated.

Thanks,

Richard

If your intervals are always on separate days, then setting granularity to “day” should do what you want. Otherwise, I think your best bet is to make a separate query for each interval.

It is once a day, yes.

Do you mean having multiple intervals (one for each day), or one big interval across all the days?

If you set granularity to day, then it seems to ignore any intraday intervals.

I mean having multiple intervals, but setting granularity to “day”, like this:

"granularity" : "day",
"intervals" : [
  "2015-03-10T03/pt5h",
  "2015-03-11T03/pt5h"
]

In that case you will get two results, one for each interval. The returned timestamps will be truncated to the day (2015-03-10T00:00:00.000Z and 2015-03-11T00:00:00.000Z), but each result will cover only the interval you asked for.
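
As a complete query, that would be something like the following (dataSource and the count aggregator are placeholders; swap in your own 15-second intervals):

{
  "queryType": "timeseries",
  "dataSource": "your_datasource",
  "intervals": ["2015-03-10T03:00:00/PT5H", "2015-03-11T03:00:00/PT5H"],
  "granularity": "day",
  "aggregations": [{"type": "count", "name": "count"}]
}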

Making a separate query for each interval would work too.