I have a use case where I want to group all events within an hour and compute an average over them.
For example, events from 01:00 to 02:00 (including an event at exactly 02:00) should be included in that average, but what I see is that an event at exactly 02:00 is bucketed at 03:00 instead of 02:00 when the ceil function is used.
This seems different from how the ceil function behaves in SQL.
Even per the definition here (http://druid.io/docs/latest/misc/math-expr.html),
it should return a value greater than or equal to its argument, but it never seems to return an equal value.
Is this expected?
Is there a function that fits my use case that I can use instead?
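To illustrate, here is a small Python sketch simulating the hour-ceiling behavior I am observing (this is just a simulation of the described semantics, not Druid's actual implementation):

```python
from datetime import datetime, timedelta

def hour_ceil(ts):
    """Simulate the observed behavior: return the end of the hour bucket
    containing ts, where buckets are [HH:00, HH+1:00) -- so a timestamp
    exactly on an hour boundary maps to the *next* boundary."""
    floor = ts.replace(minute=0, second=0, microsecond=0)
    return floor + timedelta(hours=1)

# An event at exactly 02:00 lands at 03:00, not 02:00:
assert hour_ceil(datetime(2019, 5, 1, 2, 0, 0)) == datetime(2019, 5, 1, 3, 0, 0)
# An event at 01:30 lands at 02:00, as expected:
assert hour_ceil(datetime(2019, 5, 1, 1, 30, 0)) == datetime(2019, 5, 1, 2, 0, 0)
```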
Thanks in advance
I can confirm that I am seeing what you are seeing; this is indeed a very strange way for CEIL to act (not what I would expect). I think an issue should be raised.
As for your use case of grouping all events within an hour and averaging them, how come FLOOR (or CEIL as it currently works) does not work for you?
This happens because what CEIL(x TO unit) does in Druid is return the (exclusive) upper bound of the time bucket that includes x. In particular, because it's exclusive, it will never be equal to the timestamp. If you pass in something that is at the bottom end of a bucket (as you did), you'll get the start of the next bucket.
The function doesn't seem to be standardized (at least I can't find it documented as a standard SQL function anywhere), so I suppose it can behave however we want. IMO the current behavior is a bit weird, since it's not idempotent. It would make more sense to special-case it so it returns the provided timestamp if that timestamp is already the upper bound of a time bucket.
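For concreteness, the idempotent special-casing suggested above could look like this (a Python sketch of the proposed semantics, not Druid code):

```python
from datetime import datetime, timedelta

def hour_ceil_idempotent(ts):
    """Ceil ts to the next hour boundary, but return ts unchanged when it
    is already on a boundary -- the special-cased, idempotent behavior
    proposed above."""
    floor = ts.replace(minute=0, second=0, microsecond=0)
    return ts if ts == floor else floor + timedelta(hours=1)

# A boundary timestamp maps to itself, so applying the function twice
# gives the same result as applying it once (idempotence):
boundary = datetime(2019, 5, 1, 2, 0, 0)
assert hour_ceil_idempotent(boundary) == boundary
assert hour_ceil_idempotent(hour_ceil_idempotent(boundary)) == boundary
# A mid-bucket timestamp still rounds up:
assert hour_ceil_idempotent(datetime(2019, 5, 1, 1, 30)) == boundary
```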
I raised this issue: https://github.com/apache/incubator-druid/issues/7777
Thanks, Gian, for helping and creating the issue. This is helpful.
Thanks for checking this. The way we show the data in the UI is that all events from, say, 9:00 to 10:00 are shown on the chart (a bar graph) at 10:00, so we have to group by CEIL(__time). The problem is that a lot of events arrive at exactly 10:00, which gets ceiled to 11:00, and that leads to a lot of discrepancy in the data we show for 10:00.
Hope my explanation of the use case is clear.
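One possible workaround for this charting scheme, until the behavior changes: nudge each timestamp back by the smallest time unit before ceiling, so that events landing exactly on an hour boundary stay in the window that ends at that boundary. A Python sketch of the idea (simulated semantics, not actual Druid SQL):

```python
from datetime import datetime, timedelta

def chart_bucket(ts):
    """Label an event with the *end* of its hour window, treating the
    window as (start, end] -- an event at exactly HH:00 belongs to the
    window ending at HH:00. Implemented by nudging the timestamp back by
    one microsecond before ceiling (a sketch of the workaround)."""
    nudged = ts - timedelta(microseconds=1)
    floor = nudged.replace(minute=0, second=0, microsecond=0)
    return floor + timedelta(hours=1)

# Events from just after 09:00 through exactly 10:00 all chart at 10:00:
assert chart_bucket(datetime(2019, 5, 1, 10, 0, 0)) == datetime(2019, 5, 1, 10, 0)
assert chart_bucket(datetime(2019, 5, 1, 9, 30, 0)) == datetime(2019, 5, 1, 10, 0)
# An event at exactly 09:00 charts at 09:00, not 10:00:
assert chart_bucket(datetime(2019, 5, 1, 9, 0, 0)) == datetime(2019, 5, 1, 9, 0)
```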