Hello guys,
Can you please help with the scenario below, where events arrive as follows:
{"timestamp":"2020-01-01T01:01:35Z","key":"1","value":20}
{"timestamp":"2020-01-01T02:01:35Z","key":"1","value":1}
{"timestamp":"2020-01-01T03:01:35Z","key":"1","value":35}
{"timestamp":"2020-01-01T04:01:35Z","key":"1","value":31}
{"timestamp":"2020-01-02T03:01:35Z","key":"1","value":35}
{"timestamp":"2020-01-02T05:01:35Z","key":"1","value":29}
{"timestamp":"2020-01-03T04:01:35Z","key":"1","value":31}
{"timestamp":"2020-01-01T03:01:35Z","key":"2","value":35}
{"timestamp":"2020-01-01T04:01:35Z","key":"2","value":31}
{"timestamp":"2020-01-02T03:01:35Z","key":"2","value":35}
{"timestamp":"2020-01-02T05:01:35Z","key":"2","value":29}
{"timestamp":"2020-01-03T04:01:35Z","key":"2","value":31}
The incoming events contain multiple events on the same day for each key (1 and 2). Only the event with the max timestamp of the day should be ingested for each key, as below:
{"timestamp":"2020-01-01T04:01:35Z","key":"1","value":31}
{"timestamp":"2020-01-02T05:01:35Z","key":"1","value":29}
{"timestamp":"2020-01-03T04:01:35Z","key":"1","value":31}
{"timestamp":"2020-01-01T04:01:35Z","key":"2","value":31}
{"timestamp":"2020-01-02T05:01:35Z","key":"2","value":29}
{"timestamp":"2020-01-03T04:01:35Z","key":"2","value":31}
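To make the selection rule concrete, here is a minimal Python sketch of the filtering I am after, written as a pre-processing step that would run before the data reaches Druid. It is only an illustration of the rule, not a Druid feature: the function name `latest_per_key_per_day` is hypothetical, and it assumes one JSON event per line, second-precision UTC timestamps in the format shown above, and that all events for a day are available together (a batch, not a stream).

```python
import json
from datetime import datetime, timezone


def latest_per_key_per_day(lines):
    """Keep, for each (key, UTC day), only the event with the latest timestamp."""
    latest = {}  # (key, day) -> (parsed timestamp, event dict)
    for line in lines:
        event = json.loads(line)
        # Parse timestamps such as "2020-01-01T04:01:35Z"; the trailing "Z" means UTC.
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        ts = ts.replace(tzinfo=timezone.utc)
        bucket = (event["key"], ts.date())
        # Replace the stored event only if this one is strictly later.
        if bucket not in latest or ts > latest[bucket][0]:
            latest[bucket] = (ts, event)
    # Return the surviving events ordered by key, then by day.
    return [latest[bucket][1] for bucket in sorted(latest)]


if __name__ == "__main__":
    sample = [
        '{"timestamp":"2020-01-01T01:01:35Z","key":"1","value":20}',
        '{"timestamp":"2020-01-01T04:01:35Z","key":"1","value":31}',
        '{"timestamp":"2020-01-02T05:01:35Z","key":"1","value":29}',
    ]
    for event in latest_per_key_per_day(sample):
        print(json.dumps(event))
```

A pre-processing step like this would work for batch loads, but I would prefer to have Druid itself drop the earlier events during ingestion if that is possible.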
We can achieve the same result at query time, but we don't want to store the unnecessary tuples.
I have tried rollup with segment granularity set to day, together with the maxTime aggregator from the druid-time-min-max extension. This gives me the max timestamp of the day, but it sets that max timestamp on all the events for that day and does not filter out the earlier events. We only want to keep the event with the max timestamp for each day.
Please help me with the right strategy.
Thanks in advance,
Kuldeep Gaur