Rollup/compact all records to maintain the last value only

When I do the following, I expect to end up with only 1 record, but instead I get 2 records.

  • Ingest 2 records from a Kafka topic as JSON,
    e.g. record 1 → { time: t1, Id: 1, name: "Name1" }
    e.g. record 2 → { time: t2, Id: 1, name: "Name2" }

  • Step 2: this is the part I don't know how to achieve. When I ingest this data, I expect the table to contain only 1 record,
    e.g. record 2 → { time: t2, Id: 1, name: "Name2" }
    where, because t2 > t1 and the Id is the same, the name field should be updated with the latest record's value, 'Name2'.

  • Step 3: instead, with everything I have tried, I get 2 separate records.

That means I probably haven't understood how to set up rollup so that this table only reflects the latest value of its fields, with the Id column acting as a primary key.

Please help me construct the correct ingestion spec to achieve this outcome. I have set the query granularity to 'all'. The goal is for the table to become a reference table I can use alongside my main telemetry dataset. How do I get the above table generated/persisted in Druid?


Things I've tried
  • Configuring rollup, without really understanding how to roll up string fields like name
  • Reading about the 'last' value aggregators for compaction, but I haven't understood how to configure them, or whether they are a solution to the problem I am looking at

Relates to Apache Druid

At query time, you may be able to use LATEST.
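
For example, something like this (a sketch, assuming your datasource is called reference_table and has the Id and name columns from your sample records):

```sql
-- Collapse to one row per Id, keeping the name from the row with the
-- most recent __time. 1024 is the maximum number of bytes per string
-- value that LATEST will track.
SELECT
  "Id",
  LATEST("name", 1024) AS "name"
FROM "reference_table"
GROUP BY "Id"
```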

Note that you cannot apply the LATEST function at ingestion time to bring in only the last row from Kafka.

It's worth remembering that Druid is not a "state store": you can't get it to keep only the latest value of some variable in the way you might with KSQL, for example.

What does the actual data represent, incidentally? Perhaps there is another design pattern locked away in my memory!!

Thanks for responding.

I have actual IoT telemetry data with lots of dimensions. Those dimensions are "names" of things that may change quite regularly (sometimes a few times in a week, sometimes not for months) while the data is being ingested. There are up to 6-7 dimension columns, each one an Id of some aspect of the telemetry data source. A lookup reference table such as the one I am ingesting above will contain multiple 'label' fields like name, where I am simply interested in the latest value.

Storing them all in the original telemetry table would mean that a lot of records would have to be deleted/re-ingested whenever any of the labels change. Hence the choice/desire to use a reference table and an inner-join technique against a shallow truth table, a pseudo 'state store' inside Druid.
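
Roughly, the query shape I am hoping for is something like this (a sketch: telemetry, deviceId, and value stand in for my real datasource and column names):

```sql
-- Enrich each telemetry row with the latest label for its Id,
-- using a subquery over the reference table as the shallow 'truth table'.
SELECT
  t."__time",
  t."deviceId",
  r."name",
  t."value"
FROM "telemetry" t
INNER JOIN (
  SELECT "Id", LATEST("name", 1024) AS "name"
  FROM "reference_table"
  GROUP BY "Id"
) r ON t."deviceId" = r."Id"
```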

Thanks once again.

Hhmmm… now I have not used this (!!!), but I know that there is a Kafka lookup connector – is that the kind of thing you need? I think it will give you the latest value for some key that you can then JOIN to at query time.
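
If that route works for you, the query-time usage would be something like this (a sketch, since I haven't used it myself – device_names is a hypothetical lookup name you would define via the Kafka lookup extension, druid-kafka-extraction-namespace if I remember right):

```sql
-- Replace each Id with the latest name held by the 'device_names' lookup.
SELECT
  "__time",
  "deviceId",
  LOOKUP("deviceId", 'device_names') AS "name"
FROM "telemetry"
```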

I believe @Vijay_Narayanan1 may know a lot more than me about it!