Does Druid.io support upserts?

Hi,

I am new to Druid and am currently trying to evaluate whether Druid is a good fit for my use case.

Each row looks like this:

{"timestamp":1483540348197,"device":"417f0216-b0f8-4cf1-bf4a-589854ff70f5","country":"US","manufacturer":"Apple","app":"com.facebook.katana"}

My first problem is that I want to keep only one record per user per hour. Is this supported out of the box in Druid, or do I have to use intermediate storage (such as Redis) to deal with the updates?

Just for the record, I ultimately want to count the following on an hourly, daily, weekly, etc. basis:

- how many users came from the US (in total and per hour);
- how many users came from an unknown place (in total and per hour);
- how many users came from the US with Apple vs. Android devices (in total and per hour).

Thank you for your time.

Druid does not support “upsert” as such, but it does support “rollup”, which can solve some of the same problems. With rollup, you define dimensions and Druid essentially does a “group by” on those dimensions at ingestion time. This means that if you want hourly granularity and you send Druid multiple rows with the same hour, user, country, device, app, etc., Druid will store only one row for that combination, combining the metrics (such as a row count) of all the input rows.
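To make the rollup idea concrete, here is a minimal Python sketch that simulates it outside Druid, assuming millisecond timestamps as in the sample row from the question and hourly granularity. The `rollup` function, the `HOUR_MS` constant and the `count` metric name are hypothetical illustrations, not Druid APIs: each input row is bucketed by flooring its timestamp to the hour and grouped on the chosen dimensions, and only one row per combination survives, carrying a count of how many raw rows it absorbed.

```python
from collections import defaultdict
from typing import Iterable

HOUR_MS = 60 * 60 * 1000  # one hour in milliseconds (assumed hourly granularity)


def rollup(rows: Iterable[dict], dims: list, granularity_ms: int = HOUR_MS) -> list:
    """Simulate ingestion-time rollup: GROUP BY (truncated timestamp, *dims)."""
    counts = defaultdict(int)
    for row in rows:
        ts = row["timestamp"]
        bucket = ts - ts % granularity_ms  # floor the timestamp to the start of its bucket
        key = (bucket,) + tuple(row.get(d) for d in dims)
        counts[key] += 1  # hypothetical "count" metric: raw rows merged into this one
    return [
        {"timestamp": key[0], **dict(zip(dims, key[1:])), "count": n}
        for key, n in counts.items()
    ]


if __name__ == "__main__":
    base = {"device": "417f0216-b0f8-4cf1-bf4a-589854ff70f5", "country": "US",
            "manufacturer": "Apple", "app": "com.facebook.katana"}
    rows = [
        {**base, "timestamp": 1483540348197},                # sample row from the question
        {**base, "timestamp": 1483540348197 + 10 * 60_000},  # same device, 10 minutes later
        {**base, "timestamp": 1483540348197 + HOUR_MS},      # same device, one hour later
    ]
    dims = ["device", "country", "manufacturer", "app"]
    for r in rollup(rows, dims):
        print(r)
```

The two events from the same device within the same hour collapse into a single row with `count` 2, while the event an hour later lands in its own row, which is the "one record per user per hour" behaviour asked about.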

And if you don’t need to remember the user IDs and can tolerate approximate counts, Druid can do even better: leave out the user dimension and add a hyperUnique aggregator over the user column instead. This lets Druid approximately count distinct users without actually storing the IDs, which can save a lot of space. The approximation typically has an error of a few percent.
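Druid's hyperUnique aggregator is based on the HyperLogLog algorithm. The sketch below is a minimal textbook HyperLogLog in Python, not Druid's actual implementation (Druid uses its own hashing and register encoding); the `HyperLogLog` class, its precision parameter `p`, and the use of SHA-1 as the hash are assumptions chosen for illustration. It shows why no user IDs need to be stored: each ID only updates one of `m` small registers, duplicate IDs leave the registers unchanged, and two sketches (say, from two hourly rows) merge by taking the register-wise maximum.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counting (illustrative)."""

    def __init__(self, p: int = 10):
        self.p = p                  # number of hash bits used to pick a register
        self.m = 1 << p             # number of registers (1024 when p = 10)
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Deterministic 64-bit hash: first 8 bytes of SHA-1 (an assumption for this sketch).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # top p bits select the register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit (width + 1 if rest == 0)
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Union of two sketches: register-wise maximum (both must use the same p).
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over the empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog()
    for i in range(10_000):
        sketch.add(f"user-{i}")
        sketch.add(f"user-{i}")  # adding a duplicate does not change the sketch
    print(round(sketch.estimate()))  # close to 10,000, but not exact
```

With p = 10 (1,024 registers), the standard error of HyperLogLog is about 1.04/√1024 ≈ 3.3%, in line with the "few percent" figure. The merge step is what would let hourly rows combine into daily or weekly unique-user counts without double-counting a user who appears in several hours.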

See here for more on rollup: http://druid.io/docs/latest/design/index.html