"Local" lookups with Druid for a multi-tenant use case

Hi all,

I’m new to Druid and am trying to figure out whether it’s appropriate for my use case. I would appreciate any suggestions or pointers you could give :wink:

Basically, my data is structured as follows:

  • Several companies (a few hundred for the moment, possibly more later), each with its own report. This is multi-tenant: reports are unrelated to each other and can/should be isolated from one another.

  • A report for a given company is essentially a big table representing a daily time series with around 10 dimensions and 20 metrics. I expect around 100k rows per day for one report, and history should be kept for at least one full year. This structure is common to all companies.

  • No roll-up can be done: users should always be able to quickly drill down to the most granular details.

  • Aggregation and filtering can be done on any of the 10 dimensions (I expect to use something like Superset to give users full control over their data).

  • Now the tricky part: for each report, one of the dimensions (the keyword; I’m in the SEO business) should be mapped to several dynamic dimensions according to the user’s configuration.

For example, a given company may have around 10k distinct values for the “keyword” dimension in its report, and each of those keywords is mapped to tags (each tag having maybe 10 or 20 distinct values). The tags are dimensions that can be used for aggregation.

Those tags are dynamic and could be changed by the user at any moment, so I would like to avoid including them directly as dimensions in the report, because changing the tag mapping would imply rebuilding the historical segments.
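To make the requirement concrete, here is a minimal sketch in plain Python (not Druid code) of the behaviour I’m after: the keyword-to-tag mapping is resolved at query time, so changing it never touches the stored rows. The row layout, the `clicks` metric and the tag values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical report rows: (day, keyword, clicks). Real rows would carry
# around 10 dimensions and 20 metrics; these names and values are invented.
ROWS = [
    ("2024-01-01", "red shoes", 120),
    ("2024-01-01", "blue shoes", 80),
    ("2024-01-01", "leather bag", 50),
]


def clicks_by_tag(rows, keyword_to_tag, default="untagged"):
    """Sum clicks per tag value, resolving keyword -> tag at query time."""
    totals = defaultdict(int)
    for _day, keyword, clicks in rows:
        totals[keyword_to_tag.get(keyword, default)] += clicks
    return dict(totals)


# One tag ("category") as the user has configured it today.
category_v1 = {"red shoes": "shoes", "blue shoes": "shoes", "leather bag": "bags"}
# The user later re-tags one keyword; ROWS is not modified.
category_v2 = {**category_v1, "blue shoes": "sale"}

print(clicks_by_tag(ROWS, category_v1))  # {'shoes': 200, 'bags': 50}
print(clicks_by_tag(ROWS, category_v2))  # {'shoes': 120, 'sale': 80, 'bags': 50}
```

The same rows give different aggregates as soon as the mapping changes, which is what I would like to get without re-ingesting anything.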

Note: the way those tags are defined is completely different across companies.

Initially I was thinking of defining the report as one big database in Druid, with the tags for each company as a lookup I could update dynamically, but I’m afraid that having hundreds of lookups, each with several tens of thousands of entries, would be too much.
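Here is a rough sketch of that first idea, as far as I understand it from the docs. It builds a request body for the coordinator lookup endpoint (`POST /druid/coordinator/v1/lookups/config`, default tier, simple `map` lookup) and a Druid SQL query using the `LOOKUP` function. The naming scheme `company_<id>_<tag>`, the `reports` datasource, the `company_id` column and the `clicks` metric are my own assumptions, and the number of tags per company in the estimate is a guess.

```python
import json


def lookup_name(company_id, tag):
    """Hypothetical naming scheme: one lookup per (company, tag) pair."""
    return f"company_{company_id}_{tag}"


def lookup_config(company_id, tag, keyword_to_tag, version):
    """Body for POST /druid/coordinator/v1/lookups/config (default tier)."""
    return {
        "__default": {
            lookup_name(company_id, tag): {
                # Bumping the version is how an updated mapping gets picked up.
                "version": version,
                "lookupExtractorFactory": {"type": "map", "map": keyword_to_tag},
            }
        }
    }


def tag_query(datasource, company_id, tag, metric):
    """Druid SQL that groups one company's rows by a tag resolved via LOOKUP.

    Sketch only: values are interpolated directly, so real code should
    validate them or use query parameters.
    """
    name = lookup_name(company_id, tag)
    return (
        f"SELECT LOOKUP(keyword, '{name}') AS \"{tag}\", SUM({metric}) AS total\n"
        f'FROM "{datasource}"\n'
        f"WHERE company_id = '{company_id}'\n"
        f"GROUP BY 1"
    )


def total_lookup_entries(companies, tags_per_company, keywords_per_company):
    """Back-of-the-envelope count of map entries across all lookups."""
    return companies * tags_per_company * keywords_per_company


body = lookup_config(42, "category", {"red shoes": "shoes"}, version="v1")
print(json.dumps(body, indent=2))
print(tag_query("reports", 42, "category", "clicks"))
# 300 companies and 10k keywords are in line with my numbers above;
# 5 tags per company is an assumption.
print(total_lookup_entries(300, 5, 10_000))  # 15000000
```

With those assumed numbers this is about 1,500 lookups and 15 million entries in total, which is the part that worries me if every lookup has to be held in memory on the nodes that serve queries.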

If I create one database per company (I’m guessing lookups are at least local to a database), I wonder whether too many processes (YARN …) will be spawned, given that I have several hundred companies.

So I’m not sure what the recommended approach would be in this case.

Thanks!


Hi, how did you solve this?