Hoping you can help me
with the research that I started for my project on Druid.
We are talking about a system with retention of 12 month and
around 100~ milliard rows of data per year.
The objective is to make analytic queries on web service;
therefore there are a few items I need to fill, please see the list below:
** From user/concurrency point of
Is there a limitation for
concurrent users running queries?
In case I need to add more
users, what is the cost? (More hardware/servers/memory?)
From data point of view:
Is there a limitation on how
many dimensions/lookup tables I can design? Is there any kind of
What will be the cost of
adding another dimension/lookup table? (More
hardware/servers/memory or redesign of the solution?)
What about the history, as I
mention before we are talking about one year retention, but if I need to
add more retention time, example 3 years, what will be the implication?
From Programming point of view:
Are the windows functions
not, is there a workaround for it?
partially, which one is supported or not supported.
Are Joins between tables
supported? (Is there a limitation for table size?)
Are nested queries supported?
Are functions like Count, sum,
avg., max, min, etc. supported? How is it working with uniqueness? (Count
(distinct X), Sum (distinct X), etc.)
Are ranking functions, like in
MSSQL, supported? (row_number, rank, dense_tank, ntile)
How Druid is
supporting the comparison between populations? Example, I need to get the
population of users which didn’t purchase in the last 3 days but they
click on my site at least 1 times. (when purchase and click are different
events types in the system)
Flexibility from integration
point of view (could it work with Java/python/etc.)
Some words about the company I’m working for, it is a personalized
retargeting specialist providing web and app advertisers with display ads
(banners), in real time, for visitors who have left their sites without
completing a purchase. These users are served ads as they continue surfing the
web or browsing other apps. Personalized retargeting is a form of online
targeted advertising, in which online advertising is delivered to consumers
based on their previous actions (such as pages browsed, products added to
basket) on a company’s website or app.
I will appreciate any help you can provide, since the answer based
on your experience to pointing me to documents/tutorials/white papers/etc.
If any additional details are needed please let me know.