Hi Guys,
Hoping you can help me
with the research that I started for my project on Druid.
We are talking about a system with retention of 12 month and
around 100~ milliard rows of data per year.
The objective is to make analytic queries on web service;
therefore there are a few items I need to fill, please see the list below:
-
** From user/concurrency point of
view**: -
Is there a limitation for
concurrent users running queries? -
In case I need to add more
users, what is the cost? (More hardware/servers/memory?) -
From data point of view:
-
Is there a limitation on how
many dimensions/lookup tables I can design? Is there any kind of
performance implications? -
What will be the cost of
adding another dimension/lookup table? (More
hardware/servers/memory or redesign of the solution?) -
What about the history, as I
mention before we are talking about one year retention, but if I need to
add more retention time, example 3 years, what will be the implication? -
From Programming point of view:
-
Are the windows functions
supported? -
If
not, is there a workaround for it? -
If
partially, which one is supported or not supported. -
Are Joins between tables
supported? (Is there a limitation for table size?) -
Are nested queries supported?
-
Are functions like Count, sum,
avg., max, min, etc. supported? How is it working with uniqueness? (Count
(distinct X), Sum (distinct X), etc.) -
Are ranking functions, like in
MSSQL, supported? (row_number, rank, dense_tank, ntile) -
How Druid is
supporting the comparison between populations? Example, I need to get the
population of users which didn’t purchase in the last 3 days but they
click on my site at least 1 times. (when purchase and click are different
events types in the system) -
Flexibility from integration
point of view (could it work with Java/python/etc.)
Some words about the company I’m working for, it is a personalized
retargeting specialist providing web and app advertisers with display ads
(banners), in real time, for visitors who have left their sites without
completing a purchase. These users are served ads as they continue surfing the
web or browsing other apps. Personalized retargeting is a form of online
targeted advertising, in which online advertising is delivered to consumers
based on their previous actions (such as pages browsed, products added to
basket) on a company’s website or app.
I will appreciate any help you can provide, since the answer based
on your experience to pointing me to documents/tutorials/white papers/etc.
If any additional details are needed please let me know.
Thanks!
Guy