Druid use case evaluation

Hello,

We are currently evaluating columnar databases for our application.Below is an abstract description of our requirement.

Our app is like a dashboard powering and drilling down among several tables having wide columns

Tables

Our tables have around 1000 columns called custom fields.

There are currently 20 such fixed tables along with 30 other tables.

Data types are a mixture of int,bigint,tinyint,double,decimal and text.

Text columns form the majority with 30% of the columns i.e 300 columns approx.

Query Patterns

View

This is like a dashboard view comprising of 20 columns which can be mixed/joined among the 50 tables i.e the view can be created with columns comprising from any of the tables.There can be joins,where conditions when defining the view itself.

Sorting

The user can sort by any of the columns in the view in any manner i.e ascending/descending.

Where conditions

There can be where conditions in any of the columns and the view.

Text columns can have conditions like starts with (abc%), ends with (%abc), contains (%abc%).They can also be sorted based on alphabetical ordering.

Integer columns can have aggregate functions such as sum,>,<,>=<= etc.

Joins

The user can join these tables based on the unique/primary key column.Considering the max amount, there can maximum 50 joins although this is not the general case.

We are currently analyzing MonetDB, but not much confident about its stability.

Is there any person who can confirm this.I can even cross post this in the developers group as well if needed.

Please get back to me in case of any questions.

Thanks for the wonderful work you are already doing.

Cheers,

Madusudanan.B.N

Hi,

Can you explain more of your use case for joins and why you cannot denormalize the data?

I haven’t really considered de-normalizing them since they have a mixture of one-one,one-many and many-many relationships and it is a little bit hard to denormalize them.

With that said, if joins are out the way and we have everything inside of a single table, will druid perform well ? Since the only restriction I saw with druid was joins and they are of course a hard computer science problem to solve.

Also if I de normalize them, then that would be adding several columns to the already existing 1000 columns, i.e considering there are three tables involved then it could be 3000 columns, since druid is columnar I am assuming that this is the way it should be built and it would be fast.Please advise.

If you have any experiences on denormalizing a strict relational model, then it would be helpful if you could share any information related to that.

Thanks,

Madusudanan.B.N

Please see: http://druid.io/docs/0.9.0/querying/joins.html for joins.

It is best to think not as a relational database and get away from the mindset of relational databases. I recommend denormalizing the data unless any of the tables are fairly small and you can use lookups to do the join functionality, for example what you would do with a traditional star schema.

If the tables are completely de-normalized then it would mean a single wide table of around 50000 columns.The reads would be fast, but I doubt how this would affect data ingestion.

We currently have around 40K update queries,10K insert queries,3K delete queries per day on average on the 1000 column tables.Will the architecture/system cope up with such write speeds ?

AFAIK, in a columnar db the insert speed is directly proportional to the amount of columns i.e we are talking about O(1) - row based vs O(N) - column based where N is the number of columns we are inserting, this is not really in the asymptotic sense but kind of similar .In monetdb we are seeing update queries taking even 500ms and I suspect a similar number for druid as well.With the wider table in place,I am assuming the write speeds to be a lot worse.

Can you please throw any light on this ? Is my understanding correct here.

Thanks,

Madusudanan.B.N

Druid has the ability to support star schema joins, so if you have smaller tables that exist external to Druid, you don’t need to index them as part of joining them in Druid.