Druid vs. KairosDB

Hi there,

We are debating whether to use Druid or KairosDB as the time-series database for our project. One thing that worries our dev team is how complex Druid is to set up compared with KairosDB, which is a single tar file. Could you explain exactly why there are 5 different Java processes, and why the architecture looks so complex? I am sure there are good reasons for this. It would also be a great help if you could tell me the advantages of using Druid over KairosDB.



Hey Anil,

The reason for the multi-process design is to isolate workloads. Historicals handle older read-only data, middleManagers handle new data coming in, brokers receive and fan out queries, and coordinators and overlords coordinate the other node types. The idea is that you can tune and scale each piece individually, which gives you a lot of control over your deployment and the ability to squeeze out more performance. Also, if one component becomes unstable or overloaded (perhaps the middleManagers are overloaded by too much new data coming in), the other components keep working smoothly. There are also commercial distributions that simplify deployment, one of which is offered by the company you will find in my email address :)
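To make the "scale each piece individually" point concrete, here is a rough sketch in Python. It is only a toy model of the idea, not anything Druid ships: the `ProcessType` class and the `build_cluster` and `scale` helpers are made-up names for illustration, and the role descriptions are just a summary of the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class ProcessType:
    """Hypothetical record describing one kind of process in the cluster."""
    name: str
    role: str
    replicas: int = 1


def build_cluster():
    """Return the five process types, each starting with a single replica."""
    types = [
        ProcessType("historical", "serves older, read-only data"),
        ProcessType("middleManager", "ingests newly arriving data"),
        ProcessType("broker", "receives queries and fans them out"),
        ProcessType("coordinator", "coordinates data held by historicals"),
        ProcessType("overlord", "coordinates ingestion work on middleManagers"),
    ]
    return {t.name: t for t in types}


def scale(cluster, name, replicas):
    """Change the replica count of one process type and leave the rest alone."""
    cluster[name].replicas = replicas
    return cluster


if __name__ == "__main__":
    cluster = build_cluster()
    # Ingest load went up: add middleManagers only, nothing else changes.
    scale(cluster, "middleManager", 4)
    for p in cluster.values():
        print(f"{p.name:14} x{p.replicas}  {p.role}")
```

In a single-process system the equivalent change would mean resizing everything at once, which is the trade-off the extra processes are buying you out of.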

As for Druid vs KairosDB specifically, I haven’t used the latter, but it looks like it’s based on Cassandra. Druid is designed specifically for doing aggregations over time-partitioned data with high insert rates, and has a storage design that is geared towards that case and is very different from Cassandra’s. So Druid should generally be faster than a Cassandra-based timeseries app.
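If it helps to picture what "aggregations over time-partitioned data" means, here is a minimal sketch in Python. It is not Druid's actual storage format or code: the `TimePartitionedStore` class, the hourly partition size, and the method names are all assumptions made up for this example. It only shows the general idea that a time-range query can skip every partition outside the requested interval.

```python
from collections import defaultdict

HOUR = 3600  # assumed partition size in seconds, chosen only for this example


def partition_key(ts):
    """Round a Unix timestamp down to the start of its hour."""
    return ts - ts % HOUR


class TimePartitionedStore:
    """Toy in-memory store that groups rows by the hour they fall into."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, ts, value):
        self.partitions[partition_key(ts)].append((ts, value))

    def sum_between(self, start, end):
        """Sum values with start <= ts < end.

        Returns (total, partitions_scanned) so the pruning is visible.
        """
        total = 0.0
        scanned = 0
        for key, rows in self.partitions.items():
            # Skip partitions that cannot overlap the query interval.
            if key + HOUR <= start or key >= end:
                continue
            scanned += 1
            total += sum(v for t, v in rows if start <= t < end)
        return total, scanned


if __name__ == "__main__":
    store = TimePartitionedStore()
    for ts, value in [(0, 1), (10, 2), (3600, 3), (7200, 4)]:
        store.insert(ts, value)
    # Only the first hourly partition is read for this query.
    print(store.sum_between(0, 3600))
```

A query over one hour touches one partition no matter how much other data is stored, which is the property a time-oriented storage layout is after.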