Real deployment doubts: Vanilla Druid vs Imply's Druid? HDFS vs S3?


We’re working on a Druid-based prototype, and right now we’ve solved all of our problems (except one, but that is for another post). But our prototype is still only a toy, and we have to deploy a large log analysis system that offers good reporting performance & flexibility.

  1. We have recently discovered the Imply Druid distribution, and we have a lot of questions about its advantages & disadvantages compared with the vanilla “distro”.
  2. We also have questions about the advantages and disadvantages of using HDFS or S3 for Druid’s deep storage layer.

I have to add that I’m quite new at my company and I don’t have any power over money-related decisions, but I’m also interested in how Imply could help us deploy a production-ready infrastructure and how much this support costs, so that we can work out how best to balance in-house effort against external support.

Thank you for your time.

Hey acorrea,

  1. The Imply distribution and the vanilla distribution are pretty similar. The main differences are that the Imply distribution includes extra tools like Pivot, PlyQL, and Tranquility, includes extra scripts that make the services easier to start and manage, and has better tutorials and sample configs (although we contributed those tutorials and configs to base Druid for the upcoming 0.9.0, so that will be less of a difference in the future).

  2. HDFS and S3 are both used by folks at scale in production, so I would suggest using whichever one you feel more comfortable with.
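
For reference, the choice mostly comes down to a few lines in common.runtime.properties. Here is a rough sketch, assuming Druid 0.9.0 or later (where extensions are listed in druid.extensions.loadList). The bucket name, key prefix, credentials, and directory are placeholders to replace with your own values, not recommendations:

```
# Option A: S3 deep storage (bucket, prefix, and keys are placeholders)
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=your-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY

# Option B: HDFS deep storage (directory is a placeholder)
# druid.extensions.loadList=["druid-hdfs-storage"]
# druid.storage.type=hdfs
# druid.storage.storageDirectory=/druid/segments
```

With the HDFS option, the Druid services also need your Hadoop configuration files on their classpath so they can locate the cluster.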

I can message you privately about our support services.