Druid is awesome, but…

There just seem to be way too many tweaks that need to be made to configs to keep things running smoothly. Is the long-term goal of the development team to improve Druid so that the system makes better decisions on its own, given the nature of the data flowing into it? I’ve had to spend way too much time tweaking configs because things break due to various configuration issues. Don’t get me wrong, I’m a huge fan of Druid and I hope it continues to succeed, but I really think it needs some work around self-management.

When we started our Druid journey, one of the first problems we tried to solve was building the JVM and runtime configs based on the flavor of server running them. We are on AWS, so we wrote a user-data script that pulls the config from an S3 bucket and tweaks it based on the vCPUs and memory of the EC2 instance, roughly as in the sketch below.
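For illustration, here is a minimal Python sketch of that user-data step. The bucket name, template key, destination path, and the sizing heuristics (threads, heap, direct memory) are all made-up assumptions for the example, not our actual values:

```python
#!/usr/bin/env python3
"""Sketch of an EC2 user-data step that sizes Druid configs from instance
resources. Bucket, key, paths, and heuristics are illustrative only."""
import os
import boto3

BUCKET = "my-druid-configs"           # hypothetical S3 bucket
KEY = "templates/runtime.properties"  # hypothetical template with {placeholders}
DEST = "/opt/druid/conf/druid/historical/runtime.properties"

def detect_resources():
    """Return (vcpus, memory_gib) for this instance (Linux)."""
    vcpus = os.cpu_count() or 1
    with open("/proc/meminfo") as f:
        mem_kib = int(next(line for line in f
                           if line.startswith("MemTotal")).split()[1])
    return vcpus, mem_kib // (1024 * 1024)

def main():
    vcpus, mem_gib = detect_resources()
    boto3.client("s3").download_file(BUCKET, KEY, "/tmp/runtime.properties")
    with open("/tmp/runtime.properties") as f:
        template = f.read()
    # Illustrative heuristics: one processing thread per vCPU minus one,
    # and roughly a quarter of memory each for heap and direct memory.
    rendered = template.format(
        num_threads=max(1, vcpus - 1),
        heap_gib=max(1, mem_gib // 4),
        direct_gib=max(1, mem_gib // 4),
    )
    os.makedirs(os.path.dirname(DEST), exist_ok=True)
    with open(DEST, "w") as f:
        f.write(rendered)

if __name__ == "__main__":
    main()
```

The template would carry placeholders such as `druid.processing.numThreads={num_threads}`, so one template serves every instance type in the fleet.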

For the most part, that was it. The only time we had to rework the configs was when we started playing with single_dim and hash partitioning for batch ingestion (see the sketch below for what those specs look like). I can totally relate to your comment, and I agree with you.
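For anyone following along, those are the two `partitionsSpec` variants in Druid’s native batch ingestion. A minimal sketch of each, written as Python dicts since the spec is submitted as JSON; the dimension names and row targets here are illustrative, not our real values:

```python
# Illustrative partitionsSpec fragments for Druid native batch ingestion.
# Dimension names and row targets are made up for the example.

# Range-partitions segments on one dimension for better pruning on that dim.
single_dim_spec = {
    "type": "single_dim",
    "partitionDimension": "country",   # hypothetical dimension
    "targetRowsPerSegment": 5_000_000,
}

# Hash-partitions rows across a fixed number of shards.
hashed_spec = {
    "type": "hashed",
    "numShards": 10,                   # or size by targetRowsPerSegment instead
    "partitionDimensions": ["country", "city"],
}
```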

It’d be great to hear more about your particular hurdles. Making Druid easier to work with and adopt is definitely a priority for us here. Feel free to hit me up directly at peter.marshall@imply.io.