Druid Docker repo

Hi all,

I was wondering whether there is a direction you are heading in with the docker-druid repository. I’ve opened several pull requests, but they are still pending. I think it would be nice to have a working Docker image that users can easily test and run themselves.

Cheers,

Fokko

Hey Fokko,

I don’t think there is much of a direction at all. It rarely comes up in Druid community discussions. Could I ask what direction you’re hoping to take it in with your patches? Something for testing, or production, both? It sounds like you’re the most interested so far, so I would vote to give your voice a lot of weight :slight_smile:

Hi Gian,

Thanks for the quick response. At the company where I currently work, we use a modified version of the Docker image inside our CI. Every time we modify the queries for Druid, we test them against the Druid instance running in the CI, which works very well.
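To make the CI idea concrete, here is a minimal sketch of how such a check could look. It assumes a broker reachable on the default broker port (8082) and posts a native query to the `/druid/v2/` endpoint; the datasource name `events`, the interval, and the helper names are made up for illustration and are not taken from our actual setup.

```python
import json
import urllib.request

# Assumption: the CI job starts a Druid broker that listens on the default port.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_request(query: dict, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Wrap a native Druid query in an HTTP POST request for the broker."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: dict, broker_url: str = BROKER_URL) -> list:
    """Send the query and return the decoded JSON result rows."""
    request = build_request(query, broker_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical query under test; "events" is a made-up datasource name.
ROW_COUNT_QUERY = {
    "queryType": "timeseries",
    "dataSource": "events",
    "granularity": "all",
    "intervals": ["2017-01-01/2017-01-02"],
    "aggregations": [{"type": "count", "name": "rows"}],
}
```

A CI step would then call `run_query(ROW_COUNT_QUERY)` and assert on the returned rows once the container is up.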

Most of the issues that I’ve stumbled on have been contributed back to the repo as pull requests. The current state of the docker-druid repo is quite poor: building the image from scratch doesn’t work, and a lot of the enhancements that I’ve contributed are still open. Because the image doesn’t build, the Docker Hub build is not working properly either, which is a pity.

I think we should revive this repo so the bar to give Druid a try is lower. I’m also working on a docker compose together with Superset by Airbnb, so people can easily try out Druid by building an interactive dashboard. I’d be happy to take the lead on the Docker part of Druid.

Kind regards, Fokko

Hi Fokko and Gian,

We are working with Druid using Docker; we have prepared Docker images and Kubernetes deployments and StatefulSets to operate a production Druid cluster. Currently, it’s a work in progress, but I hope that we can open source it soon.

Regards,

Andrés

Hey Andrés,

That sounds rad. I think there’s a lot of people out there that would love to read about your work.

Hey Fokko,

That sounds cool, let me check up on those patches.

Hey Fokko,

I committed a few of your patches; most of the rest have conflicts now, but I’ll take another look if you could resolve them. Looking forward to seeing some more functional docker-druid stuff!

Hi Andrés,

Sounds very interesting. I think it would also be possible, to a certain extent, to auto-scale the number of historical/broker nodes: when the CPU usage on these machines exceeds a certain threshold, it could spin up more machines using Kubernetes/OpenShift.
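As a rough sketch of that threshold idea, the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler can be written in a few lines. The function name and the default bounds below are my own assumptions for illustration; the real autoscaler also applies a tolerance and stabilization behaviour that this leaves out.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10):
    """Return the replica count that brings average CPU back towards the target.

    current_cpu and target_cpu are average utilization percentages.
    min_replicas and max_replicas are hypothetical bounds for this example.
    """
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    # Proportional rule: scale the replica count by the ratio of observed to target load.
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    # Clamp the result so the cluster never shrinks or grows past the bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four historicals averaging 90% CPU against a 60% target would be scaled to six.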

Thanks Gian, I’ll revise the PRs. Thanks a lot for merging the others.

Cheers,

Fokko

Hi Gian,

I’ve updated all the PRs; please check whether you agree with them. If there are any questions, please let me know. I’ve checked all the changes, so I’m pretty sure everything will work.

Cheers, Fokko.

Hi Gian and Fokko,

Currently, we are working on:

  • We are using StatefulSets to mount a volume with the segment cache on the historicals.

  • We are testing the new Google deep-storage plugin that works with GCS. Seems to work quite well!! :slight_smile:

  • We work with the new Kafka Streams library, which works amazingly well on k8s, so we are using the new Kafka indexing service to index the data into Druid.

In the future, we want to configure autoscaling based on resource utilization too :slight_smile: … and later maybe use custom metrics (such as query response time or the Kafka lag of the Kafka indexing service) to autoscale up and down.

We need to make the Docker and Kubernetes files more configurable, for example to allow changing the deep storage. Once we have organized the repo, we want to open it to the community!!
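One possible way to make the deep storage switchable, sketched under assumptions: a small entrypoint helper that turns environment variables into Druid runtime properties. The environment variable names (`DRUID_DEEP_STORAGE`, `DRUID_BUCKET`, `DRUID_PREFIX`, `DRUID_STORAGE_DIR`) and the default directory are invented for this example; the property keys are the ones Druid uses for local and Google deep storage.

```python
import os


def deep_storage_properties(env=None):
    """Build Druid deep-storage properties from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    storage_type = env.get("DRUID_DEEP_STORAGE", "local")
    if storage_type == "local":
        return {
            "druid.storage.type": "local",
            # Assumed default directory, only for illustration.
            "druid.storage.storageDirectory": env.get("DRUID_STORAGE_DIR", "/var/druid/segments"),
        }
    if storage_type == "google":
        return {
            "druid.storage.type": "google",
            "druid.google.bucket": env["DRUID_BUCKET"],
            "druid.google.prefix": env.get("DRUID_PREFIX", "segments"),
        }
    raise ValueError(f"unsupported deep storage type: {storage_type}")


def render_properties(props):
    """Render the properties in the key=value format of a runtime.properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

The container entrypoint could append the rendered text to the common runtime properties before starting the node.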

Regards,

Andrés

I looked through them and committed a few more.

Hey Andrés,

That sounds awesome. I take it you’re running kubernetes in GCP? Are you using their hosted version or running your own install?

Thanks Gian,

Really appreciate it.

Cheers

Hey Gian!

Currently, I’m using the hosted version, Google Container Engine, in GCP, but we also work with on-premise Kubernetes on VMware.

Regards,

Andrés