Druid Evaluation on AWS - Need deep drill down into Druid internals

Hi Guys,

I did a quick functional evaluation of Druid and am impressed with its data exploration and ad-hoc query capabilities for large data sets at low latency.

I'd like to deploy the cluster into AWS for further evaluation. Our data sets are in S3 buckets, which are processed by Spark through Kafka streams. The Druid cluster would need real-time indexing of the Kafka data as well as queries over the data in the S3 buckets.

To be able to assess a production-grade Druid cluster in terms of cluster sizing, communication between nodes, storage needed for segments, etc.:
1) I'd like to dig deeper into Druid's internals. Can you guys point me to materials on this?
2) Are there references to production-grade deployments using Terraform?
3) Are there materials on deployment management, upgrades, monitoring, etc.?

Best Regards


I’ve found the following (as well as looking at the code) useful:




I’m not sure it’s fully up to date, but there’s an example of a production deployment here: http://druid.io/docs/latest/configuration/production-cluster.html

For further support it’d be worth looking at setting something up with the folks at https://imply.io/services



Thanks for your prompt response, Dylan,

I did some research on this and now have some idea of how the cluster has to be set up.

Do you know if there are frameworks that automate the deployment?

Best Regards