We are planning to build a production-scale Druid cluster to back our Tableau visualisation layer. All data will be aggregated before loading into Druid. We have around 30 dashboards, and in total we expect about 500 GB of Parquet files to be loaded into Druid from HDFS. 99% of the queries fired from Tableau will be either group-by or filter queries.
What should the hardware spec of the Druid cluster be, and how should data replication factor into the storage sizing?
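To make the replication question concrete, this is the back-of-envelope arithmetic we are working from. The replication factor, compression assumption, and headroom percentage below are our own guesses, not Druid defaults:

```python
# Rough Historical-tier disk sizing.
# Assumptions (ours, not from the Druid docs): Druid segment size is
# roughly equal to the compressed Parquet input, replication factor 2,
# and ~30% free-disk headroom for compaction/reindexing.
RAW_DATA_GB = 500        # aggregated Parquet input from HDFS
REPLICATION_FACTOR = 2   # each segment loaded on two Historicals
HEADROOM = 0.30          # spare disk fraction

replicated_gb = RAW_DATA_GB * REPLICATION_FACTOR
total_gb = replicated_gb * (1 + HEADROOM)
print(f"Historical tier disk needed: ~{total_gb:.0f} GB")
# prints: Historical tier disk needed: ~1300 GB
```

Is that the right way to think about it, or does Druid's segment compression change the picture significantly relative to Parquet?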
Is the clustered deployment spec described at https://druid.apache.org/docs/latest/tutorials/cluster.html a good starting point for this workload?