Historical cold tier size limits, rack definitions

I currently have all historical data stored with RF=1 on rather expensive 2TB SSDs (4 servers w/256GB RAM and 4x 2TB SSD each, for a total of 8TB per server) and am running low on space. I would like to realign the cluster to convert 2 servers into hot storage with RF=2 and add 2 new servers that contain SATA drives for a cold tier configured with RF=2. I’m thinking about loading up 4x 10TB drives in each of these cold tier servers, however I have some concerns. How much space is too much per historical? I’ve already had to bump up the vm.max_map_count setting to 131,072 on the existing servers with only 8TB. If I store 5x that on the new servers, would the system be stable with 5x the vm.max_map_count setting? I’ve even contemplated running KVM off an SD card and creating a single VM per disk with direct access. However, that leads me to the next problem…

How do I tell Druid that a historical server is in a specific rack or availability zone? If I go the KVM route, I want to ensure that Druid spreads the replicas across both servers. However, if I don’t go with the KVM idea, I will need to go from 2 to 4 servers soon and will need to make sure Druid is spreading data between the two racks (2 servers in Rack A, 2 servers in Rack B). If I were in AWS I’d want to spread it across availability zones. How do I tell Druid to do that?

Thank you!


The historical “tier” is the closest thing to an AZ, rack, or failure domain. Our current setup is a hot, a cold, and a cold2 tier. Cold is a failover replica of hot, and cold and cold2 are replicas of each other (with equal priority), with specific time-recency load rules to control “hotness” of recent data. Since we run in the cloud, these are in 3 separate availability zones.
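To make the tier idea concrete: each historical declares its tier with the `druid.server.tier` property in its runtime.properties, and the coordinator's load rules then say how many replicas each tier should hold. Below is a rough sketch in Python that only builds the rule JSON. The tier names are the ones from the setup described above; the one-month period and the replica counts are assumptions for illustration, not our actual values.

```python
import json

# Tier names; each must match the druid.server.tier value configured on the
# historicals of that tier (one tier per rack / AZ in this sketch).
HOT, COLD, COLD2 = "hot", "cold", "cold2"


def build_load_rules(recent_period="P1M"):
    """Return an ordered list of Druid load rules (first matching rule wins).

    recent_period and all replica counts are illustrative assumptions.
    """
    return [
        {
            # Recent data: one replica in each tier, so it survives the loss
            # of any single failure domain.
            "type": "loadByPeriod",
            "period": recent_period,
            "tieredReplicants": {HOT: 1, COLD: 1, COLD2: 1},
        },
        {
            # Everything older: kept only on the two cold tiers.
            "type": "loadForever",
            "tieredReplicants": {COLD: 1, COLD2: 1},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_load_rules(), indent=2))
```

Because a replica count is set per tier, giving each rack its own tier name is what forces copies of a segment onto different racks.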

For ulimit open files (or vm tunings or similar) settings, it is more or less trial and error, unfortunately. Each segment should be mapped only once, so the max per server should be approximately #segments * tier_replicas / tier_num_servers.
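As a back-of-the-envelope check, that estimate can be written out as below. This is only a sketch: the function names are made up, the segment count in the example is hypothetical, and it assumes one mapping per segment as stated above. The JVM also maps its own libraries and heap regions, so leave headroom rather than treating the result as exact.

```python
import math


def estimated_maps_per_server(num_segments, tier_replicas, tier_num_servers):
    """Approximate segment mappings per historical in a tier.

    Assumes each segment is mapped exactly once and that segments are
    balanced evenly across the servers of the tier.
    """
    return math.ceil(num_segments * tier_replicas / tier_num_servers)


def fits_limit(estimate, max_map_count):
    """True if the estimate is within the given vm.max_map_count value."""
    return estimate <= max_map_count


if __name__ == "__main__":
    # Hypothetical cold tier: 200,000 segments, RF=2, 2 servers.
    estimate = estimated_maps_per_server(200_000, 2, 2)
    print(estimate, fits_limit(estimate, 131_072))
```

With RF=2 on a two-server tier, every server ends up holding every segment, so the segment count itself is the number to compare against vm.max_map_count.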

Hello sir,

I apologize for the extremely late response. Thank you for providing feedback. That is an interesting way of approaching the problem. In that model, it would make sense to set RF=1 for cold and cold2, since you already have redundancy.

I am wondering, have you found an upper limit to the size of a historical server? We have multiple years’ worth of data that users do not regularly query, but that needs to be available. I am contemplating using servers with a large number of disks (10 x 2TB), but am worried about the practical limitations of the historical server. Have you pushed the limits of the historical nodes? Right now we are utilizing 4x 2TB with no problems.

Thank you!


Hey Nick, just saw this and thought I’d throw in: (a) remember maxSegmentsToMove as you do things, and (b) is there any option to increase queryGranularity on older data to reduce the size? Oh, and (c) could some of the data just not be loaded at all until someone needs it?

/me goes back to his email