Configuring cold tier with S3 data store

Hi,

With 2TB of data that needs to be mapped to the cold tier, how much CPU/RAM/data store should we consider for the historical nodes?

Can we configure S3 or a NetApp filer to be the data store for the historical data?

Chitra

Hi Chitra,

The way Druid is designed, data must be loaded into an on-disk segment cache on the historical nodes in order to be queried. In other words, you cannot use something like S3 (which would be the ‘deep storage’) as the segment cache, as Druid will not pull data from S3 at query time and will ignore any data that is not already loaded on a historical in the cluster. You will need enough disk to hold all the data that you require to be queryable.
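For reference, here is a rough sketch of how that split typically looks in a historical's runtime.properties, with S3 as deep storage and local disk as the segment cache. The bucket name, paths, sizes, and tier name are placeholders you would adapt to your own setup:

    # Deep storage: where segments live permanently (S3)
    druid.extensions.loadList=["druid-s3-extensions"]
    druid.storage.type=s3
    druid.storage.bucket=my-druid-deep-storage
    druid.storage.baseKey=druid/segments

    # Segment cache: segments must be pulled onto local disk here to be queryable
    druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":2000000000000}]
    druid.server.maxSize=2000000000000

    # Label this historical as part of the cold tier
    druid.server.tier=cold

The maxSize values above are only a rough illustration sized to hold the full 2TB; in practice you would leave some headroom on the disk.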

Regarding CPU/RAM allocation - that depends more on your query patterns and your latency expectations. Historicals typically perform best with a higher RAM-to-CPU ratio, so if you’re looking for an instance type suitable for a cold tier, I would look at something like the i3 or i3en family on AWS.
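To actually route the older data onto that tier, you would use coordinator retention (load) rules. A minimal sketch, assuming a default "hot" tier plus the "cold" tier name from the properties above; the one-month period and replicant counts are placeholders:

    [
      {
        "type": "loadByPeriod",
        "period": "P1M",
        "includeFuture": true,
        "tieredReplicants": {"hot": 2}
      },
      {
        "type": "loadForever",
        "tieredReplicants": {"cold": 1}
      }
    ]

Rules are evaluated top to bottom, so in this sketch anything within the last month is loaded onto the hot tier with two replicas, and everything older falls through to the cold tier with a single replica.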