S3 Storage Usage and Costs

Hi All,

We are designing a solution for a new startup that is still in the early stages.
As we have no HDFS experience, we are thinking of using Amazon S3 for deep storage.

As S3 pricing varies according to the plan used, how often the stored data is accessed, and so on, I would like to hear about your experience with S3 costs.

To give a concrete example, for the first 50TB, S3 Standard costs 0.0245 per GB per month, whereas S3 Standard - Infrequent Access costs almost half that. So one of my questions is: should I use S3 Standard - Infrequent Access?

I assume that this has to do with how frequently deep storage data is accessed. My understanding is that it is accessed only when there is a failure etc. But is this correct?

Also, if you have some examples from your own usage of S3, they would be much appreciated.

Kind Regards,
Ioannis Ntantis

Hey!

Deep Storage is written to by every ingestion task. How much truly depends on the incoming data volume and on how effectively you set up data processing in Druid, e.g. the use of indexes on columns and how much roll-up you configure.

Deep Storage is read by the Historical processes whenever data is made available (the Coordinator handles all of this), when segments need to be rebalanced, when replication factors have to be enforced, and, of course, when you add a new node, which must load all of its data. Again, how much gets read depends on things like your Load and Drop Rules (some or all of the data?) and how often you completely rebuild your Historicals.
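
To put rough numbers on that trade-off, here is a minimal Python sketch comparing the two storage classes for a given amount stored and a given amount read back from deep storage per month. Only the 0.0245 figure comes from the question above; the Standard-IA storage price ("almost half") and the per-GB retrieval fee are assumptions picked for illustration, and the 10 TB volume is hypothetical. It also ignores request charges, minimum storage duration and data transfer, so treat it as a back-of-the-envelope estimate and check the current price list for your region.

```python
# Illustrative prices only; check the current AWS price list for your region.
STANDARD_PER_GB = 0.0245      # from the question above (first 50 TB)
IA_PER_GB = 0.0125            # assumption: "almost half" of Standard
IA_RETRIEVAL_PER_GB = 0.01    # assumption: per-GB retrieval fee on Standard-IA


def monthly_cost(stored_gb, read_gb, storage_price, retrieval_price=0.0):
    """Storage charge plus per-GB retrieval charge for one month."""
    return stored_gb * storage_price + read_gb * retrieval_price


def break_even_read_gb(stored_gb):
    """GB read per month above which Standard-IA is no longer cheaper."""
    return stored_gb * (STANDARD_PER_GB - IA_PER_GB) / IA_RETRIEVAL_PER_GB


stored = 10_000  # hypothetical: 10 TB of segments in deep storage
for read in (0, 5_000, 20_000):
    std = monthly_cost(stored, read, STANDARD_PER_GB)
    ia = monthly_cost(stored, read, IA_PER_GB, IA_RETRIEVAL_PER_GB)
    print(f"read {read:>6} GB: Standard {std:8.2f}  Standard-IA {ia:8.2f}")

print(f"break-even at about {break_even_read_gb(stored):.0f} GB read per month")
```

With these assumed prices, Standard-IA only wins while the Historicals pull less than roughly the break-even volume per month, so frequent rebalancing or full rebuilds can eat the saving.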

Did you consider MinIO? I used it for a cluster of Raspberry Pis…
A Raspberry Pi Apache Druid cluster? Well why not!

Something like this…

Dear Peter,

Thank you very much for your detailed explanation.
To be honest I did not know of MinIO. From what I see they do not charge per access, only for the storage.

Again thank you, much appreciated.

Regards,
Ioannis
