Can anyone share their experience using Cassandra vs. HDFS for deep storage? If you can share any benchmarks related to data volume, scalability, etc., that would be great. Keep in mind that I am talking about production-size volumes, where data is in the petabytes.
If you’re talking about production, then an important question you need to ask is “which system do I already have operational expertise in?”
Overall, HDFS deep storage has been much more battle-tested than Cassandra deep storage (or at least more than anyone is publicly talking about), so I would recommend HDFS if everything else is equal.
The Cassandra deep storage is a community contribution, so we don’t have a maintainer for it at the moment, and it is unknown which versions of Cassandra other than 1.0.8 work with the extension.
If anyone on this list uses Cassandra deep storage extensively in production, please speak up.