This feature decreases storage size by 40% while improving query speeds by 75%.
- Parallelism: Multiple processes can operate on different portions of data simultaneously.
- Distributed storage: Data can be spread across several data servers thus allowing even commodity hardware to meet memory and disk requirements.
- Improved I/O management: Reasonably sized partitions make reading from/writing to disk easier and transmission over network less prone to failure.
- Granular replication: With the same factor, say 2, replication at the partition level allows better fault tolerance than replication of unpartitioned data as a whole.
This feature is available for native batch ingestion and compaction. If you ingest your data with Hadoop you could have a compaction job rewrite it to range partitioning. For a bit more context, check here and here.