Segments that do not fit in memory

Hi,

When Druid starts, my understanding is that if the segments fit in the memory of the historical nodes, they are loaded into memory and served only from memory.

Of course, if memory is not enough, they need to be reloaded from external storage (S3, HDFS).

Does Druid also use disk/SSD to store more of the segments, for quicker access than S3/HDFS?

Thanks,

Nicu

Hi Nicu,

Yes - Druid will load segments from external storage (S3, HDFS) onto disk and will access these segments as memory-mapped files, which are paged in and out of memory by the OS. Hence, you will see performance benefits when using SSDs over magnetic hard disks. You will need to size your historical nodes such that the cluster of historical nodes has enough disk space to hold your entire data set; otherwise you will get incomplete query results that are missing the data that could not be loaded to disk.
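For reference, the segment cache location and size are set in each historical node's runtime properties. A minimal sketch, assuming an SSD mounted at /mnt/ssd (the path and sizes are placeholder values to adjust for your hardware):

    # Where segments are pulled down from deep storage and memory-mapped;
    # point this at your SSD mount.
    druid.segmentCache.locations=[{"path":"/mnt/ssd/druid/segment-cache","maxSize":300000000000}]

    # Total bytes of segments this node advertises it can serve; the
    # coordinator uses this when assigning segments, so it should not
    # exceed the total maxSize of the cache locations above.
    druid.server.maxSize=300000000000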

Thank you!

Another question: can I store the same segment on multiple SSDs / multiple hosts, like a replication factor, to load-balance or fail over quickly, given that segments are mostly read-only?

Thanks,

Nicu

Hi Nicu,

Yes, definitely! You can configure replication by setting load rules in the coordinator console. See:

http://druid.io/docs/latest/operations/rule-configuration.html

The Druid coordinator will automatically handle loading and re-balancing of the segments to ensure segment replicas live on different nodes for fault tolerance and are distributed in a way that maximizes query performance.
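For illustration, a rule that loads all segments forever with two replicas in the default tier would look something like this (a sketch; the tier name and replicant count depend on your cluster):

    [
      {
        "type": "loadForever",
        "tieredReplicants": {
          "_default_tier": 2
        }
      }
    ]

With two replicants, either historical node can serve queries for a given segment, and if one node fails, the coordinator will load the segment from deep storage onto another node to restore the replication factor.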