Optimal configuration of druid.segmentCache.locations for multiple disks

If a server has multiple local disks, is it best to define a logical volume and mount point at the OS level and configure that as a single location in the druid.segmentCache.locations property, or is it better to configure druid.segmentCache.locations with a list of the individual volumes?

What would be the pros/cons of each setup?

Given that it is always possible to define a logical volume that spans multiple physical ones, I was wondering why Druid supports multiple segment-cache locations in the first place. Is it merely a convenience feature, or can Druid make better use of the disks if they are presented to it as separate volumes rather than combined by the operating system?

For example, we have r3.8xlarge instances which have 2 local disks for a hot tier and i2.8xlarge instances with 8 local disks for the cold tier.

What are the pros/cons of either of the following setups?


A single logical volume spanning all disks:

{"path": "/mnt/persistent/logicalvolume", "maxSize": 6400000000000}


One location per physical disk:

{"path": "/mnt/persistent/volume1", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume2", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume3", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume4", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume5", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume6", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume7", "maxSize": 800000000000},
{"path": "/mnt/persistent/volume8", "maxSize": 800000000000}
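For what it's worth, the per-disk variant ends up as a single property line in the historical's runtime.properties, something like the following (paths and maxSize values are just the ones from the example above; adjust them to your disks' usable capacity):

```
druid.segmentCache.locations=[{"path": "/mnt/persistent/volume1", "maxSize": 800000000000},{"path": "/mnt/persistent/volume2", "maxSize": 800000000000},{"path": "/mnt/persistent/volume3", "maxSize": 800000000000},{"path": "/mnt/persistent/volume4", "maxSize": 800000000000},{"path": "/mnt/persistent/volume5", "maxSize": 800000000000},{"path": "/mnt/persistent/volume6", "maxSize": 800000000000},{"path": "/mnt/persistent/volume7", "maxSize": 800000000000},{"path": "/mnt/persistent/volume8", "maxSize": 800000000000}]
```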

Strange, no reply to this suggestion yet.


Yeah, I also find it strange that questions like this are not discussed more.

Here’s what I could find since the time I asked the question:

Recently the following issue received some comments, and in one of them it is stated:
"another thing… using JBOD instead of RAID on historicals."

By JBOD (just a bunch of disks) I assume/hope they refer to the above manner of declaring several individual disks.

It never hurts to bump a thread after a while and see if some new people notice it :slight_smile:

The main reason to use multiple locations, one per disk, instead of a single RAID or logical volume, is that logical volumes often carry overhead and you pay the price in lower maximum I/O rates. Mileage may vary, so it's good to test in your specific situation. If you don't have time to test, my default would be to assume that a separate volume per disk is best (especially compared to software RAID).

I'm also testing multiple paths in segmentCache.

The server has SAS disks and a PCIe SSD, so I've created two locations:

druid.segmentCache.locations=[{"path": "/usr/local/ssd/indexCache", "maxSize": 1073741824000},{"path": "/usr/local/sas/indexCache", "maxSize": 29686813949952}]


The strange thing is that the historical node is currently using only the second one.

What are the rules on the historical side for deciding which of the locations is used?

Is there any way to give a preference or an order in location management?
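I don't know for certain which selection rule your Druid version applies, but the behaviour you see is exactly what a "most available space" style selector would produce: the ~27 TB SAS location always has more free space than the ~1 TB SSD, so it wins every pick. A toy sketch (this is not Druid's actual code; the class and function names are invented for illustration):

```python
# Toy illustration of a "most available space" location selector.
# NOT Druid's actual code; names are invented for this sketch.

class Location:
    def __init__(self, path, max_size):
        self.path = path
        self.max_size = max_size
        self.used = 0

    def available(self):
        return self.max_size - self.used

def pick_location(locations, segment_size):
    # Only locations with enough free space are candidates;
    # among those, pick the one with the most free space.
    candidates = [l for l in locations if l.available() >= segment_size]
    if not candidates:
        return None
    return max(candidates, key=lambda l: l.available())

# Sizes from the config above: ~1 TB SSD vs ~27 TB SAS.
ssd = Location("/usr/local/ssd/indexCache", 1_073_741_824_000)
sas = Location("/usr/local/sas/indexCache", 29_686_813_949_952)

# Load 100 segments of 5 GB each.
for _ in range(100):
    loc = pick_location([ssd, sas], 5_000_000_000)
    loc.used += 5_000_000_000
```

Under this rule the SAS location's free space never drops anywhere near the SSD's total capacity, so every segment lands on SAS and the SSD stays empty, which matches what you describe.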



Does anyone know what logic is applied to an array of segmentCache.locations?

Does the Historical start with the last volume, then fill the previous one, and so on?

Is there a way to always keep the most recent data on a specific volume?