A little confusion about "druid.segmentCache.locations" property

Hi experts:

I have a little confusion about the "druid.segmentCache.locations" property. I have 3 historical nodes and the property is set as below:

druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":50000000000}]

That means a 50GB local disk segment cache for each historical node, 150GB of cache in total.
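For reference, the value is a JSON array, so multiple cache directories can be listed to spread segments across disks. A sketch, assuming hypothetical mount paths and Druid's documented `druid.server.maxSize` property:

```properties
# Hypothetical example: two local cache directories, ~50 GB each.
druid.segmentCache.locations=[{"path":"/mnt/disk1/druid/segment-cache","maxSize":50000000000},{"path":"/mnt/disk2/druid/segment-cache","maxSize":50000000000}]

# The total bytes this historical announces it can serve; it should not
# exceed the sum of the maxSize values above.
druid.server.maxSize=100000000000
```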

After I ran several batch ingestions, the 150GB of local disk cache ran out, and the coordinator UI shows "100% to load until available" for the new dataSource I just ingested.

In the Historical Node Configuration documentation, I found the description of "druid.segmentCache.locations", whose default value of none means no caching, so I unset the property. But then the historical nodes failed to start and I got the error below:

2016-08-30T03:35:12,099 ERROR [main] io.druid.cli.CliHistorical - Error when starting up. Failing.

com.google.inject.ProvisionException: Unable to provision, see the following errors:

  1. druid.segmentCache.locations - may not be empty

at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:131) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:131) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

while locating com.google.common.base.Supplier<io.druid.segment.loading.SegmentLoaderConfig>

at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:132) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.StorageNodeModule)

while locating io.druid.segment.loading.SegmentLoaderConfig

for the 2nd parameter of io.druid.segment.loading.SegmentLoaderLocalCacheManager.<init>(SegmentLoaderLocalCacheManager.java:59)

while locating io.druid.segment.loading.SegmentLoaderLocalCacheManager

at io.druid.guice.LocalDataStorageDruidModule.configure(LocalDataStorageDruidModule.java:53) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.LocalDataStorageDruidModule)

while locating io.druid.segment.loading.SegmentLoader

for the 1st parameter of io.druid.server.coordination.ServerManager.<init>(ServerManager.java:106)

at io.druid.cli.CliHistorical$1.configure(CliHistorical.java:78) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.cli.CliHistorical$1)

while locating io.druid.server.coordination.ServerManager

at io.druid.cli.CliHistorical$1.configure(CliHistorical.java:80) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.cli.CliHistorical$1)

while locating io.druid.query.QuerySegmentWalker

for the 5th parameter of io.druid.server.QueryResource.<init>(QueryResource.java:110)

while locating io.druid.server.QueryResource

So I want to know about this property:

(1) I use HDFS as the deep storage, so why do the historicals still need a local disk cache?

(2) I want to set no caching. What does the error mean, and how do I fix it?

Druid needs to download and cache data locally before it can be announced for possible serving. It’s not done on demand. So, you need enough segment cache allocated to store 100% of your data.

Gian:

Thank you for the quick reply. As you say, even if I have persisted segments on HDFS, Druid still needs another copy of the segments in the local disk cache? And what does "no caching" mean as the default value of "druid.segmentCache.locations"?

Zha Rui

On Tuesday, August 30, 2016 at 11:45:37 AM UTC+8, Gian Merlino wrote:

Yeah, you still need another copy on the local disks. Druid always serves queries off its local memory or disk, and views deep storage as more of a "backup". druid.segmentCache.locations is mandatory since the default value of "none" is not a workable config.
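One practical consequence of this (worth noting when sizing the cache): the coordinator's load rules control how many replicas of each segment are loaded onto historicals, so the cluster-wide segment cache must cover the replicated bytes, not just the raw data size. A sketch, assuming the common default-style rule of a loadForever rule with 2 replicas in the default tier:

```json
{
  "type": "loadForever",
  "tieredReplicants": { "_default_tier": 2 }
}
```

With 2 replicas, 75GB of segments needs roughly 150GB of total segment cache across the historicals.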

I've got it. Thank you so much, Gian!

On Tuesday, August 30, 2016 at 1:11:53 PM UTC+8, Gian Merlino wrote:

@Gian
If I don't use HDFS but the LOCAL config (a single historical node) where all segments are already on local disk, can I disable the segment cache?

Thanks,

Mo

You cannot run Druid without deep storage unless all the Druid nodes are on the same machine.

So you need deep storage, e.g. S3 or HDFS, and enough disk to cache the segments locally.
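To illustrate the distinction: deep storage and the segment cache are configured separately, and both are needed. A sketch for HDFS deep storage, assuming hypothetical paths and that the druid-hdfs-storage extension is loaded:

```properties
# Deep storage: the durable copy of every segment.
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

# Local segment cache: the copy that queries are actually served from.
druid.segmentCache.locations=[{"path":"/var/druid/segment-cache","maxSize":50000000000}]
```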

Hi Gian, can the segment cache location be a mounted NFS share? Or does it have to be a local disk directory or local memory only?

Regards,

Varsha

Hi Varsha,

While it is theoretically possible to set the segment cache location to a mounted NFS share, it is not advisable, as Druid memory-maps the files in the segment cache in order to serve queries, and using NFS there can adversely affect query performance.

Thanks, Nishant.