Historical segmentCache on S3

Hi,
I’m setting up Druid 0.8.1 on AWS with S3, and my common config is as follows:

druid.storage.type=s3

druid.s3.accessKey=KEY

druid.s3.secretKey=SECKEY

druid.storage.bucket=druidvirginia

druid.storage.baseKey=dataStorage

druid.storage.archiveBucket=druidvirginia

druid.storage.archiveBaseKey=archive

druid.storage.disableAcl=true

In the Historical’s runtime.properties I’ve set:

druid.segmentCache.locations=[{"path": "/usr/local/indexCache", "maxSize": 53687091200}]

druid.server.maxSize=53687091200

But:

  • will the segmentCache grow indefinitely (with a loadForever rule)?

  • is there any way to set up the segmentCache on S3?

  • what is the difference between storage.baseKey and storage.archiveBaseKey? Which data will be stored in each one?

Thanks

Maurizio

Hi,
I’ve done this setup:

druid.segmentCache.locations=[{"path": "s3://druidvirginia/history01/indexCache", "maxSize": 53687091200}]

druid.server.maxSize=53687091200

  • Is this the right one?

Unfortunately I didn’t see any index generated in the defined folder; looking at the Historical logs I see:

2015-10-02 00:01:23,804 INFO i.d.s.s.S3DataSegmentPuller [ZkCoordinator-0] Pulling index at path[s3://druidvirginia/dataStorage/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0/index.zip] to outDir[s3:/druidvirginia/history01/indexCache/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0]

2015-10-02 00:01:23,804 DEBUG o.j.s.i.r.h.RestStorageService [ZkCoordinator-0] Retrieving Head information for bucket druidvirginia and object dataStorage/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0/index.zip

2015-10-02 00:01:24,882 INFO c.m.c.CompressionUtils [ZkCoordinator-0] Unzipping file[/usr/local/tmp_druid/compressionUtilZipCache2759463639643714505.zip] to [s3:/druidvirginia/history01/indexCache/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0]

2015-10-02 00:01:25,021 INFO i.d.s.s.S3DataSegmentPuller [ZkCoordinator-0] Loaded 14725035 bytes from [s3://druidvirginia/dataStorage/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0/index.zip] to [/usr/local/druid-0.8.1/s3:/druidvirginia/history01/indexCache/buck_bidding/2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00/2015-10-01T13:00:00.000-04:00/0]

2015-10-02 00:01:25,021 WARN i.d.s.l.SegmentLoaderLocalCacheManager [ZkCoordinator-0] Segment [buck_bidding_2015-10-01T13:00:00.000-04:00_2015-10-01T14:00:00.000-04:00_2015-10-01T13:00:00.000-04:00] is different than expected size. Expected [8656883] found [14725035]

Why does the destination become /usr/local/druid-0.8.1/s3:/druidvirginia/… ? (/usr/local/druid-0.8.1 is the path where I’ve installed Druid.)

Why are the sizes different, and why does the process stop in some undefined way?

Appreciate your help

Thanks

Hi, please see inline.

Hi,
I’m setting up Druid 0.8.1 on AWS with S3, and my common config is as follows:

druid.storage.type=s3

druid.s3.accessKey=KEY

druid.s3.secretKey=SECKEY

druid.storage.bucket=druidvirginia

druid.storage.baseKey=dataStorage

druid.storage.archiveBucket=druidvirginia

druid.storage.archiveBaseKey=archive

druid.storage.disableAcl=true

In the Historical’s runtime.properties I’ve set:

druid.segmentCache.locations=[{"path": "/usr/local/indexCache", "maxSize": 53687091200}]

druid.server.maxSize=53687091200

But:

  • will the segmentCache grow indefinitely (with a loadForever rule)?

Historical nodes will download segments until they hit their druid.server.maxSize limit, so the cache is bounded by that setting rather than growing without limit.
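If you want to bound retention rather than relying only on maxSize, you can replace the loadForever rule with period-based rules on the Coordinator. A minimal sketch, assuming an illustrative datasource name and a one-month retention (POSTed to /druid/coordinator/v1/rules/buck_bidding):

[
  {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"_default_tier": 2}},
  {"type": "dropForever"}
]

With rules like these, Historicals keep only the last month of segments loaded and the Coordinator drops anything older, so the segment cache stays well below maxSize.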

  • is there any way to set up the segmentCache on S3?

Not yet. S3 is the permanent backup (deep storage) for segments; segments must be loaded onto a Historical’s local disk before they can be queried.
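That also appears to explain your second setup: druid.segmentCache.locations is treated as a plain local filesystem path, so "s3://druidvirginia/history01/indexCache" is presumably resolved as a relative directory under the process working directory, which is why the logs show /usr/local/druid-0.8.1/s3:/druidvirginia/… . The cache should point at local disk, as in your first message:

druid.segmentCache.locations=[{"path": "/usr/local/indexCache", "maxSize": 53687091200}]
druid.server.maxSize=53687091200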

  • what is the difference between storage.baseKey and storage.archiveBaseKey? Which data will be stored in each one?

The archive key is used by archive tasks in S3, which can be used to periodically archive data. I don’t believe this is actually documented at http://druid.io/docs/latest/misc/tasks.html, so we should fix that.
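For reference, an archive task is submitted to the Overlord like any other task; a minimal sketch, with an illustrative datasource and interval:

{
  "type": "archive",
  "dataSource": "buck_bidding",
  "interval": "2015-09-01/2015-10-01"
}

With the config above, segments moved by such a task would land under s3://druidvirginia/archive/ (archiveBaseKey), while regular deep-storage segments live under s3://druidvirginia/dataStorage/ (baseKey).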