Druid load/drop rule

Hello, I have a datasource configured with two tiers. I want the last 3 days of data stored on the default tier and data older than 3 days stored in cold storage. My configuration is as follows, but the result is not what I expect.

[
    {
        "period": "P3D",
        "includeFuture": false,
        "tieredReplicants": {
            "_default_tier": 1
        },
        "type": "loadByPeriod"
    },
    {
        "period": "P7D",
        "includeFuture": true,
        "tieredReplicants": {
            "cold": 2
        },
        "type": "loadByPeriod"
    },
    {
        "type": "dropForever"
    }
]

I found that my default tier still holds data for the last seven days, which is quite a waste of storage resources. My Druid version is 0.22.1. Does anyone know why? Thanks.

Relates to Apache Druid 0.22.1

Welcome @dennis666! Thanks for including your load rules.

Regarding the first part of your question, have you tried specifying the type in the rule for the default tier, something like this:

{
  "type" : "loadByPeriod",
  "period" : "P3D",
  "includeFuture" : false,
  "tieredReplicants": {
      "_default_tier" : 1
  }
}

Also, can you share a bit more about your configuration? I’m looking at the Historical tiering doc and wondering if your historical/runtime.properties might shed some light on the behavior you’re describing.
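
For what it's worth, a Historical joins _default_tier unless its runtime.properties say otherwise. A minimal sketch of the tier-related lines (the values shown are the defaults):

# A Historical joins _default_tier when druid.server.tier is unset.
druid.server.tier=_default_tier
druid.server.priority=0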

Thanks @Mark_Herrera for your response. The following is the configuration of my default node and my cold node.

default historical runtime.properties

druid.service=druid/historical
druid.plaintextPort=8051

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=16
druid.processing.numThreads=62
druid.processing.tmpDir=var/druid/processing

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":"1000g"}]

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=2GiB

cold historical runtime.properties

druid.service=druid/historical
druid.plaintextPort=8051

# HTTP server threads
druid.server.http.numThreads=8

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500MiB
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=var/druid/processing

# Tier
druid.server.tier=cold
druid.server.priority=100

# Segment storage
druid.segmentCache.locations=[{"path":"/data1/druid/var/druid/segment-cache","maxSize":"2000g"},{"path":"/data2/druid/var/druid/segment-cache","maxSize":"2000g"}]
druid.server.maxSize=4000g

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256MiB

Hey @dennis666! Sorry if I'm late to the party on this one!

In the Services tab, are the servers in your _default_tier still loading/offloading anything? That is, is the coordinator still working to apply the rules? One config option that is often missed when it comes to coordinator moves is maxSegmentsToMove, as the default is quite low.
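
For reference, maxSegmentsToMove lives in the Coordinator dynamic config, which is a JSON document; a minimal fragment (the value here is purely illustrative, not a recommendation):

{
  "maxSegmentsToMove": 1000
}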

The next level down for me would be to look at the Coordinator log: every druid.coordinator.period (60 seconds by default) it tries to apply the rules, so you should see a lot of log entries. These may give you some more hints!
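
That period is a Coordinator runtime property; PT60S is the default:

druid.coordinator.period=PT60S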

Thanks @petermarshallio very much for your reply. I increased the maxSegmentsToMove parameter, but after waiting for a day, the expired segments on the _default_tier node still have not been dropped.

My Coordinator dynamic config:

{
  "millisToWaitBeforeDeleting": 900000,
  "mergeBytesLimit": 524288000,
  "mergeSegmentsLimit": 1000,
  "maxSegmentsToMove": 1000,
  "percentOfSegmentsToConsiderPerMove": 100,
  "useBatchedSegmentSampler": true,
  "replicantLifetime": 15,
  "replicationThrottleLimit": 10,
  "balancerComputeThreads": 5,
  "emitBalancingStats": false,
  "killDataSourceWhitelist": [],
  "killAllDataSources": false,
  "killPendingSegmentsSkipList": [],
  "maxSegmentsInNodeLoadingQueue": 500,
  "decommissioningNodes": [
  ],
  "decommissioningMaxPercentOfMaxSegmentsToMove": 100,
  "pauseCoordination": false,
  "replicateAfterLoadTimeout": false,
  "maxNonPrimaryReplicantsToLoad": 2147483647
}

My load/drop rules: [screenshot]

The following shows the segments loaded by the _default_tier node: [screenshot]

I tried adding dropForever after the P3D rule, but that introduced another problem: the data on my cold node went missing.

Hi @dennis666 - retention rules are executed in the order they appear, so you have effectively told it to remove data older than 3 days.

I think what you are looking for is your rules in the order 1, 3, 2, so:

loadByPeriod P3D _default_tier
loadByPeriod P7D cold
dropForever
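
Spelled out as JSON, that is your same three rules with dropForever last:

[
    {
        "period": "P3D",
        "includeFuture": false,
        "tieredReplicants": {
            "_default_tier": 1
        },
        "type": "loadByPeriod"
    },
    {
        "period": "P7D",
        "includeFuture": true,
        "tieredReplicants": {
            "cold": 2
        },
        "type": "loadByPeriod"
    },
    {
        "type": "dropForever"
    }
]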

Kyle

If I set my rules to 1, 3, 2, the data on my _default_tier node is also retained for 7 days.