Question about load and drop rules

Hi,

How do I configure load rules so that my historical loads only the last 1 year of data and drops everything else?

My understanding is that I should use this as the load rule:

{
  "type" : "loadByPeriod",
  "period" : "P1Y",
  "tieredReplicants": {
      "hot": 1,
      "_default_tier" : 1
  }
}

How would the drop rule work for the above scenario?

Also, can anyone confirm: if I configure a drop rule for a period of 2 years, will the segments not be loaded onto the historicals and therefore not be queryable, while still existing in deep storage? Will the indexes get deleted? And what if I need to load all the segments again? Should I just remove the existing load and drop rules and configure a load rule to load forever?

{
  "type" : "loadForever",
  "tieredReplicants": {
      "hot": 1,
      "_default_tier" : 1
  }
}

Bump

Bump

Hi Naveen,

There is a great tutorial here. Have you checked it out?

http://druid.io/docs/latest/tutorials/tutorial-retention.html

Best regards,

Eric

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io

Hi Eric,

I did go through the tutorial, but the information doesn’t help much with my use case. I did get an idea of how to configure load rules, but the drop rules are the ones I was unsure about.

If I configure drop rules by period, would they automatically adjust to the current date? Let’s say my drop rules are configured to drop data for the last 2 days. Would that window move forward as the date changes?

Hi Naveen,

Yes, rules by period are adjusted according to the current timestamp, so if you have a P2D drop rule you’d never be able to query the last 48 hours.
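For reference, a period-based drop rule like the one in your question would look roughly like this. The rule type and period format are from the rule configuration docs; it is only an illustration of the moving window, not something you would normally want:

```json
{
  "type" : "dropByPeriod",
  "period" : "P2D"
}
```

The P2D window is measured back from the current time each time the coordinator evaluates the rules, so it slides forward on its own.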

Normally if you want to retain 1 year only, you would create a rule that loads P1Y then another rule to dropForever. If you don’t include a drop rule, data beyond 1 year will be retained by the default loadForever rule.
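As a sketch, the rule list for the datasource would be something like the following, reusing the tiers from your example (the tier names and replicant counts are yours, so adjust them as needed). Rules are evaluated top to bottom and the first rule that matches a segment wins, so the load rule has to come before dropForever:

```json
[
  {
    "type" : "loadByPeriod",
    "period" : "P1Y",
    "tieredReplicants": {
      "hot": 1,
      "_default_tier" : 1
    }
  },
  {
    "type" : "dropForever"
  }
]
```

Segments within the last year match the first rule and are loaded; anything older falls through to dropForever.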

You’re correct that the data will not be on the historicals but the segments will still exist in deep storage. The segments will be marked as used=0, which tells the coordinator not to load them. If you change the load rules in the future to load more data, you can click on ‘enable datasource’ in the coordinator console to change those segments to used=1 in order for them to be loaded onto the historicals (provided you did not remove them from deep storage with a kill task).

http://druid.io/docs/latest/operations/rule-configuration.html

Best,
Caroline

Hey Caroline,

Thank you so much for the clarification.

“you can click on ‘enable datasource’ in the coordinator console”: this would only be available when you have disabled the whole datasource, right? If I load data for only 1 year using loadByPeriod and dropForever rules and then decide against it, how would I be able to retain the data? Can I replace the loadByPeriod rule with one for a longer period (say 2 or 3 years), and would that load the segments it previously dropped?

Thanks,

Naveen

Also,

Do the load rules handle future segments (segments with a future timestamp)?

Ah, the ‘enable datasource’ button can be used even when the datasource is already enabled. It just changes segments from used=0 to used=1 according to the load rules. Alternatively you can change these yourself in the metadata DB, but this is easier :)
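To make that concrete, and assuming you decided on three years instead of one (P3Y is just an example value, and the tiers are again copied from your rule), the rule list would become something like this before you click ‘enable datasource’:

```json
[
  {
    "type" : "loadByPeriod",
    "period" : "P3Y",
    "tieredReplicants": {
      "hot": 1,
      "_default_tier" : 1
    }
  },
  {
    "type" : "dropForever"
  }
]
```

Only segments that still exist in deep storage can come back this way.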

Future data will be loaded only if you have the workaround mentioned at the top here:

https://github.com/apache/incubator-druid/issues/5869

Follow the PRs linked from that issue to see how this will be changing in 0.14+.

Best,

Caroline

Got it. So I change the load rules and then click on ‘enable datasource’ in the old coordinator console to load the previously dropped segments. Thank you for the clarification, Caroline.

Thanks,

Naveen