Historical doesn't clean up the segment cache after a segment is removed from deep storage

Hello,

Found some strange behavior.

If I disable a segment and run a kill task, the Historical server keeps the segment in its segment cache and continues serving queries from it.

I am playing with **druid-0.14.0-incubating** in a local Docker environment. The type of deep storage doesn't matter: I see the same behavior with both local and S3 storage.

How to reproduce?

$ curl -XPOST -H 'Content-Type:application/json' http://172.20.0.10:8090/druid/indexer/v1/task -d @…/…/druid/apache-druid-0.14.0-incubating/quickstart/tutorial/wikipedia-index.json

{"task":"index_wikipedia_2019-05-06T09:18:12.920Z"}

$ curl -XGET -H 'Content-Type:application/json' http://172.20.0.12:8081/druid/coordinator/v1/datasources
["wikipedia"]

$ curl -XGET -H 'Content-Type:application/json' http://172.20.0.12:8081/druid/coordinator/v1/datasources?simple
[{"name":"wikipedia","properties":{"tiers":{"_default_tier":{"size":4821529,"segmentCount":1}},"segments":{"maxTime":"2015-09-13T00:00:00.000Z","size":4821529,"minTime":"2015-09-12T00:00:00.000Z","count":1}}}]

$ curl -XGET -H 'Content-Type:application/json' http://172.20.0.12:8081/druid/coordinator/v1/datasources/wikipedia/segments
["wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:18:12.956Z"]

select segment_id,is_published,is_available,is_realtime from sys.segments;
┌──────────────────────────────────────────────────────────────────────────────────────┬──────────────┬──────────────┬─────────────┐
│ segment_id │ is_published │ is_available │ is_realtime │
├──────────────────────────────────────────────────────────────────────────────────────┼──────────────┼──────────────┼─────────────┤
│ wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:18:12.956Z │ 1 │ 1 │ 0 │
└──────────────────────────────────────────────────────────────────────────────────────┴──────────────┴──────────────┴─────────────┘
Retrieved 1 row in 0.03s.


$ curl -XDELETE http://172.20.0.12:8081/druid/coordinator/v1/datasources/wikipedia/segments/wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:18:12.956Z

$ curl -XPOST -H 'Content-Type:application/json' http://172.20.0.10:8090/druid/indexer/v1/task -d '{"type": "kill","dataSource": "wikipedia","interval" : "2015-09-12/2015-09-13"}'
{"task":"kill_wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:36:43.120Z"}

select segment_id,is_published,is_available,is_realtime from sys.segments;
┌──────────────────────────────────────────────────────────────────────────────────────┬──────────────┬──────────────┬─────────────┐
│ segment_id │ is_published │ is_available │ is_realtime │
├──────────────────────────────────────────────────────────────────────────────────────┼──────────────┼──────────────┼─────────────┤
│ wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:18:12.956Z │ 0 │ 1 │ 0 │
└──────────────────────────────────────────────────────────────────────────────────────┴──────────────┴──────────────┴─────────────┘


After the kill task, the segment has a new status: it is no longer published but is still available.

Deep storage no longer contains the segment, but the Historical's segment cache still holds it.

$ docker-compose -f docker-compose-dev-localfs.yml exec historical find /var/druid/segments/

/var/druid/segments/

/var/druid/segments/intermediate_pushes

$ docker-compose -f docker-compose-dev-localfs.yml exec historical find /var/druid/segment-cache/

/var/druid/segment-cache/

/var/druid/segment-cache/info_dir

/var/druid/segment-cache/info_dir/wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-05-06T09:18:12.956Z

/var/druid/segment-cache/wikipedia

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z/0

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z/0/00000.smoosh

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z/0/version.bin

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z/0/meta.smoosh

/var/druid/segment-cache/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z/2019-05-06T09:18:12.956Z/0/factory.json

All logs from service startup to the end of the test case are attached.

Could you please help, or at least tell me what I am doing wrong?

Thank you

Best regards,

Yevgen Shramko

middleManager.log (105 KB)

historical.log (89.1 KB)

coordinator.log (851 KB)

overlord.log (109 KB)

Hello Yevgen,

I'm not sure exactly what the issue is. Could it be that a configured load rule is causing this behaviour? http://druid.io/docs/latest/operations/rule-configuration.html

Quote from the doc: "If a Load rule is used to retain only data from a certain interval or period, it must be accompanied by a Drop rule. If a Drop rule is not included, data not within the specified interval or period will be retained by the default rule (loadForever)."
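For illustration, here is a sketch of what pairing a drop rule with loadForever could look like for this datasource, using the coordinator's rules endpoint. The interval and host/port are taken from the commands earlier in the thread; whether this is actually related to the cache issue is unclear.

```shell
# Hypothetical retention rules for "wikipedia": drop the tutorial
# interval, keep everything else loaded (dropByInterval and the
# /rules endpoint are standard coordinator APIs).
RULES='[
  {"type": "dropByInterval", "interval": "2015-09-12/2015-09-13"},
  {"type": "loadForever", "tieredReplicants": {"_default_tier": 2}}
]'

# Validate the payload locally before posting it anywhere.
echo "$RULES" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "payload OK"

# Against a live cluster (commented out here):
# curl -XPOST -H 'Content-Type: application/json' \
#   http://172.20.0.12:8081/druid/coordinator/v1/rules/wikipedia -d "$RULES"
```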

Thanks,

Sashi

Hi Sashi,

I apologize for the delay in my response.

I would expect different behavior with the default rule {"_default":[{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}]}.

If a segment has been disabled and a kill task has successfully removed it from deep storage, I would also expect the Historical's segment cache to be cleaned, since the source segment no longer exists in deep storage.
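For reference, one way to check whether the Historical still announces the segment (as opposed to merely keeping files on disk) is the coordinator's servers API. The coordinator address below comes from the commands above; the server name is a placeholder to be taken from the /servers listing:

```shell
# Helper that builds the coordinator URL listing the segments a given
# server currently serves; pass the host:port name reported by
# /druid/coordinator/v1/servers.
coordinator="http://172.20.0.12:8081"

server_segments_url() {
  echo "$coordinator/druid/coordinator/v1/servers/$1/segments"
}

# Against a live cluster ("historical:8083" is hypothetical):
# curl -s "$coordinator/druid/coordinator/v1/servers"
# curl -s "$(server_segments_url historical:8083)"
```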

Thanks,

Yevgen

Hi Yevgen,

You are right. I have no clue what the issue could be. Is the issue reproducible with other data sets as well?

Thanks,

Sashi