Segment cache used disk space issue

Druid version 0.21.1.
I have several Druid datasources with defined retention rules.
When I first start Druid, the Historical writes the segments into the segment-cache directory on a mounted disk.
I wait until all segments are loaded into the cache and the Historical reports "Empty load/drop queues".
At that point it takes around 100GB of disk space.
I have real-time ingestion with retention rules, so new segments are being added and old ones are removed.
I see files getting removed, but used disk space keeps increasing over time, as if the files remain allocated even after being deleted.
du -sh reports the size of my segment cache as 100GB,
while df -h shows a higher used disk space (around 150GB).
After restarting Druid, the used disk space goes back to 100GB once all segments are loaded.
Any idea what might cause this issue?

Is druid.server.maxSize set? If it is, it might conflict with the values set in druid.segmentCache.locations. There was a PR about this a couple of years ago.
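
For reference, a minimal sketch of keeping the two in sync in the Historical's runtime.properties (the values here are illustrative, not a recommendation):

# Total size the Historical announces it can serve; it should not exceed
# the sum of maxSize across druid.segmentCache.locations.
druid.server.maxSize=200000000000
# Local segment cache location with its per-location capacity.
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":200000000000}]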

No, maxSize is not set.
I have druid.segmentCache.locations set like below:
druid_segmentCache_locations=[{"path":"var/druid/segment-cache","maxSize":"200g"}]

Are Middle Managers running on the same node? Could it be the temporary files that the MM creates while ingesting?

I am using Druid with Docker.
I have 3 containers for Historicals and 1 container as the MiddleManager.
I have set the MiddleManager to write to another mounted disk.
The disk assigned to the Historicals is the one that keeps inflating.
I am forced every few days to restart the Broker and Historical containers to free up the extra disk space.

Hi Mark, what are you using as your deep storage?

I found this about the discrepancy between df and du:

Many good suggestions there to investigate further.
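
One common check (just a sketch; lsof may need to be installed inside the Historical container, and the PID is a placeholder) is to look for files that were deleted but are still held open by a process, since those still count towards df but not du:

# List open files whose on-disk link count is 0 (unlinked but still open)
lsof +L1
# Or inspect a specific process's file descriptors
ls -l /proc/<historical-pid>/fd | grep deleted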

Deep storage is on a mounted volume other than the one used for the segment cache.

Hello Sergio.
Yes, I am aware of the discrepancy between df and du.
The question here is why the Druid process is holding on to the deleted cached segment files and not releasing them until a restart.
I am wondering whether anything can be done in the Druid configuration in this regard.
I will keep investigating in parallel whether this is related to the bare-metal server or to the Docker configuration.

Thanks, and I apologize for the delay; I did not see your reply there. Are you issuing a kill task to get rid of the dropped segments?
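
In case it helps, a kill task can be submitted to the Overlord roughly like this (datasource, interval, host and port are placeholders for your own setup); it permanently removes unused segments from deep storage and the metadata store:

curl -X POST -H 'Content-Type: application/json' \
  -d '{"type":"kill","dataSource":"my_datasource","interval":"2021-01-01/2021-06-01"}' \
  http://<overlord-host>:8090/druid/indexer/v1/task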

In fact, I am restarting the Docker containers for the Broker and Historicals in order to release the deleted cached segments. Note that the deleted segments are not visible under the segment-cache directory.

Hey @mtawk so again (as usual) I am late to the party!

This really is a bit weird. Are you able to trace the Drop instruction being issued by the coordinator (maybe in the logs?) through to the Historical picking it up (maybe in their logs too?). One thing that came to mind was that maybe the user doesn’t have delete permission in the location?
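
A rough way to follow one segment through, assuming you know a segment ID that should have been dropped (the log paths and the ID are placeholders):

# Did the Coordinator issue the drop, and did the Historical act on it?
grep '<segment-id>' coordinator.log
grep '<segment-id>' historical.log
# Can the user running the Historical actually delete from the cache location?
ls -ld var/druid/segment-cache
ls -l var/druid/segment-cache/<datasource>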

There is also the setting druid.segmentCache.deleteOnRemove, whose default is true.
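
If you want to confirm it explicitly, it can be set in the same env-var style you are already using for the Docker images (this just restates the documented default):

druid_segmentCache_deleteOnRemove=true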