I’m running Druid 0.22.1 and trying to use auto-compaction to update a datasource’s segmentGranularity from ‘HOUR’ to ‘WEEK’. I’m testing with a small datasource (15 MB / ~400,000 rows) that has many segments (5,000+).
Both the web console and curl against the API (e.g. a GET against /druid/coordinator/v1/config/compaction/) show that my compaction config has been set/accepted.
This is the only datasource with Auto compaction enabled.
There are segments going back to 2021-05, so there are plenty of candidate segments to compact based on skipOffsetFromLatest.
There are 3 worker slots available, and the relevant compaction task slot settings are: "compactionTaskSlotRatio": 0.7, "maxCompactionTaskSlots": 2147483647.
I’m expecting autocompaction to start compacting based on this config.
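For reference, the spec I POSTed to /druid/coordinator/v1/config/compaction was of this general shape (the dataSource name and values here are placeholders to illustrate the shape, not my exact spec):

```json
{
  "dataSource": "my_datasource",
  "skipOffsetFromLatest": "P1D",
  "inputSegmentSizeBytes": 419430400,
  "taskPriority": 25,
  "granularitySpec": {
    "segmentGranularity": "WEEK"
  }
}
```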
Instead, I find that in the unified console, the datasource’s Compaction column shows “Awaiting first run” and has been stuck like this for 24h+. Reviewing coordinator logs, I don’t see any indication that compaction is even being attempted, and there are no errors. If I grep for “compact” in the coordinator logs, nothing is returned except a single line (see logs below). If I grep for “my_datasource” I see only indexing tasks.
Note: there is a log line indicating a coordinator started up “Scheduling compaction” (see logs below). This line appears on the coordinator which is not presently the leader; the current coordinator leader has no similar line. Could this be related? I can’t see any way to ‘turn on’ auto-compaction aside from submitting the compaction config as described above. I also wonder whether I should be seeing this log line each coordinator period.
Are there other logs I should be looking at? I have mostly focused on Coordinator and overlord logs.
Thanks in advance!
Things I've tried
- Tweaking the compaction config with different settings, e.g. setting skipOffsetFromLatest to ‘P1W’ or segmentGranularity to ‘DAY’
- Updating compactionTaskSlotRatio higher
- Updating taskPriority higher
- Reviewing the Druid docs
I can't find anything that seems relevant. The only log line matching a grep for "compact" is this:
[main] org.apache.druid.server.coordinator.duty.CompactSegments - Scheduling compaction with skipLockedIntervals [true]
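To make the check reproducible, here is roughly how I’m grepping (the log file below is a stand-in created inline so the commands run anywhere; point the grep at your real coordinator log path):

```shell
# Stand-in for the real coordinator log, so the grep below is runnable as-is.
printf '%s\n' \
  '[main] org.apache.druid.server.coordinator.duty.CompactSegments - Scheduling compaction with skipLockedIntervals [true]' \
  > coordinator.log

# On a healthy leader I would expect at least one match for the duty class.
grep -c 'CompactSegments' coordinator.log   # prints 1
```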
I’m wondering if you’re running into this caveat from the Compacting Segments and Segment search policy docs: “This policy currently cannot handle the situation when there are a lot of small segments which have the same interval, and their total size exceeds inputSegmentSizeBytes. If it finds such segments, it simply skips them.”
I’ll review those docs again. However, since the datasource is < 20 MB in total and inputSegmentSizeBytes is approximately 420 MB, I don’t think this is the case. Each segment is < 4 KB, so going from hourly to weekly segments should give roughly 4 KB/hour * 24 hours * 7 days = 672 KB of input per interval, much smaller than ~420 MB.
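That back-of-envelope arithmetic, as a quick sanity check:

```shell
# ~4 KB per hourly segment * 24 segments/day * 7 days per weekly interval
echo "$((4 * 24 * 7)) KB"   # prints "672 KB", far below the ~420 MB inputSegmentSizeBytes cap
```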
That will be our plan B. We had upgraded to 0.22+ specifically to take advantage of updating segmentGranularity via auto-compaction, though. I’ll see if I can reproduce my scenario in a sandbox and try to figure out whether it’s a bug or a config issue.
In the quickstart docker environment, I was able to get the auto-compaction to run against a sample of data from this datasource, using the same compaction spec. It must be something unique to our cluster or cluster state. I’ll try to update here what I find.
I took this a bit further this morning, modifying the quickstart docker-compose to include two coordinator/overlords and testing fail-over. It all worked as expected (compaction started & succeeded).
But the ‘smoking gun’ to me is that on the bad cluster’s lead coordinator, there are no log lines matching a grep for NewestSegmentFirstIterator.
Reviewing more closely, I noticed that in the good cluster both coordinators logged INFO [main] org.apache.druid.server.coordinator.duty.CompactSegments - Scheduling compaction with skipLockedIntervals [true] during startup (within a few seconds of each other). In my bad cluster, the current leader never issued this log line during startup or failover, so I guess coordinator.duty.CompactSegments was never running on that coordinator since before the compaction specs were issued, hence the stuck “Awaiting first run” status.
Are there any configs / conditions which would cause a coordinator to start up without this duty being scheduled?
I think we will just try restarting the coordinators and checking that they both emit this log line.
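A quick way to confirm which coordinator is leading before and after the restarts, and that the compaction config is still in place (the host/port is a placeholder for one of our coordinators, so these calls will only succeed against a live cluster):

```shell
# Placeholder coordinator address; substitute a real coordinator host:port.
COORDINATOR=http://coordinator1:8081

# Who is the current leader? (`|| true` so the sketch doesn't abort offline)
curl -s "$COORDINATOR/druid/coordinator/v1/leader" || true

# Is the compaction config still registered after the restart?
curl -s "$COORDINATOR/druid/coordinator/v1/config/compaction" || true
```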
@Renato_Cron Here was my docker-compose with two coordinators. As mentioned, I couldn’t seem to break it locally the same way I’m experiencing on our deployed cluster. Compaction proceeded each time despite failovers and other hiccups I tried to throw at it.
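The relevant change was just cloning the quickstart’s coordinator service; a hedged sketch of the second service entry (the service name, container name, and host port are my choices, not anything canonical):

```yaml
  coordinator2:
    image: apache/druid:0.22.1
    container_name: coordinator2
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "18081:8081"
    command:
      - coordinator
    env_file:
      - environment
```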
Yesterday we restarted most of our cluster nodes including coordinators, and still compaction won’t begin.
@gitmstoute I might have managed to reproduce the behavior by following our Quickstart tutorial and selecting hour as the segment granularity: auto-compaction wouldn’t kick off. Let me spin it up again and take a look at the logs.
@Mark_Herrera This seemed to reproduce it for me too! Compaction did not start overnight, and the logs look similar to my bad cluster: no compaction tasks are being launched, and NewestSegmentFirstIterator never seems to run.