Could someone please help me understand what impact compaction has on the size of segments? To what extent is segment size reduced, and what factors affect the reduction in segment size after compaction?
Suppose data is ingested into Druid at HOUR segment granularity for a day and the total size of the segments is 50 GB. If I run compaction at DAY segment granularity (target compaction size is 700 MB), to what extent should the segment size shrink after compaction for that day? Is there a formula to calculate this?
Compaction is used to reduce the number of segments created for an interval. If you have 10 segments in a given interval/segment granularity, you can compact them into one segment, depending on the configuration you have set for the data source.
There are two ways to run compaction.
- Manual compaction
You create this task manually and submit it to the Overlord. Here you can change the segment granularity and compact segments from different intervals into one.
Check the above documentation to setup the task with fields.
- Auto compaction
This is an automatic task that gets created and run for the data source. In this task you compact segments of the same interval into one segment.
Check the above documentation to set it up. This can be done via the API or from the Druid console.
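To make the two options a bit more concrete, here is a rough sketch in Python that only builds the two JSON payloads. The data source name, interval, skip offset, and row limit are made-up placeholders, and the field names follow the layout I believe recent Druid releases use. Some older releases accepted a `targetCompactionSizeBytes` setting instead of row-based limits, so please check the documentation for your version before relying on any of this.

```python
import json

# Placeholder values for illustration only; none of these come from the thread.
DATASOURCE = "my_datasource"
DAY_INTERVAL = "2020-01-01/2020-01-02"


def manual_compaction_task(datasource, interval, max_rows_per_segment=5_000_000):
    """Build a manual compaction task that rewrites `interval` at DAY granularity."""
    return {
        "type": "compact",
        "dataSource": datasource,
        "ioConfig": {
            "type": "compact",
            "inputSpec": {"type": "interval", "interval": interval},
        },
        # Changing segment granularity is what merges 24 hourly buckets into one day.
        "granularitySpec": {"segmentGranularity": "DAY"},
        "tuningConfig": {
            "type": "index_parallel",
            "partitionsSpec": {
                "type": "dynamic",
                "maxRowsPerSegment": max_rows_per_segment,
            },
        },
    }


def auto_compaction_config(datasource, skip_offset="P1D", max_rows_per_segment=5_000_000):
    """Build an auto-compaction config; recent data inside `skip_offset` is left alone."""
    return {
        "dataSource": datasource,
        "skipOffsetFromLatest": skip_offset,
        "tuningConfig": {
            "partitionsSpec": {
                "type": "dynamic",
                "maxRowsPerSegment": max_rows_per_segment,
            },
        },
    }


task = manual_compaction_task(DATASOURCE, DAY_INTERVAL)
config = auto_compaction_config(DATASOURCE)
print(json.dumps(task, indent=2))
print(json.dumps(config, indent=2))
```

The manual task body would normally be POSTed to the task API (`/druid/indexer/v1/task`), and the auto-compaction config to the compaction config API (`/druid/coordinator/v1/config/compaction`). The sketch deliberately stops short of sending anything.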
Compaction won't reduce the total size of the segments. It is done to create fewer segments, which helps query performance among other things. So if you have multiple segments created at hour granularity, you can compact all of them to day granularity. If your target compaction size is 700 MB, it will create multiple segments of roughly that size.
More details : https://druid.apache.org/docs/latest/ingestion/data-management.html#compact
Compaction is basically used to get rid of small segments; I don't think there would be much impact on the overall size. For example, if you have 18 segments of approximately 100 MB each, compaction can give you 2 segments of around 900 MB each (depending on the compaction task configuration).
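As far as I know there is no official formula, but a back-of-the-envelope estimate for the number of segments after compaction is simply total size divided by target size, rounded up. The sketch below assumes the total data size stays roughly unchanged and treats 1 GB as 1024 MB. In practice Druid splits segments by row count rather than exact bytes, so treat the result as a ballpark only. The function name is hypothetical.

```python
import math


def estimated_segment_count(total_mb, target_segment_mb):
    """Rough number of segments after compaction, assuming total size is unchanged."""
    return max(1, math.ceil(total_mb / target_segment_mb))


# Example from the question: 50 GB for the day, 700 MB target segment size.
total_mb = 50 * 1024
count = estimated_segment_count(total_mb, 700)
print(count)  # 74
```

So for the day in question you would expect on the order of 74 segments of about 700 MB each, still adding up to roughly 50 GB.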
On the second question: changing the granularity from HOUR to DAY will definitely reduce the segment size due to rollup, and how much will depend on the cardinality of the other dimensions.
Compaction has nothing to do with rollup.
Thank you for the detailed information. That helps!
Sure Vinay, the Druid unified console has a nice view of compaction. You can check it out.