We are doing a Druid POC in which we use the Kafka Indexing Service for real-time ingestion. We keep segmentGranularity as ‘hour’. The problem we are facing is that, although 95% of our data falls within the current segment time interval, upstream systems can deliver some data several hours late. As a result, the number of shards in a given time interval grows over time, leaving many small partitions.
We tried running a compact task for a single day with keepSegmentGranularity set to true. According to the documentation:
If set to true, compactionTask will keep the time chunk boundaries and merge segments only if they fall into the same time chunk.
Although segmentGranularity was ‘hour’, running the compact task with a 1-day interval produced only 1 partition for the whole day. Is there any config that prevents merging segments across time chunk boundaries, or do I need to run a compact task per hour?
I know automatic compaction is available in Druid 0.13, but we are using 0.12.3 for this POC.
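For reference, the per-hour fallback we have in mind would look roughly like the sketch below. It only builds one compact task spec per hourly time chunk for a given day; it does not submit anything. The datasource name `my_datasource` and the date are placeholders, the helper `hourly_compact_specs` is hypothetical, and the specs use only the basic `type`, `dataSource`, and `interval` fields, so any tuning options would still need to be added.

```python
import json
from datetime import datetime, timedelta


def hourly_compact_specs(data_source, day_start):
    """Build one compact task spec per hour of the day starting at day_start.

    Each spec covers exactly one hourly time chunk, so a task can only
    merge segments that already belong to that hour.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO-8601 timestamps, assumed to be UTC
    specs = []
    for hour in range(24):
        start = day_start + timedelta(hours=hour)
        end = start + timedelta(hours=1)
        specs.append({
            "type": "compact",
            "dataSource": data_source,  # placeholder name, not our real datasource
            "interval": f"{start.strftime(fmt)}/{end.strftime(fmt)}",
        })
    return specs


# Placeholder datasource and date, for illustration only.
specs = hourly_compact_specs("my_datasource", datetime(2018, 11, 1))
print(len(specs))
print(json.dumps(specs[0], indent=2))
```

Each spec would then be submitted as a separate task to the Overlord (POST to /druid/indexer/v1/task), which means 24 tasks per day of data. That is the overhead we would like to avoid if a single task can respect the hourly boundaries.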