We are using the Kafka indexing service in Druid and just upgraded our Kafka brokers to 0.10.1.0, and we found that the Kafka logs have run out of disk space. Since log.retention.hours=168, we can lower it to log.retention.hours=24 to solve the out-of-space problem. We have several questions:
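For context, this is the change we are planning in the broker's server.properties (a sketch; the size-based settings are shown only as alternatives we have not enabled):

```properties
# Broker-wide time-based retention: delete log segments older than 24 hours
# (was 168 hours / 7 days).
log.retention.hours=24

# Optional alternative: cap retention by size instead of, or in addition to,
# time; a segment becomes eligible for deletion when either limit is hit.
# log.retention.bytes=1073741824

# How often the broker checks for log segments eligible for deletion.
log.retention.check.interval.ms=300000
```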
Is this retention period calculated from the time Druid consumes the data, or from the time the Kafka log segments are created?
Druid may consume a Kafka offset multiple times to achieve exactly-once ingestion. How can we make sure that a given range of offsets has been consumed and published to the historical nodes before Kafka deletes those offsets at the end of the log.retention period?
The upgrade notes list this under "Potential breaking changes in 0.10.1.0":
"The log retention time is no longer based on last modified time of the log segments. Instead it will be based on the largest timestamp of the messages in a log segment."
Will this affect the Kafka indexing service when indexing old data (data delayed by a couple of months)? What does "largest timestamp of the messages" mean — is it created by Kafka itself, or is it the timestamp inside our own data schema? If it is our own timestamp, will our old data be deleted immediately?
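For reference, these appear to be the broker settings that control which timestamp is used (names from the Kafka broker configuration docs; we are not yet sure this is the right fix for our case):

```properties
# CreateTime (the default): the message timestamp is set by the producer
# when the record is sent, and time-based retention uses the largest such
# timestamp in a segment.
# LogAppendTime: the broker overwrites the timestamp when the message is
# appended to the log, which makes time-based retention behave like the
# pre-0.10.1 last-modified behavior.
log.message.timestamp.type=LogAppendTime

# With CreateTime, messages whose timestamp differs from the broker clock
# by more than this many ms are rejected, which limits how far retention
# can be skewed by very old or very new timestamps.
# log.message.timestamp.difference.max.ms=86400000
```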