Where does Druid store older task IDs?

Hi,
Druid creates tasks with IDs like index_kafka_myTopic_76daba12151cd44_gkpfjdmf, and the logs for each indexing task are stored in an S3 bucket under a folder named after that task ID.

I want to look at the logs for a task from a month ago, but Druid only keeps a record of tasks for the last 24 hours. The logs are there in my bucket; I just need a way to find the task ID for a task that was created on a specific day a month ago. How do I do this?

TL;DR - How do I find all the tasks that were created in a specific time interval more than a month ago (beyond the retention policy)?

Hi ritratt,

Since the logs are in your bucket, maybe grep for multiple patterns? Something like grep 'pattern1\|pattern2' fileName_or_filePath?

Best,

Mark

Thanks, this is what I ended up doing: I queried S3 for when the log files were created/modified.
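For anyone wanting to script that S3 lookup, here is a minimal Python sketch. It assumes the log keys look like <prefix>/<task_id>/log (a guess based on the folder-per-task layout described above), and that boto3 is installed with credentials configured. The function names are made up for illustration.

```python
from datetime import datetime, timezone


def task_ids_in_window(objects, start, end):
    """Return sorted task IDs whose log objects were modified in [start, end).

    `objects` is an iterable of dicts with "Key" and "LastModified" entries,
    the shape boto3's list_objects_v2 returns under "Contents".
    Assumes keys look like "<prefix>/<task_id>/log" (hypothetical layout).
    """
    ids = set()
    for obj in objects:
        if start <= obj["LastModified"] < end:
            parts = obj["Key"].rstrip("/").split("/")
            # The task ID is taken to be the folder that holds the log file.
            if len(parts) >= 2:
                ids.add(parts[-2])
    return sorted(ids)


def iter_s3_objects(bucket, prefix):
    """Yield every object under `prefix`; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the filter above works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])
```

Usage would be something like task_ids_in_window(iter_s3_objects("my-bucket", "druid/indexing-logs/"), start, end), where the bucket and prefix are placeholders. boto3 returns timezone-aware LastModified values, so start and end need to be timezone-aware datetimes too (e.g. datetime(2024, 1, 1, tzinfo=timezone.utc)). Note that this matches on when the log file was last written, which is only a proxy for when the task was created.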

You can also configure how long finished task entries stay listed in the console by setting, for instance,

druid.indexer.storage.recentlyFinishedThreshold=P8W

in the Overlord properties. This will actually bring old entries back.
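Once the entries are visible again, the Overlord's task list endpoint can be filtered by creation time. The sketch below only builds the request URL. It assumes the /druid/indexer/v1/tasks endpoint and its state, datasource and createdTimeInterval parameters behave as in recent Druid releases, so check the API reference for your version. The host and port are placeholders and the helper name is made up.

```python
from urllib.parse import urlencode


def completed_tasks_url(overlord, start_iso, end_iso, datasource=None):
    """Build a task-list URL for completed tasks created in a given interval."""
    params = {
        "state": "complete",
        # Assumption: the interval is written with "_" instead of "/".
        "createdTimeInterval": f"{start_iso}_{end_iso}",
    }
    if datasource:
        params["datasource"] = datasource
    return f"{overlord.rstrip('/')}/druid/indexer/v1/tasks?{urlencode(params)}"


print(completed_tasks_url("http://localhost:8090", "2024-01-01", "2024-02-01"))
```

A GET on the resulting URL (with curl or a browser) should return a JSON list of tasks, and the task IDs in it should match the folder names in the bucket.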