Removing Old Tasks From Metadata DB

  • Druid Version: 0.22.1
  • Overlord mode: Remote
  • Metadata DB Type: Postgres (Yugabyte for HA)

I noticed that the druid_tasks table in the metadata DB has old records from months ago and they do not seem to be cleaned up:

select created_date from druid_tasks order by created_date asc limit 10;
       created_date       
--------------------------
 2021-09-16T16:40:51.849Z
 2021-09-16T16:41:28.975Z
 2021-09-16T16:41:28.982Z
 2021-09-16T16:43:16.635Z
 2021-09-16T16:43:16.643Z
 2021-09-16T17:05:43.188Z
 2021-09-16T17:05:43.194Z
 2021-09-16T17:41:34.512Z
 2021-09-16T17:41:34.520Z
 2021-09-16T17:43:21.845Z

I see there is a configuration option to have the Overlord clean this up as shown here: Configuration reference · Apache Druid

druid.indexer.logs.kill.enabled
druid.indexer.logs.kill.durationToRetain

I have set these values in the Overlord config:

druid.indexer.logs.kill.enabled = true
druid.indexer.logs.kill.durationToRetain = 2592000000 (30 days in ms, keep last 30 days)
druid.indexer.logs.kill.initialDelay (not set, use default)
druid.indexer.logs.kill.delay (not set, use default)
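
As a quick sanity check on the retention value above, here is a minimal Python sketch. It confirms that 2592000000 ms is 30 days and works out the cutoff timestamp in the same shape as the created_date values. The helper names (retention_days, cutoff_iso) are made up for illustration and are not part of Druid. It also assumes created_date is a UTC ISO 8601 string with millisecond precision, which is what the query output suggests.

```python
from datetime import datetime, timedelta, timezone

# Value set for druid.indexer.logs.kill.durationToRetain (milliseconds).
RETAIN_MS = 2592000000


def retention_days(retain_ms: int) -> float:
    """Convert a retention period in milliseconds to days."""
    return timedelta(milliseconds=retain_ms) / timedelta(days=1)


def cutoff_iso(now: datetime, retain_ms: int) -> str:
    """Return the retention cutoff as a UTC string shaped like created_date.

    Assumption: created_date looks like 2021-09-16T16:40:51.849Z.
    """
    cutoff = now.astimezone(timezone.utc) - timedelta(milliseconds=retain_ms)
    millis = cutoff.microsecond // 1000
    return cutoff.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(f"retention: {retention_days(RETAIN_MS)} days")
    print(f"tasks created before {cutoff_iso(now, RETAIN_MS)} are outside the window")
```

Any task whose created_date sorts before that cutoff string is older than the retention window, so the September 2021 rows would be expected to qualify for cleanup.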

I also see them show up in the logs during startup:

2022-06-21T18:10:36,210 INFO [main] org.apache.druid.cli.CliOverlord - * druid.indexer.logs.kill.durationToRetain: 2592000000
2022-06-21T18:10:36,210 INFO [main] org.apache.druid.cli.CliOverlord - * druid.indexer.logs.kill.enabled: true

But when I check the druid_tasks table, nothing has changed. According to the doc, the cleanup should first run within 5 minutes of startup and then every 6 hours.

# select created_date from druid_tasks order by created_date asc limit 10;
       created_date       
--------------------------
 2021-09-16T16:40:51.849Z
 2021-09-16T16:41:28.975Z
 2021-09-16T16:41:28.982Z
 2021-09-16T16:43:16.635Z
 2021-09-16T16:43:16.643Z
 2021-09-16T17:05:43.188Z
 2021-09-16T17:05:43.194Z
 2021-09-16T17:41:34.512Z
 2021-09-16T17:41:34.520Z
 2021-09-16T17:43:21.845Z
(10 rows)

# select count(*) from druid_tasks where active=false;
 count  
--------
 104157
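
To see how many of those inactive rows are actually older than the retention window, the count can be restricted by created_date. The sketch below only illustrates the shape of that query. It uses an in-memory SQLite table as a stand-in for the real Postgres table, stores active as 0/1 instead of a boolean, and assumes created_date is an ISO 8601 UTC string, so plain string comparison orders rows chronologically. The cutoff value and the function names are hypothetical.

```python
import sqlite3

# created_date values copied from the query output above; 0 means active=false.
SAMPLE_ROWS = [
    ("2021-09-16T16:40:51.849Z", 0),
    ("2021-09-16T16:41:28.975Z", 0),
    ("2021-09-16T17:43:21.845Z", 0),
]


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in for druid_tasks with only the two columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE druid_tasks (created_date TEXT NOT NULL, active INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO druid_tasks VALUES (?, ?)", SAMPLE_ROWS)
    return conn


def count_expired(conn: sqlite3.Connection, cutoff: str) -> int:
    """Count inactive tasks whose created_date sorts before the cutoff string."""
    row = conn.execute(
        "SELECT count(*) FROM druid_tasks WHERE active = 0 AND created_date < ?",
        (cutoff,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    # Hypothetical cutoff: 30 days before some later check date.
    print(count_expired(conn, "2022-05-22T00:00:00.000Z"))
```

Against the real database the same condition would be written with active = false and a cutoff computed from the current time. If that count stays close to the total after a cleanup cycle should have run, the rows are eligible but are not being removed.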

I also tried adding the config to the Coordinator config, without success. The doc doesn’t say it is an Overlord-specific config, but it mentions that the Overlord is responsible for it. It also doesn’t mention whether an additional config/setting needs to be turned on, which looks like it may be the case.

How does this configuration option get applied? Am I missing something, or do I have to apply it to other Druid services such as Historical, MiddleManager, etc.?

Thanks

Is there any chance you can restart the Overlord?

On configuration changes we restart the Overlord, since the new config won’t take effect immediately.

I believe the config lines you have there are for the task logs, not the metadata DB.

Hi Peter,

The doc says this about the property that I enabled, “druid.indexer.logs.kill.enabled”:

Boolean value for whether to enable deletion of old task logs. If set to true, Overlord will submit kill tasks periodically based on druid.indexer.logs.kill.delay specified, which will delete task logs from the log directory as well as tasks and tasklogs table entries in metadata storage except for tasks created in the last druid.indexer.logs.kill.durationToRetain period.

It specifically mentions the tasks and tasklogs tables, and in my case the druid_tasks table is not being cleaned up. Your link seems to be more about the supervisor tables being cleaned up by the Coordinator. Is the doc about the property I enabled just wrong, and should I perhaps try your setting?