Can Druid automatically delete data from the pendingSegments table in MySQL?

Hi all:

We have run into a problem in our system. We have 125 datasources, each with 2 replicas, so 250 tasks are generating segments. At the top of each hour (which is when we publish data), all 250 tasks query MySQL, in particular the pendingSegments table. That table has accumulated more than 500,000 rows in only 12 days, which makes our system very slow at the top of the hour. For now we have added an index in MySQL to speed up the query.

So I would like to ask: can the old data in the pendingSegments table be deleted automatically, or can we delete it ourselves? I look forward to your replies!

Thanks,

Haoxiang

Hi Haoxiang,

It’s fine to remove entries from the pendingSegments table as long as the sequence_name doesn’t match the sequence name of a running task. Practically, for the Kafka indexing service, you should be able to delete any entries that are older than {now} - {taskDuration}.
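As a rough illustration of the {now} - {taskDuration} rule, here is a minimal Python sketch that only builds a parameterized DELETE statement. It makes several assumptions that are not stated in this thread: the table name `druid_pendingSegments`, a `created_date` column holding ISO-8601 UTC strings (so string comparison orders rows by time), and the `%s` placeholder style used by common MySQL drivers. The function name `build_cleanup_statement` is hypothetical. Check your own schema before running anything like this.

```python
from datetime import datetime, timedelta, timezone


def build_cleanup_statement(table, now, task_duration, running_sequence_names=()):
    """Return (sql, params) for deleting rows older than now - task_duration.

    `now` must be a timezone-aware datetime. Rows whose sequence_name belongs
    to a running task are excluded, following the advice above.
    """
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")

    cutoff = now.astimezone(timezone.utc) - task_duration

    # Assumption: created_date is stored as an ISO-8601 UTC string.
    sql = f"DELETE FROM {table} WHERE created_date < %s"
    params = [cutoff.strftime("%Y-%m-%dT%H:%M:%S.000Z")]

    running = list(running_sequence_names)
    if running:
        placeholders = ", ".join(["%s"] * len(running))
        sql += f" AND sequence_name NOT IN ({placeholders})"
        params.extend(running)

    return sql, tuple(params)


if __name__ == "__main__":
    sql, params = build_cleanup_statement(
        "druid_pendingSegments",          # assumed table name
        datetime.now(timezone.utc),
        timedelta(hours=1),               # your taskDuration
    )
    print(sql, params)
```

The returned pair can be passed to a driver's `cursor.execute(sql, params)`. Running a `SELECT COUNT(*)` with the same WHERE clause first is a cheap way to confirm what would be removed.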

I filed this issue to look at implementing auto-cleanup of the table: https://github.com/druid-io/druid/issues/3565

Thanks! For now we have just added an index in MySQL to speed up the query, but we don’t know whether the table’s continued growth will affect Druid’s queries in the future. So I hope Druid can implement the auto-cleanup, or add the index to this table by default. Thanks for your reply!

On Friday, October 14, 2016 at 2:24:30 AM UTC+8, David Lim wrote: