Re: [druid-user] Re: Kafka Ingestion Service Lagging every hour (Overlord)

If the Overlord API stops responding during the 10-15 minute window when handoff is happening, you may need to increase the number of HTTP threads (druid.server.http.numThreads in the Overlord's runtime.properties).

Other possible approaches:

  1. Use a bigger instance type for your metadata RDS,
  2. Enable slow query logging to see which queries are taking a long time,
  3. Add one or more indexes on the segments table. Internally, the following index brought the run time of a query executed after acquiring the “giant lock” down from 1.6 seconds to 300 milliseconds:
    CREATE INDEX idx_druid_segments_dsend ON druid_segments(dataSource, end);
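
To check points 2 and 3 before and after making changes, something along these lines may help. This is only a rough sketch and it assumes a MySQL-compatible metadata store with the default druid_segments table name; the thread does not say which engine the RDS instance runs. The datasource name and timestamp in the EXPLAIN are made-up placeholders, and the query shape is a guess, not the exact query Druid issues. On RDS, slow query logging is normally turned on through the DB parameter group rather than with SET GLOBAL. On PostgreSQL, end is a reserved word and would need to be written as "end".

    -- Assumption: MySQL-compatible metadata store.
    -- See whether slow query logging is on, and the current threshold in seconds.
    SHOW VARIABLES LIKE 'slow_query_log';
    SHOW VARIABLES LIKE 'long_query_time';

    -- List the indexes that already exist on the segments table.
    SHOW INDEX FROM druid_segments;

    -- Hypothetical query shape, with placeholder values, to confirm that
    -- the optimizer picks up idx_druid_segments_dsend once it exists.
    EXPLAIN
    SELECT id
    FROM druid_segments
    WHERE dataSource = 'my_datasource'
      AND `end` >= '2020-01-01T00:00:00.000Z';

If the key column of the EXPLAIN output does not show the new index, the slow query log should show what the real query looks like, and the index columns can be adjusted to match.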

Good luck!

Sorry, I meant to say creating metadata for new segments. It doesn’t have to do with handoffs.