Enormous HTTP monitoring requests from Druid

Hi,

We’ve set up an intermediary service that receives Druid’s metrics and alerts (as configured via monitoring) and republishes the metrics to statsd/graphite and the alerts to Sentry.

However, we’ve noticed that the intermediary service consumes a huge amount of RAM, and it turns out it receives up to 1 GB of data per request! The majority of this data consists of alerts about SegmentLoadingException on the historicals.
While we’re aware of the underlying problem there, is there a way to fine-tune which alerts get published, or how often Druid republishes them to the monitoring endpoint?

Our monitoring configuration for historicals:
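(Roughly the standard 0.6.x http-emitter setup; the recipient URL below is a placeholder for our intermediary, and the exact monitor class names vary by version:)

    # recipientBaseUrl and the monitor list are illustrative, not our exact values
    druid.emitter=http
    druid.emitter.http.recipientBaseUrl=http://our-intermediary:8080/events
    druid.monitoring.monitors=["io.druid.server.metrics.ServerMonitor", "com.metamx.metrics.JvmMonitor"]
    druid.monitoring.emissionPeriod=PT1m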

Hi Grzegorz,

Did you set emissionPeriod? http://druid.io/docs/latest/Configuration.html#enabling-metrics

Property                           Description                      Default
druid.monitoring.emissionPeriod    How often metrics are emitted.   PT1m
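
For example, to emit every five minutes instead of every minute, you would set this in the historical’s runtime.properties:

    druid.monitoring.emissionPeriod=PT5m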

Hi Grzegorz, what version of Druid is this?

0.6.160 at the moment, we’re getting ready to upgrade though.

emissionPeriod would still send us all gathered events though, right? There’s no way to turn alerts off?

The emissionPeriod controls how often certain metrics are emitted. As for turning alerts off, you can always have your alert emission endpoint discard the alert events.

Fangjin,

What happens if the emitter receives, for example, a 503 error from the alert emission endpoint? Will it republish the events, or simply forget about the metrics it already tried to publish?

On Friday, May 22, 2015 at 07:44:12 UTC+2, Fangjin Yang wrote:

It should try to republish. There is a max limit on how many messages can be queued. Upon hitting this limit, messages will be dropped.

In general, you can choose to respond 200 OK upon receiving alerts and just drop them afterwards. That will ensure that nothing gets queued up on the Druid nodes.
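
As a sketch (assuming events arrive as a JSON array where each event carries a "feed" field set to "metrics" or "alerts", which is how the http emitter posts them; the port and the forwarding function are placeholders), such an endpoint could look like:

    # Acks every batch with 200 OK, forwards metric events, drops alerts.
    # Assumes Druid's http emitter POSTs a JSON array of events, each with
    # a "feed" field ("metrics" or "alerts"); the port and forward_metric
    # are placeholders, not part of any Druid API.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def forward_metric(event):
        pass  # hypothetical hook: republish to statsd/graphite here

    class EventHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get('Content-Length', 0))
            try:
                events = json.loads(self.rfile.read(length))
            except ValueError:
                events = []
            if not isinstance(events, list):
                events = []
            for event in events:
                if event.get('feed') == 'alerts':
                    continue  # drop alerts so nothing piles up
                forward_metric(event)
            # Always ack, so the emitter never retries or queues the batch.
            self.send_response(200)
            self.end_headers()

    if __name__ == '__main__':
        HTTPServer(('', 8080), EventHandler).serve_forever()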

– Himanshu