Missing Ingestion Metrics Using Kafka-Based Ingestion

We’re operating our Druid cluster using Kafka-based ingestion. We would really like the following metric for performance testing:

ingest/events/processed

Based on the documentation here (Metrics · Apache Druid), this metric is only produced when the RealtimeMetricsMonitor is enabled. After some digging and testing, I’ve found that the RealtimeMetricsMonitor is deprecated. Some documentation pointed me to the TaskRealtimeMetricsMonitor instead, but that isn’t working either: Druid fails to start up when this monitor is enabled.
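For reference, this is roughly how I was trying to enable it, via the standard monitors list in runtime.properties. Treat this as a sketch rather than a known-good config: the fully qualified class paths are from memory and may differ between Druid versions.

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.indexing.common.stats.TaskRealtimeMetricsMonitor"]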

Would appreciate some insights into how we can get the ingest/*/* metrics produced. We are already getting the ingest/kafka/lag, ingest/kafka/maxLag, and ingest/kafka/avgLag metrics from the supervisor.


Welcome @Andrew_Ho! What sorts of errors are you seeing when TaskRealtimeMetricsMonitor is enabled and Druid fails to start?

Hi! I’m also noticing something similar: most ingestion metrics are missing for me when ingesting data from Kafka.

If it helps, what I’ve found so far is that the metrics are available when using an HTTP emitter (sending metrics to a druid-exporter deployment), but not when using the Prometheus emitter. I can see the metric metadata in Prometheus, but no series are ever published:

# HELP druid_ingest_events_processed Number of events successfully processed per emission period.
# TYPE druid_ingest_events_processed counter
... nothing

I’ve tried Druid versions 0.22 and 0.23, and still no luck.

The monitors that I’m using for my middle managers are:

[
"org.apache.druid.server.metrics.EventReceiverFirehoseMonitor",
"org.apache.druid.java.util.metrics.JvmMonitor",
"org.apache.druid.java.util.metrics.JvmThreadsMonitor",
"org.apache.druid.client.cache.CacheMonitor"
]

Since the metrics are emitted when using an HTTP emitter, I don’t think it’s related to the monitors being used, but from a quick look at the code I’m also not sure what the issue could be.

Hi @razzu
Thanks for the input. We are also using the Prometheus emitter. I did some more digging and found the reason for the missing metrics after looking through this documentation: Hidden - Prometheus Emitter - 《Apache Druid v0.22.1 Documentation》 - 书栈网 · BookStack

I believe these ingestion metrics are generated by the Peon tasks, and based on the documentation above we would need to use the Prometheus pushgateway to get metrics from short-lived jobs.

Thank you @Mark_Herrera! I saw similar errors when enabling the RealtimeMetricsMonitor and the TaskRealtimeMetricsMonitor. Both errors were similar to this thread: Unable to configure specific emiiter

Good point, @Andrew_Ho, thanks for the info! Yes, I guess ingestion tasks are performed by the peons and not the middle managers themselves, so we likely need to set up a push gateway for the metrics we’re looking for. I’ll try to set something up on my side to validate this.


@razzu We are also using the prometheus-emitter for metrics collection, currently with the exporter strategy. As I can see in the discussion above, for short-lived jobs we need to use the pushgateway, but the documentation says we can use only one of the strategies. Let’s say we are interested in both sets of metrics; how can we do that?

BTW any luck with ingestion metrics?

@bharathkuppala

For Peon processes (which are responsible for ingestion and which emit most of the ingest/* metrics), using the pushgateway strategy is the only option. To configure this, you’ll need to override the prometheus-emitter configurations on the Peon processes by setting some additional configurations on the Middle Manager processes:

druid.indexer.fork.property.druid.emitter.prometheus.strategy = pushgateway
druid.indexer.fork.property.druid.emitter.prometheus.pushGatewayAddress = ...

With a setup like this, I’ve been able to get ingestion metrics pushed to the pushgateway. However, these metrics are only pushed at the end of the ingestion task, and not at every emission period. If you want to receive them more often than that, I’m afraid that’s not supported right now, from what I can see in the code.
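For anyone who wants both sets of metrics at once, a minimal sketch of the emitter-related properties on the Middle Manager might look roughly like this. The port and pushgateway address are placeholders, and the exact property names should be double-checked against the prometheus-emitter docs for your Druid version:

# Middle Manager itself keeps the exporter strategy and is scraped by Prometheus
druid.emitter = prometheus
druid.emitter.prometheus.strategy = exporter
druid.emitter.prometheus.port = 19091

# Peons forked by this Middle Manager push their metrics to the pushgateway instead
druid.indexer.fork.property.druid.emitter.prometheus.strategy = pushgateway
druid.indexer.fork.property.druid.emitter.prometheus.pushGatewayAddress = http://prometheus-pushgateway:9091

That way the long-lived services keep being scraped as before, while the short-lived Peon tasks push their ingestion metrics.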

We had a similar discussion on Slack, if you’d like to see more context around this: Slack


@razzu Thanks for the info :grinning:

Regarding **druid_jvm_mem_init** (and JVM-related metrics in general): we are only receiving these metrics for the MiddleManagers and Routers, but not for any other nodes. Is this expected, or do we have to do any further configuration?

Example metric:
max(druid_jvm_mem_init{component=~"$component"}) by (component, memKind)

Please correct us if we are doing anything wrong.
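In case it helps: the JVM metrics come from the JvmMonitor, and each service only emits them if that monitor is listed in its own runtime.properties. A minimal sketch for, say, a Historical or Broker, assuming the same monitors shown earlier in this thread:

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor"]

It’s also worth checking that those nodes load the prometheus-emitter extension (druid.extensions.loadList) and have druid.emitter configured, since a service without an emitter won’t publish any metrics at all.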