Need information about metrics

Hi,

I’m currently working with Druid metrics to set up monitoring for my cluster.

So far it’s working pretty well, but I’ve run into some gaps in the documentation.

For example, I used the monitors from this page

https://druid.apache.org/docs/latest/configuration/index.html

But I noticed that some monitors are missing there, such as “TaskCountStatsMonitor”, which is only mentioned on this page: https://druid.apache.org/docs/latest/operations/metrics.html

It’s also unclear which monitor should be enabled on which node type; that could perhaps be added to the documentation.

I’m happy to contribute to improving the documentation, but where can I find an exhaustive, up-to-date list of metrics and monitors (ideally without having to dive into the code)?

Thanks

Guillaume

Hi Guillaume,

Unfortunately, I suspect the code is the best place to look for anything not already in the docs. If you haven’t figured it out already, you’ll want to look for implementations of ‘Monitor’ to find undocumented monitors, and for calls to ‘Emitter.emit’ to find undocumented metrics.
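
In case it helps, here is a rough sketch of how that search could be scripted. It is only a heuristic, not an official tool: the two regular expressions are my own guesses at what monitor classes and metric-emitting calls look like in Java source (a class that implements or extends something ending in "Monitor", and a string literal passed as the first argument to a build(...) call), and the SAMPLE snippet is made up for illustration rather than copied from the Druid code base.

```python
import re
from pathlib import Path

# Heuristic (assumption): a monitor is a class that implements or extends
# a type whose name ends in "Monitor".
MONITOR_CLASS = re.compile(
    r"class\s+(\w+)[^{]*?\b(?:implements|extends)\b[^{]*?\b\w*Monitor\b"
)

# Heuristic (assumption): metric names show up as a string literal passed
# as the first argument to a build(...) call.
METRIC_LITERAL = re.compile(r"\.build\(\s*\"([^\"]+)\"")


def find_monitor_classes(java_source):
    """Return the names of classes that look like monitors."""
    return MONITOR_CLASS.findall(java_source)


def find_metric_names(java_source):
    """Return metric name literals in the order they appear."""
    return METRIC_LITERAL.findall(java_source)


def scan_tree(root):
    """Map each monitor-like class under root to the metric literals in its file."""
    results = {}
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name in find_monitor_classes(text):
            results[name] = sorted(set(find_metric_names(text)))
    return results


# Hypothetical Java snippet, only used to show what the heuristics pick up.
SAMPLE = """
public class ExampleMonitor extends AbstractMonitor
{
  public boolean doMonitor(ServiceEmitter emitter)
  {
    emitter.emit(builder.build("example/first", 1));
    emitter.emit(builder.build("example/second", 2));
    return true;
  }
}
"""

if __name__ == "__main__":
    print(find_monitor_classes(SAMPLE))
    print(find_metric_names(SAMPLE))
```

Pointing scan_tree at a local checkout should give a first candidate list to review by hand. Metric names that are assembled at runtime (for example from a format string) are not string literals inside the build(...) call, so this sketch will miss them.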

Hi,

Thanks for your reply.

I didn’t have much time today to investigate this further, but I take your point about where to find the information :)

I’ll make a quick inspection next week and raise a PR to improve docs.

Guillaume

Hi,

I did a (not so) quick review of the metrics and monitors, and found several that are missing from the docs.

Here is what I found missing (sometimes the whole monitor is missing, sometimes just some of its metrics):

CgroupMemoryMonitor
cgroup/memory/%s
cgroup/memory_numa/%s/pages

CpuAcctDeltaMonitor
cgroup/cpu_time_delta_ns
cgroup/cpu_time_delta_ns_elapsed

HttpEmittingMonitor
emitter/events/emitQueue
emitter/events/large/emitQueue
emitter/buffers/emitQueue
emitter/buffers/reuseQueue
emitter/events/emitted/delta
emitter/buffers/dropped/delta
emitter/buffers/allocated/delta
emitter/buffers/failed/delta

JvmCpuMonitor
proc/cpu
jvm/cpu/total
jvm/cpu/sys
jvm/cpu/user
jvm/cpu/percent

JvmMonitor
jvm/heapAlloc/bytes
jvm/gc/mem/max
jvm/gc/mem/capacity
jvm/gc/mem/used
jvm/gc/mem/init

JvmThreadsMonitor
jvm/threads/started
jvm/threads/finished
jvm/threads/live
jvm/threads/liveDaemon
jvm/threads/livePeak

SysMonitor
sys/mem/actual/used
sys/mem/actual/free
sys/fs/files/count
sys/fs/files/free
sys/disk/queue
sys/disk/serviceTime
sys/net/read/packets
sys/net/read/errors
sys/net/read/dropped
sys/net/read/overruns
sys/net/read/frame
sys/net/write/packets
sys/net/write/errors
sys/net/write/dropped
sys/net/write/collisions
sys/net/write/overruns
sys/uptime
sys/la/1
sys/la/5
sys/la/15
sys/tcp/activeOpens
sys/tcp/passiveOpens
sys/tcp/attemptFails
sys/tcp/estabResets
sys/tcp/in/segs
sys/tcp/in/errs
sys/tcp/out/segs
sys/tcp/out/rsts
sys/tcp/retrans/segs
sys/net/inbound
sys/net/outbound
sys/tcp/inbound
sys/tcp/outbound
sys/tcp/state/established
sys/tcp/state/synSent
sys/tcp/state/synRecv
sys/tcp/state/finWait1
sys/tcp/state/finWait2
sys/tcp/state/timeWait
sys/tcp/state/close
sys/tcp/state/closeWait
sys/tcp/state/lastAck
sys/tcp/state/listen
sys/tcp/state/closing
sys/tcp/state/idle
sys/tcp/state/bound

TaskCountStatsMonitor
task/interrupt/count
task/interrupt/elapsed

TaskRealtimeMetricsMonitor ==> it seems to replace the deprecated RealtimeMetricsMonitor, but the docs are not up to date on this.

DataSourceOptimizerMonitor
/materialized/view/query/totalNum
/materialized/view/query/hits
/materialized/view/query/hitRate
/materialized/view/select/avgCostMS
/materialized/view/derivative/numSelected
/materialized/view/missNum

For the following metrics, I couldn’t find out which monitor they belong to:

namespace/deltaTasksStarted
namespace/cache/count
namespace/cache/diskSize
namespace/cache/numEntries
namespace/cache/heapSizeInBytes

config/audit

segment/txn/success
segment/txn/failure
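
For anyone who wants to repeat the comparison behind the lists above, here is a minimal sketch of the idea: collect the metric names found in the code, collect the names already present in the docs, and take the difference. It assumes that documented metric names appear in backticks in the Markdown source and contain at least one slash; both the regular expression and the sample inputs are illustrative, not taken from the real docs.

```python
import re

# Assumption: documented metrics look like `some/metric/name` in the Markdown.
DOCUMENTED = re.compile(r"`([\w/%-]*/[\w/%-]+)`")


def documented_metrics(markdown_text):
    """Return the set of slash-separated names found in backticks."""
    return set(DOCUMENTED.findall(markdown_text))


def missing_from_docs(emitted, markdown_text):
    """Return emitted metric names that the docs text does not mention."""
    return sorted(set(emitted) - documented_metrics(markdown_text))


if __name__ == "__main__":
    # Made-up docs excerpt and metric list, for illustration only.
    docs = "|`jvm/gc/count`|some description|\n|`sys/uptime`|some description|\n"
    emitted = ["jvm/gc/count", "jvm/threads/live", "sys/uptime", "sys/la/1"]
    for name in missing_from_docs(emitted, docs):
        print(name)
```

Feeding it the metric names collected from the code and the text of the metrics page gives a list of candidates to check by hand before proposing them for the docs.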

Could someone indicate what is really relevant to include in the documentation and what would make it too cumbersome?

I’m starting a PR for this.

Thanks