P6-04+05: Prometheus /metrics endpoint + Grafana dashboard
CI / Test (rest) (pull_request) Successful in 41s
CI / Test (store) (pull_request) Successful in 43s
CI / Lint (pull_request) Successful in 29s
CI / Build (windows/amd64) (pull_request) Successful in 44s
CI / Test (server-http) (pull_request) Successful in 1m47s
CI / Build (linux/arm64) (pull_request) Successful in 43s
CI / Build (linux/amd64) (pull_request) Successful in 2m1s
New internal/server/metrics package emits the legacy text/plain exposition format directly, so we don't pull in prometheus/client_golang.

The endpoint is opt-in via RM_METRICS_TOKEN and/or RM_METRICS_TRUSTED_CIDR; the route is not mounted at all if neither gate is set. When both are configured, a request must pass both.

Metrics:
- Per-host gauges: online, last_backup_*, repo_size_bytes, snapshot_count, open_alerts, repo_status.
- Server gauges: hosts_total/online, active_alerts by severity, build_info.
- An in-memory job-duration histogram, observed from the existing MsgJobFinished branch in the WS handler.

Docs in docs/prometheus.md (enable + scrape config + metric reference + dashboard import). Sample dashboard at deploy/grafana/restic-manager-dashboard.json - six panels, Grafana schema 39, single Prometheus datasource variable.

Tests:
- metrics package: golden render, concurrent observe, bucket boundaries.
- HTTP layer: auth matrix (no auth -> 404, token gate, CIDR gate, both required).
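
For readers who want to see the gating rule in code, here is a minimal sketch of how an opt-in route with two ANDed gates could look. It is not the code from this commit: the `gate` type, `mount`, `status`, and `mustCIDR` are hypothetical names, and the Bearer header scheme and the 403 rejection status are assumptions, since the commit message states neither.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gate is a hypothetical holder for the two optional checks.
type gate struct {
	token   string     // would come from RM_METRICS_TOKEN; empty means unset
	trusted *net.IPNet // would come from RM_METRICS_TRUSTED_CIDR; nil means unset
}

func (g gate) enabled() bool { return g.token != "" || g.trusted != nil }

// allow reports whether the request passes every configured gate (AND).
func (g gate) allow(r *http.Request) bool {
	if g.token != "" {
		const prefix = "Bearer " // assumed header scheme
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, prefix) {
			return false
		}
		if subtle.ConstantTimeCompare([]byte(auth[len(prefix):]), []byte(g.token)) != 1 {
			return false
		}
	}
	if g.trusted != nil {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			return false
		}
		ip := net.ParseIP(host)
		if ip == nil || !g.trusted.Contains(ip) {
			return false
		}
	}
	return true
}

// mount registers /metrics only when at least one gate is configured,
// so an unconfigured server answers 404 from the mux itself.
func mount(mux *http.ServeMux, g gate, h http.Handler) {
	if !g.enabled() {
		return
	}
	mux.Handle("/metrics", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.allow(r) {
			http.Error(w, "forbidden", http.StatusForbidden) // assumed status
			return
		}
		h.ServeHTTP(w, r)
	}))
}

// mustCIDR parses a CIDR or panics; handy for examples and tests.
func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// status sends one GET /metrics through a fresh mux and returns the HTTP code.
func status(g gate, token, remoteAddr string) int {
	mux := http.NewServeMux()
	mount(mux, g, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# metrics would be rendered here")
	}))
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	req.RemoteAddr = remoteAddr
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	both := gate{token: "s3cret", trusted: mustCIDR("10.0.0.0/8")}
	fmt.Println(status(gate{}, "", "10.1.2.3:4000"))        // 404: route never mounted
	fmt.Println(status(both, "s3cret", "10.1.2.3:4000"))    // 200: both gates pass
	fmt.Println(status(both, "s3cret", "203.0.113.9:4000")) // 403: token ok, CIDR fails
}
```

Skipping registration entirely, rather than mounting a handler that always rejects, means an unconfigured server gives no hint that a metrics endpoint exists. Note that `RemoteAddr` is the address of the immediate peer, so behind a reverse proxy a CIDR check like this one would see the proxy's address.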
@@ -20,6 +20,7 @@ import (
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/fleetupdate"
 	rmhttp "gitea.dcglab.co.uk/steve/restic-manager/internal/server/http"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/maintenance"
+	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/metrics"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ws"
@@ -89,6 +90,7 @@ func run() error {
 
 	hub := ws.NewHub()
 	jobHub := ws.NewJobHub()
+	metricsRegistry := metrics.NewRegistry()
 
 	notifHub := notification.NewHub(st, aead, cfg.BaseURL)
 	alertEngine := alert.NewEngine(st, notifHub)
@@ -122,6 +124,7 @@ func run() error {
 		UI: renderer,
 		Version: version,
 		OIDC: oidcClient,
+		Metrics: metricsRegistry,
 	}
 
 	// First-run bootstrap: if the users table is empty, mint a one-time
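
The registry wired in above carries the in-memory job-duration histogram described in the commit message. As a rough illustration of what hand-rendering a histogram in the Prometheus text exposition format involves, here is a sketch. The `histogram` type, its methods, the bucket bounds, and the metric name `example_job_duration_seconds` are all hypothetical and do not reflect the real internal/server/metrics API.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"sync"
)

// histogram is a hypothetical fixed-bucket histogram guarded by a mutex.
type histogram struct {
	mu     sync.Mutex
	bounds []float64 // sorted upper bounds, excluding +Inf
	counts []uint64  // per-bucket counts (not cumulative); last slot is +Inf
	sum    float64
	total  uint64
}

func newHistogram(bounds ...float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// observe records one value. SearchFloat64s returns the first index whose
// bound is >= v, which matches the inclusive "le" (less than or equal)
// semantics of Prometheus buckets; values above every bound land in +Inf.
func (h *histogram) observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.mu.Lock()
	h.counts[i]++
	h.sum += v
	h.total++
	h.mu.Unlock()
}

// render writes the histogram in the text exposition format.
// Bucket lines are cumulative, and the +Inf bucket equals the total count.
func (h *histogram) render(name, help string) string {
	h.mu.Lock()
	defer h.mu.Unlock()
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s histogram\n", name, help, name)
	var cum uint64
	for i, ub := range h.bounds {
		cum += h.counts[i]
		fmt.Fprintf(&b, "%s_bucket{le=%q} %d\n", name, strconv.FormatFloat(ub, 'g', -1, 64), cum)
	}
	fmt.Fprintf(&b, "%s_bucket{le=\"+Inf\"} %d\n", name, h.total)
	fmt.Fprintf(&b, "%s_sum %s\n", name, strconv.FormatFloat(h.sum, 'g', -1, 64))
	fmt.Fprintf(&b, "%s_count %d\n", name, h.total)
	return b.String()
}

func main() {
	h := newHistogram(1, 10, 60) // illustrative bounds in seconds
	for _, d := range []float64{1, 5, 120} {
		h.observe(d)
	}
	fmt.Print(h.render("example_job_duration_seconds", "Job duration."))
}
```

An observation of exactly 1 lands in the `le="1"` bucket, which is the kind of boundary case the "bucket boundaries" test in the commit message presumably pins down.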