
feat: Grafana dashboard + richer per-proxy metrics #435

@almeidapaulopt

Description


Problem

TSDProxy exposes Prometheus metrics via MetricsHandler() (internal/proxymanager/proxymanager.go), but there is no pre-built Grafana dashboard. The existing metrics are sparse: HTTP request counts and latency are not tracked per proxy, and although the UDP relay does per-source rate limiting and client tracking, none of that state is exported as metrics. As a result, operators have limited visibility into:

  • Per-proxy request rates and latency
  • Per-proxy error rates and bandwidth
  • Active connections per proxy
  • UDP relay client counts
  • TLS certificate expiry
  • Tailscale connection status
  • Memory and goroutine pressure as the number of proxies grows

Proposed Solution

Part 1: Richer metrics export

Add per-proxy metrics to the existing internal/core/metrics package:

  • tsdproxy_proxy_requests_total{proxy, port, status} — request count by proxy/port/status code
  • tsdproxy_proxy_request_duration_seconds{proxy, port} — latency histogram
  • tsdproxy_proxy_up{proxy} — 1 if healthy, 0 if down
  • tsdproxy_proxy_connections_active{proxy, port} — active TCP connections
  • tsdproxy_udp_clients_active{proxy, port} — concurrent UDP clients
  • tsdproxy_proxy_status{proxy} — current status enum (running/paused/error/etc.)
  • tsdproxy_cert_expiry_seconds{proxy} — TLS certificate remaining lifetime

Part 2: Pre-built Grafana dashboard JSON

Ship a grafana/dashboard.json in the repo that visualizes:

  • Per-proxy health grid
  • Request rate and latency heatmap
  • Error rate by proxy
  • Active connections overview
  • UDP client counts per proxy
  • Proxy lifecycle events
  • Overall system status

Implementation Notes

  • The Metrics struct in internal/core/metrics uses the Prometheus client library, so adding new counters and histograms is straightforward
  • The HTTP reverse proxy in internal/proxymanager/port.go already passes proxyName and portName to m.Middleware()
  • Transport-level net/http instrumentation could use otelhttp (already wired in when telemetry is enabled) to add span-level metrics
  • The dashboard JSON should be kept under version control in a grafana/ directory at repo root

Alternatives

  • Prometheus + Grafana is already a common stack, so there is no need for a custom metrics backend
  • OpenTelemetry is already supported, so metrics could also be exported via OTLP, but Prometheus is the format most operators expect

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
