Problem
TSDProxy exposes Prometheus metrics via MetricsHandler() (internal/proxymanager/proxymanager.go), but there is no pre-built Grafana dashboard. The existing metrics are sparse — HTTP request counts and latency aren't tracked per-proxy. UDP relay has per-source rate limiting and client tracking but none of that is exported as metrics. Operators have limited visibility into:
- Per-proxy request rates and latency
- Per-proxy error rates and bandwidth
- Active connections per proxy
- UDP relay client counts
- TLS certificate expiry
- Tailscale connection status
- Memory/goroutine pressure from proxy count
Proposed Solution
Part 1: Richer metrics export
Add per-proxy metrics to the existing internal/core/metrics package:
tsdproxy_proxy_requests_total{proxy, port, status} — request count by proxy/port/status code
tsdproxy_proxy_request_duration_seconds{proxy, port} — latency histogram
tsdproxy_proxy_up{proxy} — 1 if healthy, 0 if down
tsdproxy_proxy_connections_active{proxy, port} — active TCP connections
tsdproxy_udp_clients_active{proxy, port} — concurrent UDP clients
tsdproxy_proxy_status{proxy} — current status enum (running/paused/error/etc.)
tsdproxy_cert_expiry_seconds{proxy} — TLS certificate remaining lifetime
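As a rough illustration of how the values behind two of these gauges might be computed, here is a minimal stdlib-only sketch. It is not TSDProxy code. The helper names (certExpirySeconds, statusSeries, demoExpiry) and the state list are hypothetical, and the state list only covers the states named above. The status encoding shown is one common way to export an enum in Prometheus: one series per state, with 1 for the current state. It would add a state label that the proposal above does not list.

```go
package main

import (
	"fmt"
	"time"
)

// proxyStates is a hypothetical, incomplete list of proxy states;
// the real enum lives in TSDProxy and may contain more values.
var proxyStates = []string{"running", "paused", "error"}

// certExpirySeconds returns the remaining certificate lifetime in seconds,
// the value tsdproxy_cert_expiry_seconds would report. It is negative once
// the certificate has expired.
func certExpirySeconds(notAfter, now time.Time) float64 {
	return notAfter.Sub(now).Seconds()
}

// statusSeries encodes an enum as one sample per state: 1 for the current
// state and 0 for every other known state.
func statusSeries(current string) map[string]float64 {
	series := make(map[string]float64, len(proxyStates))
	for _, s := range proxyStates {
		if s == current {
			series[s] = 1
		} else {
			series[s] = 0
		}
	}
	return series
}

// demoExpiry uses fixed timestamps so the result is reproducible.
func demoExpiry() float64 {
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	return certExpirySeconds(now.Add(48*time.Hour), now)
}

func main() {
	fmt.Printf("tsdproxy_cert_expiry_seconds{proxy=%q} %.0f\n", "web", demoExpiry())
	current := statusSeries("paused")
	for _, s := range proxyStates {
		fmt.Printf("tsdproxy_proxy_status{proxy=%q,state=%q} %.0f\n", "web", s, current[s])
	}
}
```

Exporting remaining seconds keeps alert rules simple (for example, alert when the value drops below a threshold), at the cost of the value changing on every scrape.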
Part 2: Pre-built Grafana dashboard JSON
Ship a grafana/dashboard.json in the repo that visualizes:
- Per-proxy health grid
- Request rate and latency heatmap
- Error rate by proxy
- Active connections overview
- UDP client counts per proxy
- Proxy lifecycle events
- Overall system status
Implementation Notes
- The Metrics struct at internal/core/metrics uses the Prometheus client, so adding new counters/histograms is straightforward
- The HTTP reverse proxy in internal/proxymanager/port.go already passes proxyName and portName to m.Middleware()
- net/http transport-level instrumentation could use otelhttp (already wired when telemetry is enabled) for additional span-level metrics
- The dashboard JSON should be kept under version control in a grafana/ directory at repo root
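To make the middleware note concrete, the following is a minimal sketch of per-proxy request instrumentation using only the standard library. It is not the actual m.Middleware() implementation, whose signature is not shown here. memRecorder, requestKey, statusWriter and demoCounts are hypothetical names, and the in-memory maps stand in for what would presumably be a Prometheus CounterVec and HistogramVec labelled by proxy, port and status.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// requestKey mirrors the proposed {proxy, port, status} label set.
type requestKey struct {
	Proxy, Port string
	Status      int
}

// memRecorder is a hypothetical in-memory stand-in for the Metrics struct.
type memRecorder struct {
	mu       sync.Mutex
	requests map[requestKey]int
	latency  map[string]time.Duration // summed per "proxy/port"
}

func newMemRecorder() *memRecorder {
	return &memRecorder{
		requests: make(map[requestKey]int),
		latency:  make(map[string]time.Duration),
	}
}

func (r *memRecorder) observe(proxy, port string, status int, d time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.requests[requestKey{Proxy: proxy, Port: port, Status: status}]++
	r.latency[proxy+"/"+port] += d
}

// statusWriter captures the status code written by the wrapped handler.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// Unwrap lets http.ResponseController (Go 1.20+) reach the underlying writer.
func (w *statusWriter) Unwrap() http.ResponseWriter { return w.ResponseWriter }

// middleware records one observation per request for the given proxy and port.
func (r *memRecorder) middleware(proxy, port string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		// Default to 200: handlers that never call WriteHeader send 200 implicitly.
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(sw, req)
		r.observe(proxy, port, sw.status, time.Since(start))
	})
}

// demoCounts sends three requests through the middleware and returns the counts.
func demoCounts() map[requestKey]int {
	rec := newMemRecorder()
	backend := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.URL.Path == "/missing" {
			http.NotFound(w, req)
			return
		}
		fmt.Fprintln(w, "ok")
	})
	h := rec.middleware("web", "443", backend)
	for _, path := range []string{"/", "/", "/missing"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
	return rec.requests
}

func main() {
	for k, n := range demoCounts() {
		fmt.Printf("proxy=%s port=%s status=%d count=%d\n", k.Proxy, k.Port, k.Status, n)
	}
}
```

One design point to watch: labelling by raw status code keeps cardinality bounded per proxy, but a deployment with many proxies and ports may prefer status classes (2xx, 4xx, 5xx) to limit the number of series.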
Alternatives
- Prometheus + Grafana is already a common stack; no need for a custom metrics backend
- OpenTelemetry is already supported — metrics could also be exported via OTLP, but Prometheus is the most universally expected