This document specifies the design and implementation requirements for a distributed rate limiting service ("Hivemind") that integrates with Envoy Proxy's global rate limiting API. The service implements a peer-to-peer mesh architecture for state synchronization without relying on centralized storage.
- Language: MUST be implemented in Rust
- Protocol: MUST implement gRPC for Envoy integration
- Architecture: MUST use a decentralized peer mesh for state sharing (no centralized storage)
- Compatibility: MUST support all current Envoy global rate limiting capabilities
- Deployment: MUST run as a sidecar alongside the application and Envoy proxy
- Observability: MUST emit and support collection of metrics via OpenTelemetry (OTEL)
- Priority: Lower than core functionality
- Implementation timeline: After core rate limiting features are stable
- High availability through distributed architecture
- Low latency rate limit decisions (sub-millisecond target)
- Horizontal scalability
- Eventual consistency with conflict resolution
- Fault tolerance and partition tolerance
- Leverage existing libraries: SHOULD use well-maintained third-party libraries where possible rather than reimplementing functionality
- Implements Envoy's `RateLimitService` interface
- Handles incoming rate limit check requests from Envoy proxies
- Returns rate limit decisions based on current state
- Maintains local rate limit counters and windows
- Applies rate limit rules and policies
- Manages time-windowed counters (per-second, per-minute, per-hour, per-day)
- Discovers and maintains connections to peer nodes
- Synchronizes rate limit state across the mesh
- Implements gossip protocol or CRDT-based state replication
- Handles node joining, leaving, and failure detection
- Loads and validates rate limit configuration
- Supports dynamic configuration updates
- Manages rate limit descriptors and actions
```
┌─────────────────────────────────────────┐
│            Pod / Instance               │
│                                         │
│   ┌──────────────┐                      │
│   │ Application  │                      │
│   └──────────────┘                      │
│          │                              │
│          │ HTTP/gRPC                    │
│          ▼                              │
│   ┌──────────────┐                      │
│   │    Envoy     │                      │
│   │    Proxy     │                      │
│   └──────┬───────┘                      │
│          │                              │
│          │ gRPC (localhost)             │
│          ▼                              │
│   ┌──────────────┐                      │
│   │   Hivemind   │                      │
│   │  Rate Limit  │                      │
│   │   Service    │                      │
│   └──────┬───────┘                      │
│          │                              │
│          │ Peer Mesh (cross-pod)        │
└──────────┼──────────────────────────────┘
           │
           │ Gossip/CRDT Sync
           ▼
┌─────────────────────────────────────────┐
│         Other Pod Instances             │
│  (Application + Envoy + Hivemind)       │
└─────────────────────────────────────────┘
```
```
    Pod 1                    Pod 2                    Pod N
┌──────────────┐        ┌──────────────┐        ┌──────────────┐
│  App + Envoy │        │  App + Envoy │        │  App + Envoy │
│      +       │        │      +       │        │      +       │
│   Hivemind   │◄──────►│   Hivemind   │◄──────►│   Hivemind   │
└──────────────┘  Mesh  └──────────────┘  Mesh  └──────────────┘
       │                       │                       │
       │ Local gRPC            │ Local gRPC            │ Local gRPC
       │ (localhost)           │ (localhost)           │ (localhost)
       ▼                       ▼                       ▼
     Envoy                   Envoy                   Envoy
```
The service MUST implement the `envoy.service.ratelimit.v3.RateLimitService` interface:

```proto
service RateLimitService {
  rpc ShouldRateLimit(RateLimitRequest) returns (RateLimitResponse);
}
```

Must handle:
- Domain: Identifies the rate limit configuration domain
- Descriptors: Array of rate limit descriptors (key-value pairs)
- Hits addend: Number of hits to add (default: 1)
Must return:
- Overall code: OK, OVER_LIMIT, or UNKNOWN
- Per-descriptor status
- Headers to append to response
- Request headers to append
- Response headers to append
- Dynamic metadata
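A minimal sketch of how an overall response code could be aggregated from per-descriptor results. The `Code` enum and `overall_code` function are illustrative stand-ins, not Envoy's generated protobuf types; here the overall code is OVER_LIMIT if any descriptor is over its limit, and OK otherwise:

```rust
/// Illustrative stand-in for Envoy's response codes (UNKNOWN is the
/// protobuf default for an unset code).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Code {
    Unknown,
    Ok,
    OverLimit,
}

/// One possible aggregation: any over-limit descriptor makes the whole
/// request over limit; otherwise the request is allowed.
fn overall_code(statuses: &[Code]) -> Code {
    if statuses.iter().any(|&s| s == Code::OverLimit) {
        Code::OverLimit
    } else {
        Code::Ok
    }
}
```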
Must support all Envoy rate limiting features:

- Descriptor-based rate limiting
  - Hierarchical descriptor matching
  - Multiple descriptor entries per request
  - Descriptor value matching (exact, prefix, suffix)
- Rate limit actions
  - Request headers
  - Remote address
  - Generic key
  - Header value match
  - Dynamic metadata
  - Metadata
  - Extension
- Time windows
  - Second
  - Minute
  - Hour
  - Day
- Limit responses
  - Per-descriptor limits
  - Quota bucket ID
  - Duration until reset
  - Remaining quota
- Advanced features
  - Shadow mode
  - Local rate limiting
  - Rate limit override
  - Custom response headers
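The three descriptor value match modes listed above can be sketched as a small helper; the names are illustrative, not part of Envoy's API:

```rust
/// The three value-match modes: exact, prefix, and suffix.
enum ValueMatch {
    Exact(String),
    Prefix(String),
    Suffix(String),
}

/// Returns true when the descriptor value satisfies the match mode.
fn matches(m: &ValueMatch, value: &str) -> bool {
    match m {
        ValueMatch::Exact(s) => value == s,
        ValueMatch::Prefix(s) => value.starts_with(s),
        ValueMatch::Suffix(s) => value.ends_with(s),
    }
}
```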
- Bootstrap nodes configuration
- mDNS for local network discovery
- DNS-based service discovery
- Static peer configuration
Option A: Gossip Protocol
- Epidemic-style information dissemination
- Periodic state exchange with random peers
- Anti-entropy mechanism for convergence
- Configurable gossip interval and fanout
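The anti-entropy step above can be illustrated with a simplified version-stamped merge; this is a sketch of the convergence idea, not Chitchat's actual wire protocol, and the `Versioned` type is ours:

```rust
use std::collections::HashMap;

/// A value tagged with a monotonically increasing version.
#[derive(Clone, Debug, PartialEq)]
struct Versioned {
    version: u64,
    value: u64,
}

/// One anti-entropy exchange: for every key, keep the entry with the
/// highest version seen so far. Repeated pairwise exchanges with random
/// peers make all nodes converge on the newest state.
fn merge(local: &mut HashMap<String, Versioned>, remote: &HashMap<String, Versioned>) {
    for (key, entry) in remote {
        match local.get(key) {
            Some(cur) if cur.version >= entry.version => {} // local is newer
            _ => {
                local.insert(key.clone(), entry.clone());
            }
        }
    }
}
```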
Option B: CRDT (Conflict-free Replicated Data Types)
- PN-Counter for rate limit counters
- G-Counter for monotonic counters
- OR-Set for descriptor sets
- Automatic conflict resolution
Recommended: Hybrid approach using CRDTs with gossip for propagation
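As an illustration of the PN-Counter mentioned above: it is two grow-only per-node maps (increments and decrements), each node bumps only its own entries, and merge takes the pointwise maximum, so replicas converge regardless of delivery order. A minimal sketch:

```rust
use std::collections::HashMap;

/// PN-Counter: per-node increment and decrement totals.
#[derive(Clone, Default, Debug)]
struct PnCounter {
    increments: HashMap<String, u64>, // node_id -> increments made by that node
    decrements: HashMap<String, u64>, // node_id -> decrements made by that node
}

impl PnCounter {
    fn increment(&mut self, node: &str, by: u64) {
        *self.increments.entry(node.to_string()).or_insert(0) += by;
    }

    fn decrement(&mut self, node: &str, by: u64) {
        *self.decrements.entry(node.to_string()).or_insert(0) += by;
    }

    /// Current value: all increments minus all decrements.
    fn value(&self) -> i64 {
        let inc: u64 = self.increments.values().sum();
        let dec: u64 = self.decrements.values().sum();
        inc as i64 - dec as i64
    }

    /// Merge another replica: pointwise max per node, which is safe
    /// because each node's entry only ever grows.
    fn merge(&mut self, other: &PnCounter) {
        for (node, &n) in &other.increments {
            let e = self.increments.entry(node.clone()).or_insert(0);
            *e = (*e).max(n);
        }
        for (node, &n) in &other.decrements {
            let e = self.decrements.entry(node.clone()).or_insert(0);
            *e = (*e).max(n);
        }
    }
}
```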
- Eventual consistency
- Read-your-writes consistency per node
- Monotonic reads within time windows
- Bounded staleness (configurable)
The rate limit counter uses a lock-free design with epoch-based time windows. The window epoch and count are packed into a single atomic u64 value, enabling fully lock-free operations using compare-and-swap (CAS):

```rust
struct RateLimitCounter {
    /// Packed state: upper 32 bits = window epoch, lower 32 bits = count
    state: AtomicU64,
    /// The limit for this counter
    limit: u64,
    /// Time window for this counter
    window: TimeWindow,
    /// Fixed reference point for computing window epochs
    epoch_start: Instant,
}
```

Design benefits:
- Lock-free reads: `current_count()` and `remaining()` are single atomic loads
- Lock-free writes: `increment()` uses a CAS loop and retries only on concurrent modification
- Cache-friendly: single cache line for all mutable state
- Automatic window rollover: the epoch is computed from elapsed time, so no explicit reset is needed

Limitation: the maximum count per window is u32::MAX (4,294,967,295), which is sufficient for most rate limiting use cases.
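The CAS loop described above might look like the following simplified sketch; for brevity it uses a `u32` limit and a plain `Duration` window rather than the spec's `u64` limit and `TimeWindow` type:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Pack epoch into the upper 32 bits and count into the lower 32 bits.
fn pack(epoch: u32, count: u32) -> u64 {
    ((epoch as u64) << 32) | count as u64
}

fn unpack(state: u64) -> (u32, u32) {
    ((state >> 32) as u32, state as u32)
}

struct RateLimitCounter {
    state: AtomicU64,
    limit: u32,
    window: Duration,
    epoch_start: Instant,
}

impl RateLimitCounter {
    /// Window epoch derived from elapsed time; no explicit reset needed.
    fn current_epoch(&self) -> u32 {
        (self.epoch_start.elapsed().as_nanos() / self.window.as_nanos()) as u32
    }

    /// Record one hit; returns false when the current window is full.
    fn increment(&self) -> bool {
        loop {
            let observed = self.state.load(Ordering::Acquire);
            let (epoch, count) = unpack(observed);
            let now_epoch = self.current_epoch();
            // A newer epoch implicitly rolls the window over to count 0.
            let count = if epoch == now_epoch { count } else { 0 };
            if count >= self.limit {
                return false; // over limit for this window
            }
            let desired = pack(now_epoch, count + 1);
            // Retry only if another thread changed the state in between.
            if self
                .state
                .compare_exchange(observed, desired, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return true;
            }
        }
    }
}
```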
To minimize lock contention on the underlying gossip state (Chitchat), the cluster module implements a TTL-based read-through cache for distributed counter sums using a sharded concurrent hash map (DashMap):

```rust
struct Cluster {
    /// Chitchat handle for gossip protocol
    handle: ChitchatHandle,
    /// Cached distributed counter sums: key -> CachedCount
    cached_counts: DashMap<String, CachedCount>,
    /// Configuration including cache_ttl
    config: ClusterConfig,
}

struct CachedCount {
    /// The cached total count across all nodes
    total: AtomicU64,
    /// When this entry was last refreshed (for TTL expiration)
    refreshed_at_nanos: AtomicU64,
}
```

Caching strategy:
- Cache hits (lock-free): DashMap read + atomic load, no Chitchat lock
- Cache misses: acquire the Chitchat lock, sum across all nodes, update the cache
- Writes: take a short lock to update local node state, then refresh the cache

TTL configuration:
- Default: 500ms (aligns with `staleness_threshold_ms`)
- Lower TTL = more accurate distributed counts, more lock contention
- Higher TTL = less accurate counts, better throughput

DashMap benefits:
- Sharded locking (N shards, default = CPU cores × 4)
- Different keys rarely contend for the same shard
- Reads for different shards are fully parallel
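The TTL check itself can be done entirely with atomics, so cache hits never take a lock. A minimal sketch of the `CachedCount` freshness logic (standalone, without the DashMap or Chitchat pieces; `epoch` is any fixed `Instant` used as the zero point for the nanosecond timestamps):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Cached distributed sum plus its last-refresh timestamp, both atomic
/// so readers can check freshness without locking.
struct CachedCount {
    total: AtomicU64,
    refreshed_at_nanos: AtomicU64, // nanoseconds since `epoch`
}

impl CachedCount {
    /// A hit is valid only while the entry is younger than the TTL.
    fn is_fresh(&self, epoch: Instant, ttl: Duration) -> bool {
        let now = epoch.elapsed().as_nanos() as u64;
        let age = now - self.refreshed_at_nanos.load(Ordering::Acquire);
        age < ttl.as_nanos() as u64
    }

    /// On a miss: store the freshly summed total and stamp the time.
    fn refresh(&self, epoch: Instant, new_total: u64) {
        self.total.store(new_total, Ordering::Release);
        self.refreshed_at_nanos
            .store(epoch.elapsed().as_nanos() as u64, Ordering::Release);
    }
}
```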
```rust
struct PeerNode {
    node_id: NodeId,
    address: SocketAddr,
    last_seen: Instant,
    state_version: u64,
    health: HealthStatus,
}
```

- Graceful degradation during network partitions
- Local decision making with higher error margins
- Partition detection using heartbeats
- Automatic reconciliation on partition healing
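Heartbeat-based detection can be as simple as comparing `last_seen` against a timeout; the majority-unreachable heuristic for suspecting a partition is our assumption here, not a requirement of the spec:

```rust
use std::time::{Duration, Instant};

/// A peer is suspect once its last heartbeat is older than the timeout.
fn is_suspect(last_seen: Instant, timeout: Duration) -> bool {
    last_seen.elapsed() > timeout
}

/// Illustrative heuristic: suspect a partition (rather than individual
/// failures) when a strict majority of peers is unreachable at once.
fn partition_suspected(last_seens: &[Instant], timeout: Duration) -> bool {
    let down = last_seens.iter().filter(|t| is_suspect(**t, timeout)).count();
    down * 2 > last_seens.len()
}
```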
- P50: < 1ms for rate limit decisions
- P95: < 5ms for rate limit decisions
- P99: < 10ms for rate limit decisions
- Minimum: 10,000 requests/second per node
- Target: 100,000 requests/second per node
- Support for 100+ peer nodes in mesh
- Horizontal scaling without downtime
- Linear performance scaling with node count
- Memory: O(n) where n = unique descriptor combinations
- CPU: < 50% utilization at 50% target throughput
- Network: Configurable gossip bandwidth limits
Support Envoy's rate limit configuration format:

```yaml
domain: default
descriptors:
  - key: source_cluster
    value: cluster_a
    rate_limit:
      unit: second
      requests_per_unit: 100
    descriptors:
      - key: destination_cluster
        value: cluster_b
        rate_limit:
          unit: minute
          requests_per_unit: 1000
```

```yaml
server:
  grpc_port: 8081
  metrics_port: 9090
  admin_port: 8080

mesh:
  node_id: "auto" # or explicit ID
  bootstrap_peers:
    - "peer1.example.com:7946"
    - "peer2.example.com:7946"
  gossip_interval_ms: 100
  sync_interval_ms: 1000
  max_peers: 100
  cache_ttl_ms: 500 # TTL for distributed counter cache (default: 500ms)

rate_limiting:
  config_path: "/etc/ratelimit/config.yaml"
  config_reload_interval: 60s
  local_cache_size: 10000
  staleness_threshold_ms: 500

observability:
  metrics_enabled: true
  tracing_enabled: true
  log_level: "info"

# OpenTelemetry Configuration
otel:
  metrics:
    enabled: true
    exporter: "otlp" # otlp, prometheus, or both
    endpoint: "http://otel-collector:4317"
    protocol: "grpc" # grpc or http
    interval: 10s
  tracing:
    enabled: true
    exporter: "otlp"
    endpoint: "http://otel-collector:4317"
    protocol: "grpc"
    sample_rate: 1.0 # 0.0 to 1.0
  resource:
    service_name: "hivemind"
    service_version: "auto" # or explicit version

# Prometheus scrape endpoint (compatibility)
prometheus:
  enabled: true
  port: 9090
  path: "/metrics"
```

OpenTelemetry (OTEL) - Primary Method
The service MUST emit metrics via OpenTelemetry, supporting:
- OTLP (OpenTelemetry Protocol) export
- Configurable exporters (OTLP/gRPC, OTLP/HTTP)
- Metric aggregation and batching
- Resource attributes (service name, version, instance ID)
Prometheus Compatibility - Secondary Method
MUST also expose Prometheus-compatible metrics endpoint for backward compatibility.
Required Metrics:
- `ratelimit.requests.total` (Counter): Total rate limit requests
  - Attributes: `domain`, `response_code`, `descriptor_key`
- `ratelimit.over_limit.total` (Counter): Total over-limit responses
  - Attributes: `domain`, `descriptor_key`
- `ratelimit.request.duration` (Histogram): Request latency in milliseconds
  - Attributes: `domain`, `response_code`
  - Buckets: 0.1, 0.5, 1, 2, 5, 10, 25, 50, 100 ms
- `ratelimit.counter.value` (Gauge): Current counter values (sampled)
  - Attributes: `domain`, `descriptor_key`, `time_window`
- `ratelimit.mesh.peers.active` (Gauge): Number of active peers
- `ratelimit.mesh.sync.latency` (Histogram): Mesh sync latency in milliseconds
  - Attributes: `peer_id`, `operation`
- `ratelimit.mesh.state_updates.total` (Counter): State update events
  - Attributes: `operation`, `peer_id`
- `ratelimit.mesh.bytes_sent.total` (Counter): Total bytes sent to peers
  - Attributes: `peer_id`
- `ratelimit.mesh.bytes_received.total` (Counter): Total bytes received from peers
  - Attributes: `peer_id`
OpenTelemetry Integration
- MUST support distributed tracing via OpenTelemetry
- OTLP trace export (gRPC and HTTP)
- Trace sampling configuration
- Context propagation via W3C Trace Context
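W3C Trace Context propagates the trace via a `traceparent` header of the form `{version}-{trace-id}-{parent-id}-{trace-flags}`. A minimal parsing sketch (a simplified illustration, not the OpenTelemetry SDK's propagator):

```rust
/// Parse a W3C `traceparent` header into (trace_id, parent_id, sampled).
/// Returns None for malformed input; only basic length checks are done.
fn parse_traceparent(header: &str) -> Option<(String, String, bool)> {
    let parts: Vec<&str> = header.split('-').collect();
    // trace-id is 32 hex chars, parent-id is 16 hex chars.
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    // Bit 0 of trace-flags is the "sampled" flag.
    let sampled = (u8::from_str_radix(parts[3], 16).ok()? & 0x01) == 1;
    Some((parts[1].to_string(), parts[2].to_string(), sampled))
}
```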
Trace Spans:
- `ratelimit.check`: Rate limit decision span
  - Attributes: domain, descriptor keys, response code, hit count
- `ratelimit.state.read`: Local state read operations
- `ratelimit.state.write`: Local state write operations
- `ratelimit.mesh.sync`: Mesh synchronization operations
  - Attributes: peer count, bytes transferred, duration
- `ratelimit.mesh.gossip`: Gossip protocol operations
- Structured logging (JSON format)
- Configurable log levels
- Rate limit decision logging (optional, for debugging)
- Mesh topology changes
- TLS support for gRPC server
- Mutual TLS (mTLS) for peer mesh communication
- Certificate rotation support
- Optional API key authentication for gRPC
- Peer authentication using certificates
- Configuration access control
Primary Model: Sidecar Deployment
The service MUST be deployed as a sidecar container alongside the application and Envoy proxy. This deployment model provides:
- Locality: Rate limit service runs on localhost, minimizing network latency
- Isolation: Each application instance has its own rate limit service instance
- Simplicity: No separate infrastructure required
- Scalability: Rate limit capacity scales automatically with application instances
- Resource Sharing: Shares pod/instance resources with application
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-ratelimit
spec:
  containers:
    - name: application
      image: myapp:latest
      ports:
        - containerPort: 8080
    - name: envoy
      image: envoyproxy/envoy:v1.28-latest
      ports:
        - containerPort: 10000
      volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
    - name: hivemind
      image: hivemind:latest
      ports:
        - containerPort: 8081 # gRPC for Envoy
        - containerPort: 7946 # Peer mesh
        - containerPort: 9090 # Metrics
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```

```yaml
version: '3.8'
services:
  app:
    image: myapp:latest
  envoy:
    image: envoyproxy/envoy:v1.28-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    depends_on:
      - hivemind
  hivemind:
    image: hivemind:latest
    environment:
      - MESH_NODE_ID=node1
    ports:
      - "8081:8081" # gRPC
      - "7946:7946" # Peer mesh
```

For non-containerized deployments, run as a systemd service on the same host as Envoy.
Intra-Pod Communication (Envoy ↔ Hivemind)
- Communication over localhost/127.0.0.1
- No encryption required (same network namespace)
- Ultra-low latency (microseconds)
- Use Unix domain sockets for even lower latency (optional)
Inter-Pod Communication (Hivemind ↔ Hivemind)
- Peer mesh communication across pod network
- MUST use mTLS for security
- Service discovery via Kubernetes DNS or headless service
- UDP for gossip protocol (with optional DTLS)
- TCP/QUIC for bulk state transfer
Minimum (per sidecar)
- CPU: 100m (0.1 cores)
- Memory: 128Mi
Recommended (per sidecar)
- CPU: 250m - 500m (0.25 - 0.5 cores)
- Memory: 256Mi - 512Mi
High-Traffic Workloads
- CPU: 1000m - 2000m (1-2 cores)
- Memory: 1Gi - 2Gi
For peer mesh formation in Kubernetes:

- Headless Service: Create a headless service to expose all pod IPs

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hivemind-mesh
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
    - name: mesh
      port: 7946
```

- DNS Discovery: Peers discover each other via DNS SRV records
- Pod Labels: Use labels for selective mesh membership
- StatefulSet Support: Optional for stable network identities
- No single point of failure
- Automatic peer failover
- Split-brain prevention
- Rolling updates without downtime
- Rate limit logic
- CRDT operations
- Configuration parsing
- gRPC API compatibility with Envoy
- Peer mesh synchronization
- Failure scenarios
- Load testing at target throughput
- Latency benchmarking
- Scalability testing with varying peer counts
- Network partition scenarios
- Node failure and recovery
- Clock skew handling
The implementation SHOULD prefer using well-established, actively maintained third-party libraries over custom implementations where appropriate. This approach:
- Reduces development time and maintenance burden
- Leverages battle-tested code from the Rust ecosystem
- Benefits from community security audits and bug fixes
- Allows focus on core business logic and unique features
Guidelines for library selection:
- Prefer libraries with active maintenance and community support
- Evaluate security track record and audit status
- Consider performance characteristics and benchmarks
- Assess API stability and breaking change history
- Check licensing compatibility (prefer Apache-2.0, MIT, BSD)
Custom implementation should be considered when:
- No suitable library exists for the specific use case
- Existing libraries have unacceptable performance characteristics
- Dependencies would introduce significant complexity or bloat
- Core functionality requires tight integration with custom logic
- tonic: gRPC implementation
- tokio: Async runtime
- serde: Serialization
- rustls: TLS implementation
- config: Configuration management
- opentelemetry: OpenTelemetry API and SDK
- opentelemetry-otlp: OTLP exporter for metrics and traces
- opentelemetry-prometheus: Prometheus exporter (compatibility)
- tracing: Structured logging and tracing
- tracing-opentelemetry: Bridge between tracing and OpenTelemetry
- tracing-subscriber: Tracing subscriber implementations
- chitchat: Gossip protocol implementation for cluster membership and state sync
- dashmap: Sharded concurrent hash map for lock-free caching
- Project setup and structure
- gRPC server implementation
- Basic rate limiting logic
- Configuration loading
- Unit tests
- CRDT implementation for counters
- Gossip protocol implementation
- Peer discovery and management
- State synchronization
- Integration tests
- Basic metrics (Prometheus endpoint)
- Security (TLS/mTLS)
- Performance optimization
- Documentation
- Load testing
- OpenTelemetry metrics and tracing (OTEL integration)
- Dynamic configuration updates
- Advanced Envoy features
- Admin API
- Deployment tooling
- Envoy Rate Limit Service
- Envoy Global Rate Limiting
- CRDTs: Consistency without concurrency control
- Gossip Protocols
- CRDT: Conflict-free Replicated Data Type
- Descriptor: Key-value pair used for rate limit matching
- Mesh: Peer-to-peer network topology
- Vector Clock: Logical clock for tracking causality in distributed systems
- Window: Time period for rate limit enforcement (second, minute, hour, day)