Bare-metal infrastructure layer for Open Platform. Sets up NixOS + K3s + infrastructure services so Open Platform can deploy into its own vCluster with infrastructure.mode=external.
| Layer | Components | Purpose |
|---|---|---|
| Hardware | NixOS, ZFS, 10G bonding, GPU passthrough | Bare-metal foundation |
| Kubernetes | K3s (HA etcd, multi-node, no flannel, no kube-proxy) | Container orchestration |
| CNI | Cilium (native routing, eBPF, kube-proxy replacement, Hubble) | Pod networking + service LB on a single eBPF data plane |
| Networking | Traefik, MetalLB (FRR mode, L2 + BGP), VLAN segmentation | Ingress, load balancing, isolation |
| Storage | CNPG operator, ZFS pools, MinIO | PostgreSQL, block storage, object storage |
| GitOps | Flux (multi-source) | Infrastructure reconciliation |
| Virtualization | KubeVirt, vCluster operator | VM lifecycle, tenant isolation |
| Security | cert-manager, Let's Encrypt | TLS certificate automation |
| DNS | CoreDNS | Internal + external resolution |
In addition to core infrastructure, this repo provides shared AI and data services consumed by all tenants:
| Service | Namespace | Port | Purpose |
|---|---|---|---|
| LiteLLM | litellm |
4000 | Unified AI gateway (Claude, Ollama, OpenAI behind one API) |
| ChromaDB | chromadb |
8000 | Vector database for semantic search and embeddings |
| Ollama | ollama |
11434 | Local LLM inference on GPU nodes |
| MCP Registry | mcp-registry |
3000 | Dynamic MCP server discovery and client config generation |
The MCP Registry provides automatic discovery for Model Context Protocol servers running anywhere in the cluster. MCP servers register via:
- K8s labels (automatic) — add
mcp.server/enabled: "true"to any Service - HTTP API (manual) —
POST /api/v1/register
Clients get auto-generated config:
# Generate Claude Code MCP settings
curl http://mcp-registry.mcp-registry.svc:3000/api/v1/config/claude-code- Open Platform services (Forgejo, Woodpecker, etc.) — OP deploys itself
- Application workloads — managed by OP's CI/CD
- Tenant vClusters — managed by OP's provisioner
Host Cluster (this repo)
├── Infrastructure (always running)
│ ├── Cilium → eBPF CNI + kube-proxy replacement + Hubble
│ ├── Traefik → routes *.domain for all tenants
│ ├── MetalLB (FRR) → assigns VIPs from edge VLAN (L2 or BGP)
│ ├── cert-manager → Let's Encrypt certificates
│ ├── CNPG operator → PostgreSQL lifecycle
│ ├── Flux → multi-source GitOps
│ ├── vCluster → tenant isolation
│ ├── CoreDNS → DNS resolution
│ └── KubeVirt → Windows VM lifecycle
│
├── vCluster: open-platform (deployed by OP)
│ └── Forgejo, Woodpecker, Headlamp, MinIO, Apps
│
├── vCluster: ai-services (optional)
│ └── ChromaDB, Ollama, data pipelines
│
└── vCluster: tenant-N (optional, via OP provisioner)
└── Customer workloads
cp site-config.nix.example site-config.nix
# Edit with your IPs, VLANs, node names, SSH keys# Boot NixOS minimal ISO via iDRAC/IPMI
# See INSTALL.md for step-by-step
nixos-install --no-root-passwd
reboot# PXE boot (automatic) or manual install
# Each node gets its own node-config.nix# On any control-plane node:
kubectl apply -k infrastructure/
# Or let Flux manage it (recommended):
flux bootstrap git --url=https://your-git/infra --path=infrastructure# In the open-platform repo:
# open-platform.yaml:
# infrastructure:
# mode: external
make deployInternet
│
▼
Firewall (L3 gateway, all inter-VLAN routing)
│
├── VLAN 100 (10.0.0.0/20) K8s Node Network (1G management)
├── VLAN 101 (10.0.16.0/20) K8s Edge / MetalLB VIPs
├── VLAN 102 (10.0.32.0/20) Storage Fabric (10G east-west, no firewall)
├── VLAN 103 (10.0.48.0/20) Tunnel Endpoints (WireGuard)
├── VLAN 105 (10.0.64.0/20) Bare-Metal Provisioning (PXE)
└── VLAN 900 (10.0.254.0/24) Live Migration (10G, no firewall)
Internet → Public IP → Firewall DNAT → MetalLB VIP (VLAN 101)
→ connmark routing → Traefik → vCluster Ingress → service
Response → connmark restore → VLAN 101 → Firewall → Internet
Each node is configured via a node-config.nix file:
{
hostname = "node-1";
hostId = "abc12345";
primaryInterface = "eno1";
primaryAddress = "10.0.0.10/20";
# 10G storage fabric (optional)
enable10g = true;
sfpInterfaces = [ "enp5s0f0" "enp5s0f1" ];
storageAddress = "10.0.32.10/20";
# K3s role
k3sRole = "server"; # "server" (control-plane + etcd) or "agent"
# Features
enableEdge = true; # MetalLB + connmark routing
enableKvm = false; # KubeVirt / libvirt
enableGpu = false; # NVIDIA passthrough
}See hosts/worker/node-config.nix for the full template with all options.
| Doc | Purpose |
|---|---|
| INSTALL.md | Step-by-step bare-metal installation |
| NETWORK-DESIGN.md | VLAN architecture and traffic flows |
| LESSONS-LEARNED.md | Production gotchas and fixes |
| docs/open-platform-deploy.md | Deploying OP with external infrastructure |
MIT