Allow overriding discovered CoreDNS exposed IPs for bare-metal clusters behind static NAT #2360

@dulaj-me

Problem

k8gb's IP discovery assumes the CoreDNS Service / Ingress LB status reflects publicly routable IPs. On bare-metal clusters behind 1:1 static NAT — a common colo / on-prem topology — this assumption breaks:

  • The cluster's CoreDNS Service (or fallback Ingress with k8gb.io/ip-source=true) only ever sees the private node IPs assigned by the cluster's L4 LB (e.g. klipper-lb on k3s, MetalLB in L2 mode, kube-vip).
  • The colo perimeter does 1:1 SNAT/DNAT between private node IPs and a set of public IPs. The cluster itself has no awareness of those public IPs.
  • k8gb publishes the private IPs as the NS glue records (gslb-ns-<geoTag>-<loadBalancedZone> A records in EdgeDNS) and as localtargets-<host> records inside each Gslb's DNSEndpoint.
  • External resolvers follow the NS delegation, hit the private IPs in the glue, and time out.

Reproduction

  1. Stand up any bare-metal-style k8s with a node-IP-based LoadBalancer controller (k3d, kind, k3s with klipper-lb).
  2. Install k8gb with coredns.serviceType=LoadBalancer and any EdgeDNS provider (Cloudflare / Route53).
  3. Configure 1:1 NAT between node IPs and public IPs at the network perimeter (or simulate by treating node IPs as "private").
  4. Observe gslb-ns-<geoTag>-<loadBalancedZone> A records in EdgeDNS contain node IPs.
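A quick way to confirm step 4 is to classify the addresses copied from the published A record. The following is a minimal sketch, not part of k8gb: the helper name `privateAddrs` is hypothetical, and the addresses are passed on the command line rather than fetched from EdgeDNS.

```go
package main

import (
	"fmt"
	"net/netip"
	"os"
)

// privateAddrs returns the entries that parse as IP addresses and are
// private (RFC 1918 / IPv6 ULA), loopback, or link-local.
// Entries that do not parse as an IP address are skipped.
func privateAddrs(ips []string) []string {
	var out []string
	for _, s := range ips {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			continue
		}
		if addr.IsPrivate() || addr.IsLoopback() || addr.IsLinkLocalUnicast() {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the IPs found in the glue A record as arguments")
		return
	}
	bad := privateAddrs(os.Args[1:])
	if len(bad) == 0 {
		fmt.Println("no private addresses found")
		return
	}
	fmt.Println("not publicly routable:", bad)
}
```

Any address this reports is one that external resolvers following the NS delegation cannot reach.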

Why existing knobs don't solve it

| Approach | Why it doesn't work on bare-metal NAT |
| --- | --- |
| coredns.serviceType: LoadBalancer reading Status.LoadBalancer.Ingress[].IP | Bare-metal LBs assign private node IPs to that field |
| Status.LoadBalancer.Ingress[].Hostname (FQDN lookup path in extractIPFromLB) | klipper-lb doesn't populate Hostname; no annotation to inject one |
| Ingress fallback with k8gb.io/ip-source=true | Same root cause: bare-metal Ingress status only has private IPs |
| Per-Gslb k8gb.io/exposed-ip-addresses annotation | Only overrides the per-Gslb localtargets / A records; does NOT fix the NS glue records published to EdgeDNS |

The last point matters: even if every Gslb annotates itself with public IPs, the NS delegation in EdgeDNS still points to private IPs, so the whole chain is broken before resolvers ever reach the cluster's CoreDNS.

Proposed enhancement

Add a cluster-level override:

  • New chart value k8gb.edgeDNSPublicIPs ([]string, default empty)
  • New env var EDGE_DNS_PUBLIC_IPS (comma-separated)
  • When set, k8gb uses these IPs in place of the discovered Service / Ingress IPs at the zone-delegation boundary so both the EdgeDNS NS glue records AND per-Gslb localtargets-* / final A records use the override.
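As a rough illustration of the env var half of this proposal, the sketch below parses a comma-separated EDGE_DNS_PUBLIC_IPS value and rejects malformed entries. The function name `parsePublicIPs` and the fail-on-invalid behavior are assumptions for illustration, not the contents of the existing patch.

```go
package main

import (
	"fmt"
	"net/netip"
	"os"
	"strings"
)

// parsePublicIPs splits a comma-separated list, trims whitespace, skips
// empty fields, and validates every remaining entry as an IP address.
// An empty input yields an empty result, which means "no override".
func parsePublicIPs(raw string) ([]string, error) {
	var ips []string
	for _, field := range strings.Split(raw, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		addr, err := netip.ParseAddr(field)
		if err != nil {
			return nil, fmt.Errorf("invalid IP %q in EDGE_DNS_PUBLIC_IPS: %w", field, err)
		}
		ips = append(ips, addr.String())
	}
	return ips, nil
}

func main() {
	ips, err := parsePublicIPs(os.Getenv("EDGE_DNS_PUBLIC_IPS"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(ips) == 0 {
		fmt.Println("no override set; discovered IPs would be used")
		return
	}
	fmt.Println("override IPs:", ips)
}
```

Failing fast on a malformed entry seems safer than silently dropping it, since a partially applied override would publish an incomplete set of glue records.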

Suggested insertion point

The override fits in ZoneDelegationWrapper.GetDetail() in controllers/zones/wrapper.go. This covers both the ExternalDNS and Infoblox providers, since both consume ExtendedZoneDelegation.LocalCoreDNSExposedIPs. Doing it here keeps the IPResolver unchanged and avoids special-casing provider code.
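The shape of the change could look roughly like the sketch below. All types here are simplified stand-ins written for this example: `ExtendedZoneDelegation` models only the one field named above, and `ipResolver`, `zoneDelegationWrapper`, and `staticResolver` are hypothetical, so the real k8gb signatures will differ.

```go
package main

import "fmt"

// ExtendedZoneDelegation is a stand-in that models only the field
// discussed in this issue.
type ExtendedZoneDelegation struct {
	LocalCoreDNSExposedIPs []string
}

// ipResolver is a hypothetical interface for the existing Service /
// Ingress discovery, which stays unchanged.
type ipResolver interface {
	ExposedIPs() ([]string, error)
}

// zoneDelegationWrapper is a hypothetical, reduced version of the wrapper.
type zoneDelegationWrapper struct {
	resolver  ipResolver
	publicIPs []string // cluster-level override; empty means "not set"
}

// GetDetail returns the override when it is set and falls back to the
// discovered IPs otherwise.
func (w *zoneDelegationWrapper) GetDetail() (ExtendedZoneDelegation, error) {
	if len(w.publicIPs) > 0 {
		// Copy so callers cannot mutate the configured override.
		ips := append([]string(nil), w.publicIPs...)
		return ExtendedZoneDelegation{LocalCoreDNSExposedIPs: ips}, nil
	}
	ips, err := w.resolver.ExposedIPs()
	if err != nil {
		return ExtendedZoneDelegation{}, err
	}
	return ExtendedZoneDelegation{LocalCoreDNSExposedIPs: ips}, nil
}

// staticResolver simulates discovery that returns private node IPs.
type staticResolver []string

func (s staticResolver) ExposedIPs() ([]string, error) { return s, nil }

func main() {
	w := &zoneDelegationWrapper{
		resolver:  staticResolver{"10.0.0.11", "10.0.0.12"},
		publicIPs: []string{"203.0.113.10", "203.0.113.11"},
	}
	detail, err := w.GetDetail()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(detail.LocalCoreDNSExposedIPs)
}
```

In this sketch the override is checked before discovery runs, so a discovery failure cannot block publishing when public IPs are configured. Running discovery first and replacing its result afterwards would be an equally valid reading of the proposal.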

Alternatives considered

  • Per-Gslb annotation only — partial fix; doesn't reach EdgeDNS glue records.
  • External tooling to rewrite records post-publish — fights k8gb's TXT-registry ownership and creates reconcile churn.
  • Custom CoreDNS Service hostname — requires the bare-metal LB to support hostname annotations; klipper-lb doesn't, and patching this in-cluster is brittle.

Use case

Bare-metal / colo BYOC clusters where the cluster sees only RFC1918 IPs but is exposed via static 1:1 NAT on the perimeter. Common in datacenter deployments where MetalLB / Cilium LB-IPAM with a public CIDR isn't an option (e.g. Cilium migration in progress, or perimeter team owns the NAT).

I have a working patch and can open a PR if the approach is acceptable.
