Status: Shipped. Single-domain support landed in #205 (Phase 3); multi-domain support in #207 (Phase 3b).
Related: MULTI-POOL.md, APP-HOSTING.md, CUTOVER-DEMO-INTO-PROD-SENTINEL.md, PLAN-DEMO-CONTAINER-MIGRATION.md.
Enables: serving multiple base domains (e.g. example.com and example.org) from a single sentinel without per-container alias bookkeeping; one backend hosting workloads under multiple parent domains.
Before this work, every app hostname a pool's Caddy served had to appear in that primary's --public-aliases (MULTI-POOL.md:178). The sentinel's SNI router did exact-match against Primary.Hostname plus the alias list and fell back to the legacy single-backend forwarder on miss.
That worked when app domains were a small fixed set (api.example.com, voice.example.com, …) registered once at primary startup. It broke when:
- Containers create their own subdomains dynamically.
expose_port blogon the demo backend producesblog.example.org. The container exists; the Caddy route exists on the backend; the cert ACMEs fine. But the sentinel didn't know that name → demo-backend, so the SNI peek missed and fell back to the wrong backend. - A single sentinel needs to serve two or more base domains. Prod's sentinel handled
*.example.compurely by the fallback path (single registered backend == prod). Addingexample.orgfor the demo backend meant the fallback could no longer be "the one backend" — different SNI suffixes needed to go to different backends. - A single backend needs to host workloads under multiple parent domains. The lab-hosts-demo pattern: the lab backend serves both
*.lab.example.com(its own pool's workloads) and*.demo.example.org(migrated demo workloads). This is the Phase 4 / B2 path in PLAN-DEMO-CONTAINER-MIGRATION.md.
The exact-alias workaround was doable for fixed app domains but doesn't scale to either dynamic subdomains or multi-tenant backends.
A per-primary list of base domains that the SNI router uses for suffix matching after exact-alias matching fails and before the legacy fallback.
type Primary struct {
Pool Pool `json:"pool"`
Hostname string `json:"hostname"`
Aliases []string `json:"aliases,omitempty"`
BaseDomains []string `json:"base_domains,omitempty"` // suffix-match anchors
IP string `json:"ip"`
Port int `json:"port"`
BackendID string `json:"backend_id,omitempty"`
...
}Lookup method:
// LookupByBaseDomainSuffix returns the primary whose BaseDomains contain
// the longest proper DNS suffix of hostname. A primary may advertise
// multiple base domains; each is considered independently. Returns nil
// when no primary qualifies or when two primaries tie on suffix length
// (ambiguity = misconfiguration; fail closed).
func (r *PrimaryRegistry) LookupByBaseDomainSuffix(hostname string) *PrimarySuffix match means strings.HasSuffix(hostname, "." + bd) for each bd in BaseDomains — proper suffix, so the base domain itself (example.org) does NOT match. This keeps the apex hostname usable as a separate Hostname or Alias for the apex.
buildSNIRoutingHandler in internal/sentinel/manager.go:
- Exact
Hostname/Aliasmatch (LookupByHostname) — unchanged from pre-Phase-3. BaseDomainssuffix match (LookupByBaseDomainSuffix) — Phase 3 (single) / Phase 3b (multi).- Legacy fallback to
fallbackTarget— unchanged.
The fallback stays as a safety net for unpooled single-backend deployments (no BaseDomains configured → step 2 always returns nil → behavior identical to pre-Phase-3).
TunnelHandshake gains PublicBaseDomains []string. Repeatable flag on both daemon and tunnel commands:
containarium daemon ... --public-base-domain lab.example.com --public-base-domain demo.example.org
containarium tunnel ... --public-base-domain lab.example.com --public-base-domain demo.example.org
When unset, the daemon falls back to [--base-domain] (single-element list derived from the existing flag) so single-domain deployments need no extra config.
REST POST /sentinel/primaries body uses base_domains: [...].
- Longest suffix wins. When two base domains both match (e.g.
example.comandlab.example.comfor SNInotebook.lab.example.com), the longer suffix is more specific and wins. Implemented inLookupByBaseDomainSuffixby trackingbestLenacross all(primary, base-domain)pairs. - Equal-length ties fail closed. Two primaries advertising the same base domain is a misconfiguration; the lookup returns nil rather than picking arbitrarily. Caught regardless of which slot the duplicate lives in within each primary's
BaseDomainslist. - Exact alias beats suffix. Operator's explicit
--public-aliasesentry overrides the implicit suffix routing (step 1 runs before step 2 in the SNI router).
| Backend | --pool |
--public-hostname |
--public-base-domain (repeatable) |
|---|---|---|---|
| prod | prod |
prod.example.com |
example.com |
| demo | demo |
demo.example.org |
example.org |
Inbound SNI on the prod sentinel:
| SNI | Match path | Forwards to |
|---|---|---|
prod.example.com |
exact (Hostname) | prod backend |
api.example.com |
exact (Aliases, if listed) | prod backend |
blog.example.org |
suffix (.example.org) |
demo backend |
example.com (apex) |
none → fallback | legacy backend |
The lab backend in the demo container migration plan hosts both its own pool's workloads and migrated demo workloads. One primary registration covers both surfaces:
containarium tunnel \
--pool lab \
--public-hostname lab-primary.example.com \
--public-base-domain lab.example.com \
--public-base-domain demo.example.org \
--public-port 443
| SNI | Match path | Forwards to |
|---|---|---|
notebook.lab.example.com |
suffix (.lab.example.com) |
lab backend |
blog.demo.example.org |
suffix (.demo.example.org) |
lab backend (same!) |
api.example.com |
none — lab.example.com is more specific than example.com, but neither prefix matches api.example.com. Falls through. |
fallback |
The pool tag stays lab — demo.example.org is just the routing surface, not a separate trust boundary. If you want demo containers to also be visible as pool=demo (for containarium list --pool=demo filtering, etc.), use container labels rather than splitting the backend.
| Backend | BaseDomains |
|---|---|
| lab | [example.com, demo.example.org] |
| lab2 | [sub.example.com] |
SNI notebook.sub.example.com matches both lab's example.com (length 11) and lab2's sub.example.com (length 15). Lab2 wins — more-specific suffix is the user's intent.
The Phase 3b --public-base-domain flag makes a primary catch <anything>.<base> via suffix match. But the apex itself (<base> with no leading subdomain) is not a proper suffix of itself, so it falls through to the legacy fallback — that's by design (keeps the apex usable as a separate Hostname or Alias somewhere else if needed).
If you want one backend to also serve the apex of an additional domain — e.g., a lab backend that handles both *.lab.example.com (its own pool space) and *.demo.example.org plus the apex demo.example.org itself — there are two gaps to fill:
- Sentinel routing: add the apex as an alias.
- Caddy on the backend: the daemon's
--base-domainis single-valued and only auto-manages one hostname's TLS subject + reverse-proxy route. The second apex needs a manual Caddy-admin patch.
Assume the lab primary already has --public-base-domain demo.example.org so suffix routing works for subdomains. To additionally serve the apex:
Step 1 — sentinel routing: add --public-aliases demo.example.org to the lab's containarium-tunnel.service ExecStart. After systemctl daemon-reload && systemctl restart containarium-tunnel.service, the sentinel's primary registry shows the alias and exact-matches the apex SNI to lab.
Step 2 — Caddy on lab: patch the local Caddy via its admin API to add the apex to TLS automation subjects (so it ACMEs the cert) and to the existing reverse-proxy route's host array (so requests get forwarded to the daemon):
# add to TLS automation subjects
sudo incus exec containarium-core-caddy -- curl -sX POST \
http://localhost:2019/config/apps/tls/automation/policies/0/subjects \
-H 'Content-Type: application/json' \
-d '"demo.example.org"'
# add to the route's host matchers
sudo incus exec containarium-core-caddy -- curl -sX POST \
http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/0/host \
-H 'Content-Type: application/json' \
-d '"demo.example.org"'After this, https://demo.example.org/ ACMEs on first hit and serves from the lab daemon. Subdomains under *.demo.example.org still flow through the suffix-match path — no change there.
This is operational glue, not a clean abstraction. Two known follow-ups:
- The daemon's
--base-domainshould accept multiple values (or a separate--caddy-aliasesflag) so step 2 doesn't need a manual Caddy patch. Tracked at #213. Until that lands, re-apply the Caddy patch after any daemon restart that fully rebuilds Caddy config (route-sync only updates per-route; the apex patch usually survives). - A backend serving multiple apex hostnames is a one-cluster-multiple-customers pattern. If that's the long-term shape, prefer giving each customer their own
--public-aliasesentry + Caddy subject — don't paper it over by adding everything to one base-domain.
- Identical base domains across pools. Two primaries advertise
BaseDomainscontaining the same string.LookupByBaseDomainSuffixreturns nil. Fail closed; surfaces the misconfig instead of silently picking the wrong backend. - Suffix collides with exact alias on a different primary. Exact wins (step 1 before step 2). Operator's explicit choice overrides implicit routing.
- Tunneled primary disconnects. Same as pre-Phase-3 —
UnregisterByBackendIDremoves the entry; suffix lookups stop matching for those base domains; suffix-matched SNI falls through to the legacy fallback. - PROXY-v2 framing. Suffix-matched destinations go through the same dial-or-yamux path as exact matches, so the existing
m.config.ProxyProtocolhandling still applies. No new code needed. - Per-container base-domain selection. Not a concept — the backend advertises its base domains, and the container publishes a route under whichever fits. The route-builder in
internal/app/proxy.goalready handles "FQDN that doesn't match the daemon's--base-domain→ use as-is," so a lab container exposed asblog.demo.example.org(FQDN) Just Works even if the daemon's--base-domain=lab.example.com.
These are required regardless of the code; listed here so the work isn't a surprise during rollout.
- DNS. Wildcard record for each base domain → sentinel IP. Cloudflare-managed for
example.org, separate provider forexample.com. - ACME on each backend. Caddy on every backend that advertises a given base domain needs creds to mint certs for subdomains under it. HTTP-01 works transparently over sentinel passthrough; DNS-01 requires the relevant provider's API token in the backend's environment.
- GLB cert SAN. Only if the GLB terminates TLS (prod is passthrough, so this is a no-op).
- ACME on the sentinel itself (sentinel doesn't terminate TLS for app traffic; passthrough only).
- Caddy on any backend (each backend continues to own its own
--base-domainfor daemon-driven route naming; this design just makes the sentinel aware of additional suffix anchors). - The
--public-aliasesflag (still works for explicit per-domain routing; suffix match is additive). - Pool-aware container placement (Phase 1 / #204 — that's the input side; this is the output side).
internal/sentinel/base_domain_test.go:
| Test | What it pins |
|---|---|
LookupByBaseDomainSuffix |
basic suffix match, deeper subdomain, base-domain-itself-isn't-a-match, unrelated returns nil, empty input |
LookupByBaseDomainSuffix_LongestWins |
nested base domains (lab.example.com beats example.com for the nested name) |
LookupByBaseDomainSuffix_AmbiguousFailsClosed |
two primaries with the same BaseDomains entry → nil |
LookupByBaseDomainSuffix_EmptyBaseDomainSkipped |
unconfigured primary is invisible to suffix lookup |
ExactHostnameBeatsSuffix |
router precedence at registry level (alias vs suffix) |
LookupByBaseDomainSuffix_MultipleOnOnePrimary |
Phase 3b — single primary advertises multiple base domains; both match |
LookupByBaseDomainSuffix_MultiDomainLosesToMoreSpecific |
Phase 3b — longest-wins still applies across multi-domain primaries |
LookupByBaseDomainSuffix_AmbiguousAcrossMultiDomain |
Phase 3b — same base domain on two primaries fails closed regardless of slot order |
SNIRouting_BaseDomainSuffix |
end-to-end via SNI router harness — suffix match, exact-beats-suffix, fallback |
SNIRouting_MultiBaseDomainOnOneBackend |
Phase 3b — end-to-end of lab-hosts-demo |
RegisterUpdatesBaseDomain |
re-registration replaces the BaseDomains list |
internal/cmd/daemon_base_domains_test.go:
TestResolvePublicBaseDomains— explicit values win, base-domain fallback when unset, empty in both yields nil.
- Push container hostnames to the sentinel on every
expose_port. Workable but couples the route store to a remote API call on every mutation. Adds latency to route creation and a new failure mode (sentinel unreachable → can't expose a port). Suffix match keeps the contract one-way (daemon → sentinel at registration time, never per-route). - Wildcard aliases (
*.example.orgas an alias entry). Same effect but harder to reason about: aliases are also matched againstHostnamefor non-wildcards, and mixing wildcards into a list invites accidental over-matching. A separateBaseDomainsfield makes the intent explicit. - Caddy on the sentinel. Would centralize routing but adds TLS termination on the sentinel (vs. today's passthrough), which breaks per-backend ACME, complicates client IP recovery, and adds another component to keep alive. Not worth it for this use case.
BaseDomain string(single, Phase 3) instead ofBaseDomains []string(Phase 3b). The single-domain version shipped first and was complete for the demo-via-prod-sentinel use case. Multi-domain landed when the Phase 4 plan revealed the need for one backend (the lab) to serve both*.lab.example.comand*.demo.example.org. Migration from singular → plural was a clean field rename across ~10 files with no semantic drift.