
feat(provider): add read-only Kubernetes provider#102

Open
bussyjd wants to merge 2 commits into 0xff-ai:main from bussyjd:feat/kubernetes-provider

@bussyjd bussyjd commented Jun 9, 2026

Summary

Adds omnifs-provider-kubernetes: a read-only WASM provider that projects a
Kubernetes cluster as a browsable filesystem. Resource types — including CRDs —
are discovered live from the API server, so the tree reflects whatever the
cluster actually serves, at the versions it serves.

/namespaces/<ns>/<type>/<name>/{manifest.yaml,manifest.json,status.yaml,events.txt}
/namespaces/<ns>/pods/<name>/logs/<container>.log
/cluster/<type>/<name>/{manifest.yaml,manifest.json,status.yaml}

Standard tooling works on every leaf: cat, grep -r, find, diff, tar,
tail.
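
To make the layout concrete, here is a minimal sketch of how a path could be classified into the three leaf shapes listed above. The `Node` enum and `classify` function are hypothetical names for illustration; the provider itself routes through the SDK router, which this does not reproduce.

```rust
/// Leaf shapes from the layout above (illustrative types, not the provider's).
#[derive(Debug, PartialEq)]
pub enum Node<'a> {
    NamespacedLeaf { ns: &'a str, rtype: &'a str, name: &'a str, file: &'a str },
    PodLog { ns: &'a str, pod: &'a str, container: &'a str },
    ClusterLeaf { rtype: &'a str, name: &'a str, file: &'a str },
}

const NAMESPACED_FILES: [&str; 4] =
    ["manifest.yaml", "manifest.json", "status.yaml", "events.txt"];
const CLUSTER_FILES: [&str; 3] = ["manifest.yaml", "manifest.json", "status.yaml"];

/// Classify an absolute path; returns None for anything outside the layout.
pub fn classify(path: &str) -> Option<Node<'_>> {
    let parts: Vec<&str> = path.trim_matches('/').split('/').collect();
    match parts.as_slice() {
        // The literal pods/.../logs/... shape has six segments, so it cannot
        // collide with the five-segment generic <type>/<name>/<file> shape.
        ["namespaces", ns, "pods", pod, "logs", file] => {
            let container = file.strip_suffix(".log")?;
            if container.is_empty() {
                return None;
            }
            Some(Node::PodLog { ns: *ns, pod: *pod, container })
        }
        ["namespaces", ns, rtype, name, file] if NAMESPACED_FILES.contains(file) => {
            Some(Node::NamespacedLeaf { ns: *ns, rtype: *rtype, name: *name, file: *file })
        }
        // Cluster-scoped objects have no events.txt in the layout above.
        ["cluster", rtype, name, file] if CLUSTER_FILES.contains(file) => {
            Some(Node::ClusterLeaf { rtype: *rtype, name: *name, file: *file })
        }
        _ => None,
    }
}
```

For example, `classify("/namespaces/default/pods/web-0/logs/app.log")` yields a `PodLog` for container `app`, while `/cluster/nodes/n1/events.txt` is rejected.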

Transport & key design decisions

  • Transport-agnostic over the host callout. Every request is
    cx.http().get(HttpEndpoint::build_url(...)). The recommended endpoint is a
    local kubectl proxy --unix-socket — the same unix: callout transport the
    Docker provider uses. kubectl terminates TLS and injects the active context's
    credentials, so the provider issues plain HTTP, never handles a token, and
    works against any cluster kubectl can reach (mTLS / EKS-GKE exec plugins /
    OIDC / custom CA all handled upstream). An https:// API server with
    system-trust TLS + bearer token also works. No host changes — the host
    grants the socket automatically from config.endpoint
    (materialize_runtime_capabilities).
  • One mount = one pinned cluster/context. The FS does not change when you
    run kubectl config use-context; to browse another cluster, add another mount.
    This respects the per-mount credential binding and the read-only model
    (a writable "switch context" control file was rejected because it violates both).
  • Resource-as-directory, so status, events, and pod logs have a home
    and the manifest leaf keeps an honest stat/wc -c.
  • Optional hide_empty_types: list only types that currently have instances
    (batched limit=1 probes via the SDK's join_all); empty types stay
    navigable via lookup.
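
As a rough illustration of the hide_empty_types probe, the sketch below builds the `limit=1` LIST path for a namespaced type and filters types by how many items each probe returned. The function names are hypothetical; the `/api/<version>` and `/apis/<group>/<version>` prefixes follow the standard Kubernetes REST layout, and the batching through `join_all` is not modeled.

```rust
/// Collection path for a namespaced resource type.
/// An empty `group` denotes the core group, which is served under /api.
pub fn collection_path(group: &str, version: &str, namespace: &str, plural: &str) -> String {
    let root = if group.is_empty() {
        format!("/api/{version}")
    } else {
        format!("/apis/{group}/{version}")
    };
    format!("{root}/namespaces/{namespace}/{plural}")
}

/// The cheapest existence check: ask the API server for at most one item.
pub fn probe_path(group: &str, version: &str, namespace: &str, plural: &str) -> String {
    format!("{}?limit=1", collection_path(group, version, namespace, plural))
}

/// Given (type, items returned by its probe), pick the types to list.
/// With `hide_empty` off, every discovered type is listed.
pub fn visible_types<'a>(probes: &[(&'a str, usize)], hide_empty: bool) -> Vec<&'a str> {
    probes
        .iter()
        .filter(|(_, count)| !hide_empty || *count > 0)
        .map(|(name, _)| *name)
        .collect()
}
```

Types filtered out here would only disappear from the listing; per the bullet above they stay reachable by direct lookup.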

How correctness was ensured (methodology)

This was built grounded-first and verified adversarially, not from memory:

  1. Research + codebase analysis (multi-agent workflow). Fetched current
    Kubernetes API/kubeconfig docs and analyzed the omnifs SDK router, host
    callout/capability path, caching model, and auth — producing a grounded
    design with the genuine open decisions (context handling, transport, resource
    representation) surfaced as options.

  2. Independently verified the feasibility crux by reading the host code: the
    shared HTTPS client has no custom-CA/mTLS and the capability checker denies
    private IPs — which is why the kubectl proxy unix-socket transport was
    chosen (it sidesteps both with zero host changes), rather than assuming a
    native HTTPS client would work.

  3. Implemented against the real SDK contract — every router/projection/
    capture/HTTP API was matched to the actual source (db/docker/github
    providers as templates), not guessed.

  4. Adversarial correctness review (4 dimensions — routing/listing,
    discovery/HTTP, manifest/host-load, SDK-contract — each finding independently
    verified): 0 confirmed bugs.

  5. kubectl / client-go parity validation. Cloned kubernetes/client-go and
    kubernetes/kubectl (current) and ran an 18-agent match-matrix with
    per-finding verification against the upstream source and its tests. This
    confirmed three real divergences, which are fixed here, and caught one
    false positive (an agent claimed kubectl get doesn't strip
    managedFields; verified that NewGetPrintFlags composes cli-runtime's
    default --show-managed-fields=false path — so the strip is correct and was
    kept).

    Fixes derived directly from upstream contracts:

    | Behavior | Upstream reference | Result |
    | --- | --- | --- |
    | Event field selector | event_expansion.go GetFieldSelector → name+namespace+kind+uid | was name-only → now full selector (no cross-kind / prior-incarnation leakage) |
    | Object cleaning | kubectl get strips only managedFields, keeps last-applied | was stripping both → now managedFields only |
    | Discovery versions | client-go ServerPreferredResources (all versions, preferred wins) | was preferred-only → now all versions, preferred-first (CRDs served only in a non-preferred version now surface) |

Validated

  • cargo test — 15 unit tests: a route-seal test (proves the literal-pods
    logs routes coexist with the {rtype} capture without ambiguity — otherwise
    only caught at runtime), capture parsing/traversal guards, discovery
    scope/subresource/collision/multi-version dedup, the event field selector, and
    event rendering.
  • cargo clippy --target wasm32-wasip2 -- -D warnings — clean.
  • Release component build (wasm32-wasip2).

Out of scope / follow-ups (documented in the README)

  • No live watch — reads are point-in-time; re-cat re-fetches. Polling-based
    invalidation (periodic LIST keyed by resourceVersion → cache invalidation,
    via the runtime refresh-interval/timer-tick) is the natural follow-up.
  • Pod logs are a current snapshot per container (= kubectl logs <pod> -c <container>);
    follow/--previous/--timestamps need the ranged/volatile file path.
  • describe.txt omitted (a faithful per-kind describe renderer is large;
    manifest/status/events cover the same data).
  • Listings issue a single unpaginated LIST (returns the full collection —
    no silent truncation); chunked listing (limit/continue) is a follow-up for
    very large namespaces.
  • Live-cluster integration testing across versions (omnifs dev +
    kubectl proxy against kind clusters at several minor versions) is not
    expressible as a unit test and remains the integration follow-up.

A WASM provider that projects a Kubernetes cluster as a browsable,
read-only filesystem. Resource types (including CRDs) are discovered live
from the API server, so the tree reflects whatever the cluster serves.

Layout (resource-as-directory):
  /namespaces/<ns>/<type>/<name>/{manifest.yaml,manifest.json,status.yaml,events.txt}
  /namespaces/<ns>/pods/<name>/logs/<container>.log
  /cluster/<type>/<name>/{manifest.yaml,manifest.json,status.yaml}

Transport: the provider is transport-agnostic over the host callout. The
recommended endpoint is a local `kubectl proxy --unix-socket` (the same
`unix:` callout transport the Docker provider uses): kubectl terminates TLS
and injects the active-context credentials, so the provider issues plain
HTTP, never handles a token, and works against any cluster kubectl can
reach (mTLS, EKS/GKE exec plugins, OIDC, custom CA all handled upstream). An
`https://` API server with system-trust TLS + bearer token also works. The
host grants the socket automatically from `config.endpoint`; no host changes.

Design decisions: one mount = one pinned cluster/context (the FS does not
change on `kubectl config use-context`; add a mount per cluster), matching
the per-mount credential model and the read-only model.

Correctness was validated against the upstream kubectl/client-go source and
tests; the contract-derived behaviors:
- Discovery walks /api/v1 + every group, querying each group's versions
  preferred-first (matches client-go ServerPreferredResources): a
  multi-version resource resolves to its preferred version while a resource
  present only in a non-preferred version still surfaces.
- events.txt filters by involvedObject.{name,namespace,kind,uid} (matches
  event_expansion.go GetFieldSelector), so a same-named object of another
  kind, or a prior incarnation, doesn't leak events.
- manifest.{yaml,json} strip only metadata.managedFields (as `kubectl get`
  does by default since v1.21) and preserve the last-applied-configuration
  annotation, matching `kubectl get` output.
- Plural collisions across groups disambiguate to <plural>.<group>.

Optional `hide_empty_types` config: list only resource types with at least
one instance (batched limit=1 probes); empty types stay navigable via lookup.

Validated: cargo nextest/test (15 unit tests incl. a route-seal test and
kubectl-parity selector/discovery tests), wasm32-wasip2 clippy -D warnings,
release component build. Live-cluster validation across versions
(omnifs dev + kubectl proxy against kind) remains the integration follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bussyjd commented Jun 9, 2026

Author

Tests & validation

Unit tests (15 — inline #[cfg(test)])

The PR does include tests; they're inline #[cfg(test)] mod tests (12 in src/api.rs, 3 in src/lib.rs), run by cargo test, not a separate tests/ dir — which is why they may have looked absent. Inline is the only convention that fits: providers compile to wasm32-wasip2 and can't execute on the host test harness (per AGENTS.md, WASM tests build but only --no-run), so host-runnable tests live inline. Every provider in the repo that has tests does it this way; for context this is currently the most-tested provider here (others range 0–5).

They guard behavior that would otherwise fail only at runtime or silently diverge from kubectl — not plumbing:

  • routes_seal_without_ambiguity — the generated provider calls router.seal() on first use, so route ambiguity is otherwise a runtime failure. This proves the literal pods logs routes coexist with the {rtype} capture without overlap.
  • Discovery — scope classification, subresource (name containing /) filtering, group-qualified plural collisions, and multi-version preferred-first dedup (a resource in several versions resolves to the preferred one; a resource present only in a non-preferred version still surfaces).
  • event_field_selector_matches_kubectl_fields — the involvedObject.{name,namespace,kind,uid} selector matching client-go's GetFieldSelector.
  • Capture parsing — path-traversal / separator rejection; <container>.log stem extraction.
  • Manifest cleaning: managedFields stripped, last-applied-configuration kept (matching kubectl get); panic-safety on malformed shapes.
  • Rendering — event table formatting, .status extraction, container enumeration order.

How correctness was established

Wire behaviors were validated against the upstream kubernetes/client-go + kubernetes/kubectl source and their tests (not from memory). That produced the three fixes in this PR: the full event field selector, managedFields-only cleaning (kubectl get keeps last-applied), and all-versions/preferred-first discovery.

Live cluster run

Validated against a live k3s v1.35.1 cluster (k3d) by pointing the provider's transport — a read-only kubectl proxy --unix-socket — at it and issuing the provider's exact requests. (curl --unix-socket mirrors how the host turns the provider's unix://<hex>/path URLs into socket calls.)

| Check | Result |
| --- | --- |
| Transport to a loopback + admin client-cert API server | ✅ reached via the proxy (the case native HTTPS can't do) |
| Discovery /api/v1 + /apis + per-group | ✅ parses (39 core resources, 28 groups) |
| Real multi-version groups (autoscaling, gateway.networking.k8s.io, monitoring.coreos.com) | ✅ confirms the preferred-first fix is load-bearing, not theoretical |
| Namespace + collection listings | ✅ resolve to names |
| Object GET carries managedFields and last-applied | ✅ validates both cleaning decisions on a real object |
| StatefulSet manifest + status | |
| Pod logs …/pods/<p>/log?container=…&tailLines=… | |
| hide_empty_types limit=1 probe (empty → 0, non-empty → 1) | |
| Event selector accepted (cluster had 0 live events; returns cleanly) | |

The cluster was only ever read; the proxy was torn down afterward.

Not yet covered: the host↔WASM FUSE mount end-to-end (omnifs dev), which needs Kubernetes added to the dev built-ins + the proxy socket bind-mounted into the dev container — tracked as the integration follow-up.

How to actually use it

  1. Run a read-only proxy against your chosen context. kubectl handles TLS / mTLS / exec plugins / OIDC / context selection, so the provider never sees a token:
    kubectl proxy --unix-socket=/run/omnifs/k8s.sock \
      --reject-methods='POST,PUT,PATCH,DELETE'
  2. Mount config — the host grants the socket automatically from config.endpoint (no separate capabilities.unix_sockets entry needed):
    {
      "provider": "omnifs_provider_kubernetes.wasm",
      "mount": "k8s",
      "config": { "endpoint": "unix:///run/omnifs/k8s.sock", "hide_empty_types": false }
    }
  3. Browse with ordinary tools:
    cat  /omnifs/k8s/namespaces/<ns>/deployments/<name>/manifest.yaml
    cat  /omnifs/k8s/namespaces/<ns>/deployments/<name>/status.yaml
    tail /omnifs/k8s/namespaces/<ns>/pods/<pod>/logs/<container>.log
    cat  /omnifs/k8s/cluster/nodes/<node>/manifest.yaml
    grep -rh 'image:' /omnifs/k8s/namespaces/<ns>/deployments
  • /namespaces/<ns> lists every namespaced type (CRDs included via live discovery); /cluster the cluster-scoped ones. Set hide_empty_types: true to list only types that currently have instances.
  • One mount = one pinned cluster/context; the FS does not change on kubectl config use-context — add a mount per cluster.
  • An https:// endpoint also works for clusters reachable with system-trust TLS + a bearer token in mount auth; the unix:// + kubectl proxy path is recommended and the only one that reaches local/mTLS/exec-plugin clusters today.

Full details, FS layout, and limitations are in providers/kubernetes/README.md.

raulk commented Jun 9, 2026

Collaborator

Oh, hey! Thanks for the contribution ❤️ Taking a look

Findings from an adversarial review plus a full FUSE end-to-end run
against a live k3s cluster (kubectl proxy over a unix socket inside the
dev container):

- Pod logs 406'd through kubectl proxy: the apiserver's content
  negotiation rejects `Accept: text/plain` on the log subresource even
  though it streams text. Send `Accept: */*` (what curl/kubectl
  effectively send). Found only by the live FUSE run; fixed and
  re-verified (app/init/sidecar logs, tail, wc).
- Live listings are now `open` (non-exhaustive) instead of `exhaustive`,
  so every readdir re-lists from the API instead of freezing the first
  enumeration in the host's no-TTL dirent cache. Verified live: a pod
  created after the first `ls` appears on the next one. This also keeps
  the hide_empty_types contract honest (hidden types stay resolvable).
- Root discovery failures (/api/v1, /apis) now propagate instead of
  being swallowed: a transient error there must not be cached for the
  session as a half-empty catalog (which could also invert bare-plural
  collision naming). Per-group-version failures are still skipped, and
  all group-version fetches run in one batched callout round.
- Path segments reject URL metacharacters (`%`, `?`, `#`, control
  chars): `cat 'pods/x?watch=true/...'` used to smuggle a query through
  the raw URL path (holding a watch open); now ENOENT. `%` is forbidden
  by Kubernetes' own ValidatePathSegmentName, so nothing legal is lost;
  RBAC names with `:` keep working (verified live).
- Removed the five DirIntent::Lookup existence-check branches: the
  router resolves capture-dir lookups statically before handlers run,
  so they were dead code (and the only unvalidated URL interpolation).
- Object reads project their siblings: manifest.yaml/manifest.json/
  status.yaml render from one GET and preload the other two (<=64 KiB
  inline cap); events.txt preloads all three from the object it already
  fetches for the uid.
- Event field-selector values escape `\` `,` `=` like kubectl's
  fields.EscapeValue; recurring events.k8s.io events read
  series{count,lastObservedTime} like kubectl's printer.
- Dropped the dead Rc<RefCell<...>> around the discovery cache in favor
  of cx.state_mut.
- README/manifest: corrected the https-transport claim (no bearer-token
  injection exists in v1), documented that the proxy socket must be
  reachable inside the runtime container, the always-visible pods
  scaffolding entry under hide_empty_types, the YAML 1.1 quoting
  divergence from kubectl, and unbounded pod-log reads.

Live verification: manifest.json is byte-identical (jq -S) to
kubectl get -o json against k3s v1.34.1; CRDs surface automatically;
grep -r/find/du/wc/stat/md5sum/cp/diff/tar all behave through FUSE;
nonexistent objects/types and metachar names return ENOENT.
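
A minimal sketch of the path-segment check described in the findings above, assuming a single predicate applied to every captured segment before it is interpolated into a URL. `is_safe_segment` is a hypothetical name, and the exact rule set in the provider may differ.

```rust
/// Returns true if `seg` can be placed into a URL path verbatim.
/// Rejects empty segments, traversal ("." and ".."), the path separator,
/// the URL metacharacters % ? # and any control character.
pub fn is_safe_segment(seg: &str) -> bool {
    !seg.is_empty()
        && seg != "."
        && seg != ".."
        && !seg
            .chars()
            .any(|c| matches!(c, '/' | '%' | '?' | '#') || c.is_control())
}
```

A rejected segment would surface as ENOENT, so `x?watch=true` never reaches the API server, while a colon-bearing RBAC name such as `system:node` passes.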

bussyjd commented Jun 9, 2026

Author

Final review pass: live FUSE end-to-end + adversarial review → 68918af

This PR got a last full review: three independent review passes over the diff (SDK-contract, Kubernetes-semantics, repo-conventions) plus — for the first time — a complete FUSE end-to-end run against a live cluster: a disposable k3s v1.34.1 container, kubectl proxy --unix-socket running inside the omnifs dev container on the exact socket the mount expects, and the whole surface exercised with real shell tools through /omnifs/k8s.

The bug only the live run could catch

cat pods/<pod>/logs/<container>.log failed with EINVAL. The apiserver's content negotiation rejects Accept: text/plain on the log subresource with 406 Not Acceptable, even though the endpoint streams text, and a 4xx response maps to invalid-input → EINVAL. Earlier validation probed the HTTP contract with curl (whose default is Accept: */*), so the provider's own request shape was never exercised. Fixed to send Accept: */*; re-verified live (app/init/sidecar logs, tail, wc -l).
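
For illustration, the corrected request shape could look like the sketch below. `LogRequest` and `pod_log_request` are hypothetical names, not SDK types; the path follows the Kubernetes pod log subresource, and the inputs are assumed to have passed segment validation already.

```rust
/// The pieces of a pod-log GET that matter for the 406 fix.
pub struct LogRequest {
    pub path: String,
    pub accept: &'static str,
}

/// Build the log request for one container, optionally limited to the
/// last `tail_lines` lines.
pub fn pod_log_request(
    namespace: &str,
    pod: &str,
    container: &str,
    tail_lines: Option<u32>,
) -> LogRequest {
    let mut path =
        format!("/api/v1/namespaces/{namespace}/pods/{pod}/log?container={container}");
    if let Some(n) = tail_lines {
        path.push_str(&format!("&tailLines={n}"));
    }
    // The log subresource rejected `Accept: text/plain` with 406 in the live
    // run, so send the wildcard that curl sends by default.
    LogRequest { path, accept: "*/*" }
}
```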

Review findings fixed in 68918af

  • Listings no longer freeze. All live listings were exhaustive, which the host caches with no TTL — after one ls, new pods were invisible for the session. Now open: every readdir re-lists. Verified live: pod created after first ls appears on the next; a Terminating pod stays visible exactly as long as kubectl get pods shows it.
  • URL metachar injection closed. cat 'pods/x?watch=true/manifest.yaml' used to smuggle a query through the raw URL path (e.g. holding a watch open). Segments now reject % ? # and control chars — fail-closed ENOENT, verified live. Nothing legal is lost: % is forbidden by k8s' own ValidatePathSegmentName, and system:* RBAC names keep working (verified live).
  • Root discovery failures propagate. A transient error on /api/v1 or /apis was swallowed and the half-empty catalog cached for the session (which could also invert bare-plural collision naming, worst case handing a pods CRD the logs/ subtree). Roots now fail loudly and retry on next browse; per-group-version failures are still skipped; all group-version fetches now run in one batched callout round.
  • Dead lookup branches removed. The router resolves capture-dir lookups statically before handlers run, so the five DirIntent::Lookup existence checks never executed — and carried the only unvalidated URL interpolation. Deleted.
  • Siblings project from one fetch (repo rule: "project all data you have already fetched"): reading any of manifest.yaml/manifest.json/status.yaml preloads the other two from the same GET (≤64 KiB inline cap); events.txt preloads all three from the object it already fetches for the uid.
  • kubectl parity tightened: event field-selector values escape \ , = like fields.EscapeValue; recurring events.k8s.io events read series{count,lastObservedTime} like kubectl's printer (was COUNT 1 at first-occurrence time).
  • Docs corrected: the https+bearer-token claim was wrong (v1 injects no Authorization header — unix:// is the supported transport); the proxy socket must be reachable inside the runtime container; pods is always listed under hide_empty_types (it anchors the logs/ scaffolding); YAML 1.1 quoting divergence from kubectl (yes/no unquoted) documented.
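
The event selector with escaping can be sketched as follows. The function names are hypothetical; the escaping mirrors what fields.EscapeValue does (a backslash before `\`, `,` and `=`), and the field order here is arbitrary since the selector is a conjunction.

```rust
/// Escape a field-selector value: prefix `\`, `,` and `=` with a backslash.
pub fn escape_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if matches!(c, '\\' | ',' | '=') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Selector that matches only events about this exact object incarnation.
pub fn event_field_selector(name: &str, namespace: &str, kind: &str, uid: &str) -> String {
    [
        ("involvedObject.name", name),
        ("involvedObject.namespace", namespace),
        ("involvedObject.kind", kind),
        ("involvedObject.uid", uid),
    ]
    .iter()
    .map(|(field, value)| format!("{field}={}", escape_value(value)))
    .collect::<Vec<_>>()
    .join(",")
}
```

The resulting string would still need URL-encoding when placed in the `fieldSelector` query parameter; that step is omitted here.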

Live verification summary (k3s v1.34.1, through FUSE)

| Check | Result |
| --- | --- |
| manifest.json vs kubectl get -o json (jq -S both) | byte-identical |
| managedFields stripped, last-applied kept | |
| CRD created live (widgets.example.io) | appears automatically, CR readable |
| Pod logs: app / terminated init / sidecar, tail, wc | ✓ (after 406 fix) |
| events.txt | real scheduler events, kubectl-style table |
| Listing freshness (create/delete between ls) | mirrors API exactly |
| grep -r, find, du, wc, stat, md5sum (stable), cp+diff, tar | ✓ (tar warns "file changed" from the host's learned-size promotion; pre-existing host behavior for Size::Unknown files, all providers) |
| system:node ClusterRole (colon name), /cluster/nodes status | |
| Nonexistent object / type / x?watch=true | ENOENT, ENOENT, ENOENT |
| No-socket mount in omnifs dev | container healthy, clean Network error on browse |
Full check suite is green: cargo fmt, host tests (17 in this provider), wasm check/clippy -D warnings/test --no-run across all omnifs-provider-*/omnifs-tool-*, host workspace clippy + tests, and omnifs dev -y + smoke harness.

One note for maintainers: CI never ran on this PR (fork PR, workflow concluded action_required) — it needs a maintainer to approve the workflow run.
