Skip to content

feat(nix): QUIC-enabled Node 26 toolchain (node:quic runtime)#1372

Open
srid wants to merge 5 commits into
masterfrom
quic-node-runtime
Open

feat(nix): QUIC-enabled Node 26 toolchain (node:quic runtime)#1372
srid wants to merge 5 commits into
masterfrom
quic-node-runtime

Conversation

@srid

@srid srid commented Jun 14, 2026

Copy link
Copy Markdown
Member

What

Migrate the whole repo from Node 24 to a Node 26 built with --experimental-quic, so require("node:quic") — Node's built-in QUIC — is available everywhere pkgs.nodejs flows (devShell, the kaval/agent closures, pnpm-typecheck, the website). One overlay swap; no app code changes.

This is the day-one runtime prerequisite for kaval's roaming remote transport: QUIC connection migration lets an attached kaval-tui --host session survive a network change (Wi-Fi↔cellular, sleep/wake) with no reconnect — the P1 foundation from the kaval-vs-zmosh Atlas note. No transport code lands here — only the Node that can speak it.

The change (nix-only)

  • npins/sources.json — bump nixpkgs-unstable to a rev carrying nodejs_26 (26.3.0) + openssl 3.6.2.
  • nix/overlay.nix — rebuild nodejs-slim_26 (the real compile; the full nodejs is a thin join over its out+npm, so the flag must land on the slim and propagates up via the fixpoint) with --experimental-quic, and un-share openssl + ngtcp2 + nghttp3 so Node compiles its bundled QUIC stack (see "Why not nixpkgs" below for why that's required). nodejs is aliased to the QUIC-enabled full Node 26. Also pin pnpm to v10 — the nixpkgs bump moved the default pnpm to 11, which stopped reading package.json's pnpm field and breaks our frozen install (ERR_PNPM_LOCKFILE_CONFIG_MISMATCH); a pnpm-11 config migration is its own change.
  • flake.nix — a node-quic check that asserts require("node:quic") loads under --experimental-quic and process.features.quic === true, realized on every platform by the existing nix/flake-check CI nodes.

✅ Verification (built off-machine on real Linux hosts)

node-quic-smoke> node:quic OK v26.3.0     # require('node:quic') + process.features.quic === true
  • .#checks.x86_64-linux.node-quic — green.
  • .#default + .#checks.x86_64-linux.typecheck — green: node-pty recompiles clean against Node 26 (it's Node-API, ABI-stable across majors), pnpm install clean, typecheck passes.
  • CI below validates the same on aarch64-darwin (the second required lane).

User impact (the cost, honestly)

The custom Node isn't in any binary cache (it can't be — see below), so a cold nix run github:juspay/kolu#kaval / nix develop compiles Node from source once per machine, then it's served from /nix/store forever after.

  • Measured: 13 min 31s on a 32-core box (full: fetch + bundled openssl + V8 + ngtcp2/nghttp3 + link + Node's test suite).
  • Budget ~40–60 min on a typical 8-core laptop, more on fewer cores — but one-time.

Mitigation — our own cache (recommended follow-up). The repo already lists cache.nixos.asia/oss as a substituter (read), but nothing in-repo pushes to it — it's populated out-of-band by juspay infra, so today a cold machine compiles. To spare end users (and cold/re-created CI boxes, and the darwin lane) the compile, we should push master's closure — including this Node — to cache.nixos.asia/oss (an Attic/nix copy step with the OSS signing key). Then nix run substitutes instead of compiling. I left this out of this PR because it's separate infra from the runtime; happy to do it next.

Why isn't this just in nixpkgs?

Investigated exhaustively (nixpkgs issues, PRs, code, and Discourse): there is no issue and no PR — open, closed, or merged — to enable --experimental-quic/node:quic in nixpkgs' nodejs. Nobody has pushed it upstream. It's absent by design, for three reasons:

  1. Upstream gates it. node:quic needs a compile-time and runtime --experimental-quic flag; it's experimental and was disabled during the OpenSSL-3.5 transition. nixpkgs builds the stable default config.
  2. nixpkgs builds Node shared-everything, but Node's node.gypi only wires the vendored ngtcp2/nghttp3 — the copies whose internal headers (nghttp3/lib/nghttp3_conn.h, used by src/quic/*.cc) Node needs — under node_use_quic && node_shared_openssl=="false" (a bundled openssl). So with --shared-openssl, HAVE_QUIC=1 compiles the QUIC sources but never adds the ngtcp2/nghttp3 dependency → the build fails on the missing header. nixpkgs' default is structurally incompatible; that's exactly why this overlay un-shares openssl/ngtcp2/nghttp3 and lets Node vendor them.
  3. nixpkgs' nghttp3 is curl-oriented — it ships public headers only (no internal lib/*.h), and at a newer version (1.15 vs Node's vendored 1.11), so even forcing the shared path can't work.

Tellingly, the sole nixpkgs nodejs maintainer (aduh95) is himself a Node.js core / QUIC collaborator and still hasn't enabled it — confirming this is a deliberate consequence of the shared-libs dedup design, not an oversight. (node:quic graduating upstream later means we just drop the --experimental-quic flag — no other change.)

Follow-ups (separate PRs)

  • Push master's closure (incl. this Node) to cache.nixos.asia/oss so cold machines substitute instead of compiling.
  • Migrate the workspace's pnpm config to pnpm 11's layout (move pnpm.* from package.json to pnpm-workspace.yaml), then unpin pnpm.
  • P1 proper: links/quic.ts + the HostSession QUIC dial seam + kaval-tui --host over QUIC (the roaming demo), on top of this runtime.

🤖 Generated with Claude Code

Migrate the whole repo from Node 24 to a Node 26 built with
--experimental-quic, so `require("node:quic")` is available everywhere
`pkgs.nodejs` flows. This is the day-one runtime for kaval's roaming
remote transport — QUIC connection migration carries an attached session
across a network change with no reconnect (docs/atlas kaval-vs-zmosh).

- npins: bump nixpkgs-unstable to a rev carrying nodejs_26 (26.3.0).
- overlay: rebuild nodejs-slim_26 (the real compile — the full `nodejs`
  is a thin join over it) with --experimental-quic, and un-share openssl
  + ngtcp2 + nghttp3 so Node builds its bundled QUIC stack. Node only
  wires the vendored ngtcp2/nghttp3 — the copies whose internal headers
  src/quic/*.cc needs — under node_use_quic && node_shared_openssl=="false"
  (node.gypi), and nixpkgs' shared nghttp3 ships public headers only, so
  the QUIC stack must be vendored on a bundled openssl 3.5. nghttp2
  (HTTP/2) and the other shared libs are untouched. `nodejs` is aliased
  to the QUIC-enabled full Node 26 — one swap migrates devShell, the
  kaval/agent closures, pnpm-typecheck, and the website.
- flake: a `node-quic` check asserting require("node:quic") loads under
  --experimental-quic, realized on every platform by the nix/flake-check
  CI nodes.

Refs the kaval-vs-zmosh atlas note (P1 runtime prerequisite).
@srid

srid commented Jun 14, 2026

Copy link
Copy Markdown
Member Author

Evidence

This is a build-toolchain/runtime change (no on-screen surface), so the behavior worth proving is the runtime capability itself: that pkgs.nodejs now exposes built-in QUIC. Verified end-to-end on real hosts (built off the dev machine on pu boxes), and re-validated by CI on both platforms below.

node:quic loads on the QUIC Node 26.#checks.<system>.node-quic:

node-quic-smoke> node:quic OK v26.3.0
EXIT=0

The check runs node --experimental-quic -e 'require("node:quic"); if (process.features.quic !== true) throw …' against pkgs.nodejs, so a green build proves both the compile-time and runtime QUIC gates on each platform.

Whole-repo migration is clean.#default + .#checks.x86_64-linux.typecheck built green:

  • node-pty recompiled against Node 26 (NODE_MODULE_VERSION 142) with no error — it's Node-API, ABI-stable across majors:
    kolu> make: Entering directory '…/node-pty/build'
    kolu> make: Leaving directory  '…/node-pty/build'
    
  • pnpm install clean (Lockfile is up to date), typecheck passes.

Compile cost (the one-time from-source build, since it's not cacheable upstream): 13 min 31s on a 32-core box — full fetch + bundled openssl + V8 + ngtcp2/nghttp3 + link + Node's own test suite + the smoke. See the PR's "User impact" section for the laptop estimate and the cache-push follow-up.

srid added 3 commits June 14, 2026 19:35
The nixpkgs bump for Node 26 also removed the `nodePackages` set, which
throws on eval ("nodePackages has been removed"). shell.nix referenced
`nodePackages.node-gyp`; it's now `pkgs.node-gyp` (12.3.0) at the top
level. Only the devShell eval (ci::flake-check / install / atlas-sync)
hit this — the `.#default`/`.#checks` builds don't evaluate shell.nix.
…ht-driver

The nixpkgs bump (for Node 26) moved `playwright-driver` 1.57 -> 1.60,
which ships a newer chromium revision. The npm `playwright` pin stayed
1.57, so the e2e browser launch looked for chromium_headless_shell-1200
at PLAYWRIGHT_BROWSERS_PATH (the nix driver) and failed. Bumping the npm
side to 1.60 realigns it. playwright lives in packages/tests, which is
excluded from the pnpmDeps fileset (default.nix), so the FOD hash is
unchanged; the e2e's own install picks up 1.60.
… UDP)

With --experimental-quic compiled in, Node's check phase runs
parallel/test-quic-* — experimental tests that need UDP/loopback the
macOS Nix sandbox forbids, so they fail on aarch64-darwin (e.g.
test-quic-h3-zero-rtt) even though require('node:quic') works (the
node-quic flake check proves it). They pass in the Linux sandbox, so
doCheck stays on for Linux (full packaging validation, and the Linux
node's hash is unchanged) and is disabled only on Darwin.
@srid

srid commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

🧪 CI metrics — leased pool box

The x86_64-linux lane ran on kolu-ci-6 (dev-x86-64-linux-03) — commit 6156718f, passed

  • Lane wall (pipeline): 19m31s
recipe duration
ci::atlas-sync 17m43s
ci::install 17m26s
ci::smoke 17m23s
ci::nix 17m18s
ci::home-manager 14m28s
ci::flake-check 13m52s
ci::e2e 1m34s
ci::pnpm-hash-fresh 45s
_ci-setup 30s
ci::unit 21s
ci::surface-example-build 10s
ci::surface-app-example-build 10s
ci::biome 9s
ci::fmt 8s

Pool status (8 boxes)

box location state
kolu-ci-1 dev-x86-64-linux-04 ✓ idle
kolu-ci-2 dev-x86-64-linux-04 ✓ idle
kolu-ci-3 dev-x86-64-linux-03 ✓ idle
kolu-ci-4 dev-x86-64-linux-08 ✓ idle
kolu-ci-5 idliv2-02 ✓ idle
kolu-ci-6 dev-x86-64-linux-03 🔒 leased
kolu-ci-7 dev-x86-64-linux-05 ✓ idle
kolu-ci-8 dev-x86-64-linux-08 ✓ idle

Posted by ci/pu/report.sh. Lane timings from .ci/6156718/timings.jsonl; pool state is a live flock probe.

@srid srid marked this pull request as ready for review June 15, 2026 02:40
@srid

srid commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

✅ Green on both platforms

CI passes on x86_64-linux AND aarch64-darwin (28/28 contexts + SentinelOne) at 6156718fnode:quic OK v26.3.0 on both, e2e green on both (linux 1m34s, darwin 5m35s), node-pty recompiled, typecheck/builds/smoke all pass.

Darwin needed two extra fixes beyond linux (now in the PR):

  • doCheck off on Darwin only: with QUIC compiled in, Node's check phase runs parallel/test-quic-*, which need UDP/loopback the macOS Nix sandbox blocks (they pass on linux). node:quic itself loads fine — the flake check proves it. doCheck stays on for linux (full validation, hash unchanged), off for darwin.
  • The QUIC Node was built directly on rasam to warm its store; the darwin lane then substituted it (3m26s) instead of a cold from-source compile.

Ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant