refactor(be,fe): data-only OpenID providers with on-demand SSO discovery/JWKS caches#4022
Conversation
…aches Refactor the OpenID module off trait-dispatched providers (Vec<Box<dyn OpenIdProvider>>) onto a data-only model with a single shared verification pipeline. Verification is identical for every provider kind up to the JWK read; the only divergence is the JWK source: - Configured providers (Google/Microsoft/Apple): JWKs in stable storage (memory id 24), seeded + timer-refreshed (unchanged, PR #3959). Always synchronously Ready. - SSO (discoverable) providers: discovery and JWKS fetched on demand into two single-flight caches (domain -> config, jwks_uri -> keys), replacing the DISCOVERY_TASKS background timer. May read Pending on a cold cache. Module reshape (openid/generic.rs removed): - verify.rs shared pipeline (decode -> claims -> signature -> build) - configured.rs data-only CONFIG_REGISTRY + stable JWKs + refresh timer + seed - sso.rs two single-flight caches + on-demand two-hop discovery + allowlist - jwks.rs the JwkSource seam + shared JWKS fetch - provider.rs dispatch: (iss,aud[,discovery_domain]) -> descriptor + JwkSource The four JWT methods take an optional discovery_domain and surface a Pending error on a cold SSO cache (configured providers never do). prepare/get delegation use the poll model: prepare (update) drives the fills, get (query) peeks via the new single_flight_cache::peek, re-calling prepare on Pending. Adds discover_sso / discover_sso_query for canister-side sign-in initiation. Drops the sso_fields_for reverse-scan (migration #4013 complete); credential SSO fields are read straight off the stamp. Candid + tests updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x comments - Replace the update/query mode flag with one peek-based read path (verify_jwt, discover_sso, resolve, read_jwks) plus a separate prefetch_sso that updates call to drive the cache fills. Queries read; updates prefetch then read. - Remove the now-unused discoverable-OIDC registry and listing: discovered_oidc_configs, add_discoverable_oidc_config, OidcConfig, and the in-memory domain list. On-demand discover_sso gated by the allowlist covers the use case. - Comment pass: describe the code as it is, no design-doc/PR citations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… caches The SSO discovery domain now flows through the JWT methods, and the frontend reads the canister's on-demand discovery instead of fetching it itself. - authenticateWithJWT takes a discoveryDomain and polls prepare/get delegation while the canister reports Pending (cold SSO cache), re-calling prepare to drive the fetch. Configured providers resolve on the first attempt. - ssoDiscovery.ts resolves a domain through discover_sso / discover_sso_query (validate, drive, poll) instead of the client-side two-hop fetch; the auth and add-access-method flows thread the domain into prepare/credential_add/ registration_finish. - Shared openidPoll helpers (pollDelay, retryWhilePending) back the three poll sites. discover_sso replaces the add_discoverable_oidc_config call sites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ration create_identity verified the JWT inside storage_borrow_mut; now that verification reads JWKs from stable storage / the SSO caches, that nested storage_borrow trapped with 'RefCell already mutably borrowed'. Hoist the verification out of the mutable borrow and pass the result in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 1-click `?sso=<domain>` authorize path redeemed its JWT through continueWithOpenId without the discovery domain, so the canister verified it against the (empty) configured registry and failed. Thread discoveryDomain through continueWithOpenId -> openIdJwtSignIn and through completeOpenIdReg -> registerWithOpenId -> openIdRegistrationCommit so SSO sign-in and sign-up both verify against the domain's discovery. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
discover_sso/discover_sso_query both returned the value, making the query look
redundant. Match the DoH resolve/status poll instead:
- discover_sso (update) only drives the discovery cache (one cache — JWKS is a
verify-time concern) and returns Ok/Err, no value.
- get_sso_discovery (query) reads the resolved config.
- The frontend polls the query and, while it reads no value, drives the update
— same shape as the email-recovery DoH poll loop.
Also replace the free-text Err with a typed SsoDiscoveryError { DomainNotAllowed }
(a failed fetch reads as not-resolved-yet, so that's the only error), and drop
the unused detail field / parameter-property syntax on DomainNotConfiguredError.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/Result
get_sso_discovery returned Result<opt SsoDiscovery, SsoDiscoveryError> — two
layers of 'maybe' that don't say what's going on. Replace it with a status
variant, like the email-recovery status query:
get_sso_discovery : (text) -> (SsoDiscoveryState) query;
SsoDiscoveryState = variant { Resolved : SsoDiscovery; Pending; NotAllowed };
discover_sso (update) becomes a fire-and-forget drive (-> ()); the allowlist
rejection is reported by the query's NotAllowed state, so SsoDiscoveryError is
gone. The frontend polls the query and drives the update while Pending.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The frontend build's message extraction rewrote the bot-managed locale files; restore them to match main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A cold SSO discovery/JWKS cache is a retry signal, not a failure, but it
was modelled as a `Pending` member of OpenIdCredentialAddError and
OpenIdDelegationError. That put a transient state on the error channel: the
shared error toaster (error.ts) had to special-case it, and any consumer that
folded the error into a terminal failure would wrongly give up.
Move it onto the result:
openid_credential_add -> variant { Ok; Pending; Err : OpenIdCredentialAddError }
openid_prepare_delegation -> variant { Ok : ...; Pending; Err : OpenIdDelegationError }
openid_get_delegation -> variant { Ok : ...; Pending; Err : OpenIdDelegationError }
backed by a small generic OpenIdResult<T, E>. Pending drops out of both error
enums and out of the error.ts catch-all. The frontend poll loops (jwt.ts,
retryWhilePending) match the top-level Pending arm and retry — including the
account-link path, which now keeps polling while the cache warms instead of
erroring out. openid_identity_registration_finish is unchanged: the sign-in
attempt warms the JWKS before registration, so it never surfaces Pending.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
openid_identity_registration_finish folded a cold SSO discovery/JWKS cache
(verify_jwt -> Cached::Pending) into a terminal IdRegFinishError, on the
assumption that the preceding sign-in had already warmed the keys. But the
JWKS cache is LRU + TTL with backoff, so the entry can be evicted between
sign-in and registration — turning an SSO signup into a hard failure with a
misleading "OIDC discovery in progress" error and no retry.
Surface it as a retry instead, like the other OpenID methods:
openid_identity_registration_finish -> variant { Ok : IdRegFinishResult; Pending; Err : IdRegFinishError }
Verification is hoisted into the endpoint (verify_openid_for_registration):
on Cached::Pending it returns the Pending arm; on Cached::Ready the verified
credential is handed to the shared registration flow, so it's verified exactly
once and the pubkey path is untouched. The frontend signup commit polls with
retryWhilePending, consistent with the link and sign-in paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ No security or compliance issues detected. Reviewed everything up to ae6c8b3. Security Overview
Detected Code ChangesThe diff is too large to display a summary of code changes. |
There was a problem hiding this comment.
Pull request overview
Refactors Internet Identity’s OpenID/SSO implementation from provider trait dispatch into a single shared, data-driven JWT verification pipeline. It moves SSO two-hop discovery and JWKS caching fully canister-side, and introduces a first-class Pending retry result for cold/evicted SSO discovery/JWKS caches, with corresponding frontend polling/retry logic.
Changes:
- Backend: Introduces shared OpenID verification pipeline (
openid/verify.rs) with provider resolution (provider.rs) and a JWK source seam (jwks.rs) split between configured providers (stable+timer refreshed) and SSO providers (on-demand single-flight caches). - Backend API/DID: Adds
discover_sso/get_sso_discovery, and updates OpenID methods to accept an optional SSO discovery domain and returnOpenIdResult { Ok | Pending | Err }. - Frontend: Removes client-side two-hop discovery fetching; instead polls
get_sso_discoveryand drivesdiscover_sso, and retries OpenID calls while the canister returnsPending.
Reviewed changes
Copilot reviewed 36 out of 38 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/internet_identity/tests/integration/v2_api/authn_method_test_helpers.rs | Updates OpenID registration helper to include discovery_domain. |
| src/internet_identity/tests/integration/config/sso_discovery.rs | New integration tests for SSO discovery allowlist gating. |
| src/internet_identity/tests/integration/config/oidc_configs.rs | Removes obsolete OIDC config registration/discovery tests. |
| src/internet_identity/tests/integration/config.rs | Switches integration test module from oidc_configs to sso_discovery. |
| src/internet_identity/src/storage/anchor.rs | Drops legacy fallback for unstamped SSO fields; reads stamped values directly. |
| src/internet_identity/src/single_flight_cache.rs | Adds query-safe peek API that never spawns fills/mutates cache. |
| src/internet_identity/src/openid/verify.rs | New shared JWT verification + credential build pipeline. |
| src/internet_identity/src/openid/sso.rs | New canister-side SSO discovery + JWKS single-flight caches and allowlist gate. |
| src/internet_identity/src/openid/provider.rs | Resolves JWTs to provider descriptors + JWK sources (configured vs SSO). |
| src/internet_identity/src/openid/jwks.rs | Introduces JWK source abstraction + shared JWKS fetch/transform. |
| src/internet_identity/src/openid/configured.rs | New configured-provider registry + stable JWKS seed/refresh logic. |
| src/internet_identity/src/openid.rs | Replaces provider trait dispatch with configured/SSO resolution and shared verify pipeline; adds SSO discovery APIs. |
| src/internet_identity/src/main.rs | Updates canister methods for new OpenID result shapes and adds SSO discovery endpoints. |
| src/internet_identity/src/attributes.rs | Updates SSO attribute tests to rely on stamped sso_domain. |
| src/internet_identity/src/anchor_management/registration/registration_flow_v2.rs | Hoists OpenID verification for registration and threads verified credential through finish path; supports Pending. |
| src/internet_identity/internet_identity.did | Updates DID types/methods for OpenIdResult, SSO discovery APIs, and new args. |
| src/internet_identity_interface/src/internet_identity/types/openid.rs | Adds OpenIdResult type used by updated OpenID APIs. |
| src/internet_identity_interface/src/internet_identity/types/api_v2.rs | Extends OpenIDRegFinishArg with discovery_domain. |
| src/internet_identity_interface/src/internet_identity/types.rs | Adds SsoDiscovery / SsoDiscoveryState and updates related docs. |
| src/frontend/tests/e2e-playwright/fixtures/sso.ts | Adjusts e2e fixture comments to reflect canister-side SSO discovery. |
| src/frontend/src/routes/(new-styling)/authorize/+page.ts | Updates allowlist-boundary comments for 1-click SSO entry. |
| src/frontend/src/routes/(new-styling)/authorize/+page.svelte | Removes client call to register discoverable OIDC config; passes SSO domain through OpenID flow. |
| src/frontend/src/lib/utils/ssoDiscovery.ts | Replaces FE two-hop fetch with canister-driven discover_sso/get_sso_discovery polling. |
| src/frontend/src/lib/utils/ssoDiscovery.test.ts | Updates unit tests to mock canister SSO discovery API and polling behavior. |
| src/frontend/src/lib/utils/openidPoll.ts | New shared polling helpers (pollDelay, retryWhilePending). |
| src/frontend/src/lib/utils/authentication/jwt.ts | Retries openid_prepare_delegation / openid_get_delegation while canister returns Pending. |
| src/frontend/src/lib/stores/last-used-identities.store.ts | Updates docs around SSO resolution source (canister discovery). |
| src/frontend/src/lib/generated/internet_identity_types.d.ts | Regenerates bindings for updated DID (new SSO types, new OpenID arg/result shapes). |
| src/frontend/src/lib/generated/internet_identity_idl.js | Regenerates IDL factory for updated DID (new methods and types). |
| src/frontend/src/lib/flows/authLastUsedFlow.svelte.ts | Re-resolves SSO configs via canister when continuing with last-used SSO identity. |
| src/frontend/src/lib/flows/authFlow.svelte.ts | Threads SSO discovery domain through OpenID flows and retries registration while Pending. |
| src/frontend/src/lib/flows/addAccessMethodFlow.svelte.ts | Links OpenID/SSO credentials while retrying openid_credential_add on Pending. |
| src/frontend/src/lib/components/wizards/auth/views/SignInWithSso.svelte | Removes canister-side SSO registration call; relies on discovery polling and updated error mapping. |
| src/frontend/src/lib/components/wizards/auth/views/PickAuthenticationMethod.svelte | Updates comments to reflect new SSO allowlist enforcement via discover_sso. |
| src/frontend/src/lib/components/wizards/addAccessMethod/views/AddAccessMethod.svelte | Updates comments to reflect new canister-side SSO resolution/rejection. |
| src/canister_tests/src/api/internet_identity/api_v2.rs | Updates v2 OpenID registration finish helper to collapse OpenIdResult via settled. |
| src/canister_tests/src/api/internet_identity.rs | Replaces OIDC discovery helpers with SSO discovery helpers and adds settled for OpenID result collapsing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Resolve conflicts in the generated II bindings (internet_identity_idl.js, internet_identity_types.d.ts) by regenerating from the merged candid with the pinned didc; both sides only added independent types/methods (SSO discovery here, session delegation on main).
The untrusted discovery_domain argument reached the sso_domain stamp verbatim while the allowlist gate matches case-insensitively, so a mixed-case domain (e.g. Example.ORG) passed the gate but was stored under a different sso:<domain> scope than the canonical allowlisted value, breaking credential lookup. Trim + lowercase it at each endpoint boundary via a shared canonical_discovery_domain helper, matching the sso_discoverable_domains config setter and the get_sso_discovery reply which already canonicalize.
|
Merged Went through the 6 Copilot comments; only one was a real issue:
Validation: clippy |
The SSO sign-in/delegation path had no canister-endpoint integration coverage — only sso.rs/verify.rs unit tests and Playwright e2e. Existing OpenID integration tests exercise configured providers with no discovery domain; config::sso_discovery only covers the allowlist gate. Add integration tests that drive the on-demand SSO path with a discovery domain, mocking the two discovery hops + the JWKS fetch so the single-flight caches warm from Pending to Ready (the frontend's retry-while-Pending loop): - can_link_sso_account_via_discovery: credential add stamps the discovered sso_domain / sso_name. - can_get_sso_delegation_via_discovery: prepare (update, warms caches) + get (query, reads warm caches) issue a delegation. - sso_discovery_domain_is_canonicalized: a mixed-case/padded domain is stored as canonical lowercase — regression test for the boundary canonicalization. Adds *_with_discovery helper variants returning the raw OpenIdResult so the Pending arm is observable; the existing helpers delegate with None.
On a system subnet HTTP outcalls cost no cycles and ingress flooding is a boundary-node concern, so the SSO caches' only DDoS-relevant job is bounded, fairly-evicted state — which matters once the discovery allowlist is removed and the discovery domain / jwks_uri become caller-controlled and unbounded. Each cache's worst case is max_entries * per-entry cap, so: - Split the shared cache_config so the two caches size independently: discovery max_entries 10_000 (small DiscoveredConfig, ~10-20 MB), JWKS max_entries 2_000 (~64 MB at the 32 KiB cap below). - Cap HTTP outcall response sizes (were unbounded): discovery hops 16 KiB; JWKS fetch 32 KiB — broad enough for real key sets with x5c chains (Microsoft is ~14.5 KB) so the shared configured-provider refresh is unaffected. - Cap kept JWKS keys at 20 (verification matches by kid; providers publish far fewer) and stored discovery scopes at 32 (the only unbounded DiscoveredConfig field). No behavior change for honest providers; bounds the attacker-reachable worst case to ~tens of MB. Allowlist removal itself is deferred.
The discovery and JWKS caches back one coupled flow (domain -> discovery -> jwks_uri -> JWKS), and a verification needs both the domain's discovery entry and its JWKS entry warm, so separate budgets just leave one cache holding entries the other can't back. Replace the two caps with a single SSO_CACHE_MAX_ENTRIES (5000), bounded by the larger JWKS entry (32 KiB) -> ~170 MB worst case (~5-6% of the ~3 GB heap), keeping wide eviction headroom.
|
Bounded the two SSO single-flight caches ( Why this shape. II is on a system subnet — HTTP outcalls cost no cycles — and ingress flooding is handled by the boundary nodes. So these caches don't need outcall-rate-limiting or cycle accounting; their only DDoS-relevant job is bounded, fairly-evicted state. Single shared budget. Discovery and JWKS are one coupled flow ( Per-fill / per-entry caps (fills were previously unbounded,
No behavior change for honest providers. |
discover_sso / get_sso_discovery passed the raw domain straight through; the lookup path lowercases but never trims, so " dfinity.org" was rejected where the JWT endpoints (which canonicalize at the boundary) accept it. Route both through canonical_discovery_domain so every endpoint gates on the same trimmed+lowercased form. Splits the canonicalizer into a String->String core and an Option wrapper (canonical_discovery_domain_opt) for the JWT endpoints. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
discoverSsoConfig only checked signal.aborted at the top of each iteration, so an abort during the 500ms sleep or the get_sso_discovery query still fired one more discover_sso update before the loop noticed. Make pollDelay resolve early on abort and re-check the signal before driving the update, so a dropped lookup stops without an extra call. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The loop-top `signal?.aborted === true` guard narrows `aborted` to `false | undefined` for the rest of the iteration, and tsc doesn't re-widen across the intervening `await`s, so the second inline check before the discover_sso update was flagged as an always-false comparison (TS2367), breaking `npm run check` and the frontend wasm build. Extract an `isAborted(signal)` helper so each check returns a fresh boolean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring in the 3 base commits (incl. the TS2367 SSO discovery abort fix) that landed on #4022 after this branch was cut.
Backend (openid): - Fix build: import the `std::fmt::Display` trait (not the `std::ffi::os_str::Display` struct) and repair a botched `resolve()` call in the SSO unit tests left over from adding the `jwt_aud` arg. - verify_claims: make the JWT time checks panic-free. Compare validity in whole seconds and use `checked_add` for `iat + window` so a malicious far-future `iat`/`exp` can't overflow `secs_to_nanos`. - verify_and_build: destructure the verified `Claims`, explicitly dropping `aud` (only weakly checked; the canonical `client_id` is stored instead) and the consumed `nonce`/`exp`/`iat`. - Add negative-path unit tests: invalid signature, issuer, audience and nonce; expired JWT; JWT past the validity window; JWT issued in the future; and a far-future `iat` overflow guard. Reset the test clock and caller at the start of every test so thread reuse can't leak state. Frontend (ssoDiscovery): - Parse the loopback host via `new URL` instead of hand-splitting on `:`, and reject hosts carrying a path/query/fragment. Add tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remaining files for the review fixes: - openid/verify.rs: panic-free time checks, destructured Claims, and the negative-path unit tests. - openid/sso.rs: repair the botched `resolve()` call in the SSO tests. - frontend ssoDiscovery.ts/.test.ts: URL-based loopback parsing + tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed the open review comments and got the branch compiling again. Build fixes — the branch did not compile at
Review comments:
Locally verified: Generated by Claude Code |
OpenID sign-in dispatched through trait objects (
dyn OpenIdProvider) and resolved SSO organization domains client-side. This reworks it into a single data-driven verification pipeline with two clearly separated JWK sources of truth, moves SSO discovery onto the canister, and makes the "cache still warming" state a first-class retry signal instead of an error. No user-visible behavior changes for the configured Google / Microsoft / Apple providers; SSO (organization) sign-in now resolves and verifies entirely canister-side.Changes
Backend
dyn OpenIdProvidertrait dispatch and the monolithicopenid/generic.rswith a data-only provider model: one shared verification pipeline (openid/verify.rs,provider.rs,jwks.rs) that is identical for every provider up to the point it reads the JWK.openid/configured.rs) keep their timers + stable-storage-persisted keys (some providers can't reach HTTP-outcall consensus); SSO providers (openid/sso.rs) source keys on demand through the single-flight cache.discover_sso : (text) -> ()(fire-and-forget drive of the two-hop discovery+JWKS fetch) andget_sso_discovery : (text) -> (SsoDiscoveryState) querywhereSsoDiscoveryState = variant { Resolved : SsoDiscovery; Pending; NotAllowed }. Gated by thesso_discoverable_domainsallowlist. Replaces the client-side discovery module.Pendingas a result arm (not an error) when the SSO discovery/JWKS cache is cold or has been evicted:openid_credential_add,openid_prepare_delegation,openid_get_delegationreturnOpenIdResult<T, E> = variant { Ok; Pending; Err }.openid_identity_registration_finishreturnsvariant { Ok; Pending; Err }; its verification is hoisted into the endpoint so the JWT is verified exactly once and the pubkey-registration path is untouched.Pendingis removed fromOpenIdCredentialAddError/OpenIdDelegationError.Frontend
get_sso_discoveryand drivesdiscover_ssowhilePending;ssoDiscovery.tsno longer fetches discovery documents itself.Pending(sharedretryWhilePending/ poll loop), so a cold or evicted SSO cache no longer turns into a hard error.Tests
config/sso_discovery.rsintegration tests for thediscover_sso/get_sso_discoveryallowlist gate.🤖 Generated with Claude Code