refactor(be): typestate pipeline for SMTP request validation#3946
refactor(be): typestate pipeline for SMTP request validation#3946aterga wants to merge 11 commits into
Conversation
Replace the ad-hoc `&SmtpRequest`-threading through DKIM and DMARC verifiers with a three-stage typestate ladder that encodes the trust level of an inbound email in its type. The canister boundary now flows raw wire requests through UnverifiedSmtpRequest (bounds + RFC 5322 §3.6 well-formedness) → Vec<SignedSmtpRequestProjection> (one per parsed DKIM-Signature) → VerifiedSmtpRequest (DKIM-certified, DMARC-aligned), and downstream code can only ever read a fully- verified value via header() / body(). The legacy verify and verify_email entry points are gone; their per-signature DKIM math and DMARC-alignment helper live in the typestate's verify_one_signature and compute_dmarc_outcome — single source of truth, no duplication. Behaviour change: bottom-up signature iteration (RFC 6376 §5.4), replacing today's top-down first-only assumption. Stage 1 rejects duplicate From/Date/Subject and the rest of the §3.6 single- occurrence header set before any cryptographic work runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
✅ No security or compliance issues detected. Reviewed everything up to 3c9ebb4. Security Overview
Detected Code ChangesThe diff is too large to display a summary of code changes. |
There was a problem hiding this comment.
Pull request overview
This PR refactors the email-recovery SMTP verification flow to use a three-stage typestate pipeline so trust/verification progress is enforced by Rust types (unverified → scoped DKIM-signature projections → DKIM+DMARC verified request), and updates DKIM/DMARC modules and test vectors accordingly. It also changes DKIM signature selection behavior to iterate bottom-up (per RFC 6376 §5.4), preferring originating-domain signatures over forwarder-prepended ones.
Changes:
- Introduces
email_recovery::typestateimplementing stage-1 RFC/bounds validation, stage-2 DKIM signature scoping, and stage-3 DKIM verification + DMARC alignment with aggregated diagnostics. - Updates
email_recovery::smtpto drive the typestate pipeline on both DoH and DNSSEC paths, and removes the legacy DKIM/DMARC top-level verification entry points/enums. - Migrates DKIM/DMARC end-to-end tests to assert against typestate outputs (
VerifiedSmtpRequest/VerificationError).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/internet_identity/src/email_recovery/typestate.rs | New typestate ladder (stage 1/2/3) for SMTP request validation + DKIM/DMARC verification, with extensive unit tests. |
| src/internet_identity/src/email_recovery/smtp.rs | Switches SMTP handlers (validate + update) to typestate; updates DNSSEC partial verification and DoH full verification to use projections/verified requests. |
| src/internet_identity/src/email_recovery/mod.rs | Exposes the new typestate module within email_recovery. |
| src/internet_identity/src/dmarc/verify.rs | Deletes the legacy combined DKIM+DMARC verifier entry point (orchestration moved to typestate). |
| src/internet_identity/src/dmarc/types.rs | Removes legacy EmailVerificationStatus enum (typestate now represents the verified/unverified states). |
| src/internet_identity/src/dmarc/test_vectors.rs | Migrates DMARC end-to-end tests to drive typestate and assert on typestate outputs. |
| src/internet_identity/src/dmarc/mod.rs | Updates module docs/exports to reflect that DMARC now provides primitives; orchestration is in typestate. |
| src/internet_identity/src/dkim/verify.rs | Removes legacy DKIM verifier orchestration; retains canonicalization primitives used by typestate and updates tests accordingly. |
| src/internet_identity/src/dkim/types.rs | Removes legacy DkimVerifyResult enum (typestate now owns orchestration and verdict shaping). |
| src/internet_identity/src/dkim/test_vectors.rs | Migrates DKIM end-to-end tests to typestate and shares .eml parsing helper with DMARC tests. |
| src/internet_identity/src/dkim/mod.rs | Updates module docs/exports to reflect primitives-only DKIM module; orchestration moved to typestate. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CI fix: reorder handle_smtp_request to run recipient dispatch BEFORE
stage 1's RFC §3.6 checks, so a misdirected email with a malformed
body returns 550 ("No such user here") instead of 555 (syntax error).
The unit-test fixture for silently-dropped emails (no nonce in
Subject / unknown nonce) gains the now-required `Date` and
`DKIM-Signature` headers via a small factory helper.
Review feedback:
- RfcError → SmtpResponse no longer leaks the HeaderCount Debug
shape; render multiplicity as "exactly once" / "at most once" /
"at least once" via a Display impl.
- DkimScopeError on absent-message says "missing message body"
instead of "caller bug" — the message can surface back to the
gateway as SMTP 555.
- build_verified switches to a Vec<bool> consumed-mask (drops the
O(|h|·|headers|²) Vec::contains scan) and emits signed_headers in
message source order, so VerifiedSmtpRequest::header()'s reverse
scan is genuinely bottom-up over the actual message bytes (not
over h= order, which could mis-attribute multi-occurrence headers).
- DoH and DNSSEC paths no longer clone the UnverifiedSmtpRequest to
feed stage 2: prepare_partial_verification and
verify_setup_email_doh take ownership, with handle_smtp_request
scoping its envelope/nonce borrows so request ownership flows
through cleanly.
- stage3_iterates_bottom_up now distinguishes its two test
signatures by failure reason (BodyHashMismatch vs
UnsupportedCanonicalization), so the iteration order is
observable from VerificationError.per_signature; the previous
enumerate-derived index assertions held regardless of direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback on PR #3946, the RFC 5322 §3.6 header-uniqueness checks belong alongside `validate_smtp_request` in the wire-types crate, not as a parallel structural-validation step inside the canister's typestate module. `validate_smtp_request` now runs bounds + RFC §3.6 uniqueness in a single pass. Stage 1's `try_from` collapses to a thin marker over that contract and returns `SmtpResponse` directly; `RfcError` and `HeaderCount` no longer exist as canister-side types. The stage-1 unit tests follow the validation logic into the interface crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Eliminate the three .expect("stage 2 always carries a message")
panic sites: SignedSmtpRequestProjection now stores
Rc<SmtpMessage> instead of Rc<SmtpRequest>. Stage 1 → stage 2 takes
ownership of the message via destructure-let-else, so the invariant
"stage 2 always carries a message" is encoded in the type and no
longer enforced by runtime panic. `proj.message()` becomes a
trivial &self.message getter.
- Drop the winning_ prefix on VerifiedSmtpRequest fields:
winning_dkim_domain → dkim_domain, winning_checks → checks. Once a
caller holds a Verified value the multi-signature concept is over;
the prefix added nothing beyond noise.
- Rename verify_one_signature → verify_signature. The single-
projection argument makes the _one_ infix redundant. The DKIM
cryptographic primitive (also named verify_signature) is now
referenced by fully-qualified crate::dkim::verify_signature path
at the single call site, avoiding the name clash.
- Round out the §3.6 consolidation surface: the wire-types
validate_smtp_request now carries unit tests for the multiplicity
pass (duplicate From/Date/Subject/Message-ID, missing From, missing
DKIM-Signature, at-most-once tolerates 0-or-1, case-insensitive
name match). dkim/verify.rs and dkim/test_vectors.rs no longer
reference the removed RfcError/HeaderCount types; their negative
tests assert on SmtpResponse::Err(555) content instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sea-snake review feedback: the typestate module should orchestrate across `dkim` and `dmarc` rather than reimplement them. Moves the two helpers that were doing per-protocol work in `email_recovery/typestate.rs` to their respective crates: - `dmarc::compute_outcome(dkim_domain, from_domain, dmarc_txt) -> DmarcOutcome` lives in `dmarc/mod.rs` now (with its unit tests alongside). - `dkim::run_signature_check(sig, headers, body, dkim_header_value, dkim_txt, now_secs) -> Result<Vec<DkimCheck>, …>` lives in `dkim/verify.rs` now. Takes primitive args (no typestate types), so the dkim crate has no upward dependency on email_recovery. Stage 3's `verify_signature` collapses to ~20 lines of glue: call `dkim::run_signature_check`, then `dmarc::extract_from_domain` for the From-domain that DMARC alignment needs. Stage 3's `try_from` calls `dmarc::compute_outcome` directly. The typestate file dropped ~160 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the sketch from the prior version of this commit. The stage-1
`UnverifiedSmtpRequest` no longer wraps a wire-shape `SmtpRequest`;
instead it carries an `Option<unverified::SmtpMessage>` whose RFC 5322
§3.6 single-occurrence headers are destructured into typed fields:
- `From` / `Date` / `Subject` → required `String` (exactly-once)
- `Sender` / `Reply-To` / `To` / `Cc` / `Bcc` / `Message-ID` /
`In-Reply-To` / `References` → `Option<String>` (at-most-once)
- `DKIM-Signature` → `Vec<String>` (at-least-once when message present)
- everything else → `other_headers: Vec<SmtpHeader>` in source order
Duplicate/missing checks for these headers are type-system invariants
now, enforced by `SmtpMessageBuilder` at stage-1 construction. The
wire-types crate's `validate_smtp_request` drops its
`validate_message_header_uniqueness` companion (and all its supporting
types/tests) — addresses the "why two separate methods?" review point
by going further: §3.6 belongs in the typestate, bounds belong in the
wire crate.
Stage 3 `VerifiedSmtpRequest` switches from `pub(crate)` fields to
`pub(crate)` accessor methods (`dkim_domain()`, `mail_from_domain()`,
`dmarc_outcome()`, `checks()`) — addresses sea-snake's
"avoid pub(crate) fields" point.
Canister boundary (`handle_smtp_request`) restructured: recipient
dispatch reads the wire envelope directly *before* the typestate
conversion. Misdirected mail therefore returns 550 ("No such user
here") regardless of message shape — fixes the CI integration test
`smtp_request_rejects_emails_addressed_to_unknown_recipients_with_550`
that was hitting 555 under the prior ordering.
DKIM math (`crate::dkim::run_signature_check`) keeps its primitive
wire-shape interface; the typestate reconstructs a wire-header vec
from the typed message via `reconstruct_wire_headers`, exposed
`pub(crate)` so the DNSSEC partial-verification path can use the same
reconstruction the DoH path's stage 3 uses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves a conflict with #3948 by **reverting** main's wire-types §3.6 helpers and consolidating the enforcement inside the canister-side typestate. The two crates land on opposite sides of the split: wire-types stays bounds-only (`validate_envelope`, `validate_message`, `validate_smtp_request`), the typestate owns §3.6 well-formedness through `unverified::SmtpMessageBuilder`. Stage shape: - `unverified::SmtpRequest` carries an optional `SmtpMessage` whose §3.6 single-occurrence headers occupy typed slots (`From`/`Date` required, `Subject`/`Sender`/`Reply-To`/`To`/ `Cc`/`Bcc`/`Message-ID`/`In-Reply-To`/`References` optional). Everything else, including `DKIM-Signature`, lives in `other_headers` in source order — DKIM isn't §3.6, so stage 1 doesn't route it specially. - `signed::SmtpRequest` reflects "≥1 parsed signature" via a `first + rest` shape rather than a wrapped `Vec`. - `verified::SmtpRequest` lives in its own submodule for symmetry. Errors switch to Rust enums (`SmtpError`/`EnvelopeError`/ `MessageError` for stage 1, `DkimScopeError` for stage 2) with a `From<…> for SmtpResponse` adapter at the canister boundary. Duplicate-§3.6 detection aggregates every offender into a single `DuplicateUniqueHeaders(Vec<&'static str>)` so the SMTP 555 response can name them all in one round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Merged Each stage's inner type now reflects what got checked:
Errors are Rust enums ( |
Picks up `message_id: Option<String>` from #3963 (wire-types crate's `SmtpRequest` + 256-byte bounds check) and threads `message_id: None` through every test-fixture `SmtpRequest` literal. The typestate ignores the new field via `message_id: _` in its `WireSmtpRequest` destructure — recovery flows don't consume it, mirroring the existing handling of `gateway_flags`. `dkim/verify.rs` and `dmarc/test_vectors.rs` conflicts were the same shape as last merge: main updated old `verify()`-based tests that we've already rewritten on top of the typestate API. Took our typestate-based versions; added `message_id: None` to the surviving fixture literals. `dmarc/verify.rs` (modified in main, deleted in HEAD) stays deleted — its responsibilities moved into `dmarc::mod` as part of the typestate refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up: - #3987 DoH cache fix — supersedes the in-flight-waiter shape with a yield-and-poll loop bounded by `MAX_DEDUP_WAIT_POLLS`. Took main's version; ours was an earlier attempt at the same fix. - #3980 `MAX_HEADERS` raised 30 → 100 — Outlook/Exchange routinely forward 40+ trace headers. Rewrote the cap-acceptance test to use inline `SmtpHeader` literals (it called helpers I had removed alongside `validate_header_occurrences`). - #3972 `message_id` propagation onto the pending-challenge snapshot in `handle_smtp_request` — clone the field out of the wire request before stage-1 consumes it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…C/DoH branches `UnverifiedSmtpRequest::try_from` is now the very first statement in both `handle_smtp_request` and `handle_smtp_request_validate` — a malformed shape (missing envelope, bounds, RFC 5322 §3.6) gets 555 before any dispatch or business logic. To make that possible: - `unverified::SmtpRequest` carries the bounds-validated envelope and the optional `message_id`. Stage 1 fully subsumes the wire-types `validate_smtp_request`; a new `validate_message_id` helper in the wire-types crate holds the 256-byte cap so the rule lives in one place. Adds `SmtpError::MessageId(SmtpResponse)`. - `handle_smtp_request`'s tail is split into `complete_dnssec_path` (sync, stashes a partial-verification record), `complete_doh_path` (async, finishes inside the one call), and `finalize_doh_success` (binds the credential or stamps the recovery delegation). - Both entry points are now thin wrappers over inner functions returning `Result<SmtpResponse, SmtpResponse>` so `try_from` uses `?` instead of a nested `match` and every reject site is a `return Err(smtp_err(...))`. The outer wrapper merges Ok/Err via `unwrap_or_else(|r| r)`. Behavioural change: a malformed message addressed to an unknown mailbox now answers 555 (syntax) instead of 550 (no such user). The `…_550` integration test sends well-formed mail so it still gets 550. All canister-bin and wire-types unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reconciles the typestate pipeline with #3992 (explicit DoH fallback) and #3995 (single-flight cache), which restructured the same flow this PR refactors. Main's architecture wins on flow; the typestate wins on how stage 1 is expressed: - `handle_smtp_request` is sync again — with #3992 both paths stash a `PartialVerification` and return; the DoH path no longer finishes inline. Status forks to `NeedDkimLeaf { selector }` (DNSSEC) or `ResolvingDoh` (DoH), and the FE drives completion via `email_recovery_submit_dkim_leaf` / `email_recovery_resolve_via_doh`. - `prepare_partial_verification` keeps the typestate signature (consumes `UnverifiedSmtpRequest`, picks the bottom-most signature anchored in the registered zone) and now serves both paths, as on main. - `verify_setup_email_doh`, `complete_dnssec_path`, `complete_doh_path` and `finalize_doh_success` are deleted — superseded by main's partial-based completion in `submit_leaf::resolve_via_doh`, which calls this module's `map_doh_error` / `bind_credential` / `stamp_recovery_delegation` / `recovery_snapshot`. - Stage-1-first entry points (inner-fn + `?` over `try_from`) survive unchanged; main's inline `validate_smtp_request` calls stay subsumed by the typestate conversion. - Kept our `parse_from_address(&str)` over main's `extract_from_address(&SmtpMessage)` — the typed message exposes the single-occurrence From value directly, and nothing else referenced the wire-shape variant. Stage 3 (`VerifiedSmtpRequest`) is no longer on the production path — completion now runs off the stashed partial — but remains exercised by the dkim/dmarc test-vector suites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Merged
One consequence worth noting: stage 3 ( |
|
✅ No security or compliance issues detected. Reviewed everything up to 2140c32. Security Overview
Detected Code ChangesThe diff is too large to display a summary of code changes. |
Summary
SmtpRequestthrough DKIM and DMARC verifiers and trusted each layer to re-extract what it needed; nothing in the type system distinguished a just-arrived request from one that had passed RFC structural checks, DKIM-certification, or DMARC alignment. This refactor introduces a three-stage typestate (UnverifiedSmtpRequest→Vec<SignedSmtpRequestProjection>→VerifiedSmtpRequest) so the trust level is encoded in the type. Downstream code now reads message content viaheader()/body()on a verified value only — no path can hand a partially-checked request to logic that assumes verification ran.From/Date/Subject, at-most-onceSender/Reply-To/To/Cc/Bcc/Message-ID/In-Reply-To/References, at-least-onceDKIM-Signaturewhen a message is present). Wire-shape problems now surface at the canister entry as SMTP 555 before any cryptographic work runs.d=matches the relevant domain.dkim::verify::verifyanddmarc::verify::verify_email(plus theDkimVerifyResult/EmailVerificationStatusresult enums) are deleted; their per-signature DKIM math and DMARC-alignment helper live in the typestate'sverify_one_signatureandcompute_dmarc_outcome— single source of truth, no duplication.Tests
email_recovery/typestate.rscover stage-1 RFC checks (exactly-once / at-most-once / at-least-once enforcement, case-insensitive name match, envelope-only acceptance, bounds propagation), stage-2 (per-signatureVec, aggregated diagnostics on all-fail, public surface limited tosignature_domain/selector/algorithm), stage-3 (per-signature error aggregation, bottom-up iteration order), and thecompute_dmarc_outcomealignment matrix lifted out of the deleted dmarc/verify.rs.dkim/test_vectors.rsanddmarc/test_vectors.rsmigrated to drive the typestate end-to-end against the existing synth-RSA.emlfixtures.dkim/verify.rs's 7 inline tests rewritten to assert onVerificationError.last_reason; the 7blank_b_tag_valueunit tests stay alongside the primitive itself.cargo test -p internet_identity --bin internet_identity→ 549 passed.cargo clippy --all-targets -D warningsclean (no#[allow(clippy::...)]decorators added).cargo build --release --target wasm32-unknown-unknownsucceeds.