Merkle Spine

A formally verified library for canonical, multi-algorithm Merkle structures: one tree topology shared across many hash algorithms at once, with a single canonical form per layout. The repository is a four-tier stack of library crates — a structural core, two single-algorithm engineering libraries, a combinator that lifts them across algorithms, and the concrete instantiations. The full design is in docs/architecture.md.

The stack

The crates are layered, and the stack gets more concrete going up: the Spine is the most abstract (a generic Merkle structure, infinitely reusable, with no notion of epochs); each tier up is more concrete, ending in named, k-fixed instantiations.

┌──────────────────────────────────────────────────────────────────────────────┐
│ L4 — EML / EMT — the instantiations (where "Epoch" lives)           eml/emt  │
│   EML = polydigest(CML) @ k=2, arbitrary subtrees — the general-purpose log    │
│   EMT = polydigest(CMT) @ k=2 — the mutable peer                               │
└───────────────────────────────┬────────────────────────────────────────────────┘
                                 │  instantiate at k=2
┌──────────────────────────────────────────────────────────────────────────────┐
│ L3 — the combinator                                             polydigest    │
│   lifts a CML or CMT across N algorithms over ONE shared data substrate:        │
│   activation timeline · null-run-extents · binding root · binding proof         │
└───────────────────────────────┬────────────────────────────────────────────────┘
              ┌──────────────────┴───────────────────┐
┌─────────────▼──────────────────┐      ┌─────────────▼──────────────────┐
│ L2 — CML (Canonical Merkle Log) │      │ L2 — CMT (Canonical Merkle Tree)│
│  append-only, single-algorithm  │      │  mutable, single-algorithm      │
│  frontier carry · consistency   │ cml  │  set / get; no consistency  cmt │
│  seal → Snapshot · filling base │      │  per-node multi-hash add        │
└─────────────┬──────────────────┘      └─────────────┬──────────────────┘
              └──────────────────┬───────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│ L1 — Merkle Spine — the structural core; the proof payoff            spine    │
│   canonicalization (collapse + promotion) · the proof spine · the Hasher seam  │
│   inclusion · leaf proof · the general structural Seal · opaque metadata       │
│   depends on nothing · epoch-free                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Merkle Spine (spine) is the abstract structural core: canonicalization, the proof spine, the Hasher seam, inclusion and leaf proofs, the general structural Seal, and an opaque metadata channel. It is epoch-free and depends on nothing. The structural uniqueness theorem is the central proof payoff.

CML (cml) and CMT (cmt) are the two single-algorithm engineering libraries over the Spine. CML — Canonical Merkle Log — is append-only: it owns the frontier carry, append-only-prefix consistency proofs, and the structural snapshot. CMT — Canonical Merkle Tree — is the mutable peer: set / get and per-node multi-hash, but no frontier and so no consistency proofs. Both are k-generic and epoch-free; neither depends on the other. The one currency they exchange is the spine's Seal.

polydigest is the combinator that lifts a CML or CMT across N algorithms over one shared data substrate. It is not a composition of independent logs — that would duplicate the data and raise per-algorithm cost. The leaf entries live once; each algorithm is a frontier / hash-view over them, so its only per-algorithm cost is its own hashing. The combinator adds the three concepts the structural core deliberately omits: the activation timeline (which algorithm is live where), the null-run-extents (the one logical count, for cross-tree alignment), and the binding root (the atomic multi-tree commitment).

EML / EMT (eml / emt) are the concrete instantiations, fixed at arity k = 2. EML — the Epoch Merkle Log, polydigest(CML) — is the general-purpose append-only log. EMT — the Epoch Merkle Tree, polydigest(CMT) — is the mutable peer. k = 2 is a sane default, not an opinion: binary spine traversal is cheaply logarithmic.

The library is de-branded. The deliverable is the crates; the instantiations are thin. A consumer composes its own application structures over EML/EMT — for example a mutable outer tree with an embedded append-only commit log, joined by the spine's opaque subtree embedding; a CT-style build can wrap any Hasher with RFC 9162 prefix bytes — but no consumer-specific crate lives in the library namespace.

Key design points

One topology, many algorithms. Every hash algorithm sees the same tree shape; each computes its own digests independently. Adding or removing an algorithm never touches another's roots or proofs.

Canonicalization. Two primitives — promotion (a lone child lifts in place of its hashed node) and collapse (equal-valued siblings fold to one) — compose into a single confluent reduction to a unique canonical form. Collapse is general, not null-specific: null is just one value that collapses. This pins each layout to exactly one root and closes proof malleability. Both primitives are structural — they reduce the count, tracking nothing.

The two-count model. The structural tiers (Spine / CML / CMT) carry the bare reduced count and commit nothing. The only thing ever counted is the null-run-extent, retained solely for cross-tree alignment — and it lives only at the polydigest combinator, because it is the one collapse instance that diverges per algorithm. Everything follows from this boundary.

The binding root. Per algorithm, BR_i is the same canonicalization fold applied one level up, over the per-algorithm member roots as opaque children (plus the all-algorithms null-run-extents when activation is non-trivial). Others' roots enter only as opaque digests — no security material is mixed across algorithms, so no algorithm's break can weaken another. A single-algorithm, open-from-genesis structure folds to a plain root with no overhead.

Permanent / ephemeral hashes. The spine decomposes a log into a frontier of perfect k-ary subtrees. A complete subtree's hash is permanent — it appears in every larger tree and never changes — while the frontier-fold is ephemeral, recomputed per size. Only permanent hashes appear in proofs, so an append never mutates a proven hash. This is the precise "maximally immutable" semantics, and it lets the storage/serving layer stay fully untrusted (the CT / Trillian model).

Stable-coordinate addressing. A node is addressed by a coordinate invariant to canonicalization and to the hash — the leaf sequence number, or a complete-subtree (level, index) — and the structural path is derived, never stored. A from-root child-index path is not used: it churns under promotion/collapse and diverges across the per-algorithm trees.

Proof non-malleability. A zero-sibling step would represent a promoted node; honest provers omit it and verifiers reject it. A fixed (leaf, index, size, root) tuple admits at most one accepting path, modulo a hash collision.

Formal guarantees

The structural core and the combinator are verified in Lean 4 (proofs/lean/; see proofs/lean/README.md for the reviewer's guide). The verification has two properties:

Sorry-free. No sorry placeholder appears anywhere in the proof corpus.
At most four structural axioms. The trusted computing base is declared in proofs/lean/EMLProof/Foundations.lean: an abstract digest type Digest, its non-emptiness, the abstract hash H, and the serializer digestToBytes, plus Lean's own built-ins. Collision resistance is not an axiom — it is discharged as an explicit hypothesis at each use site.

The proof split mirrors the code split: the structural theorems carry no epoch hypothesis and sit at the Spine / CML layer; the binding theorems consume roots as opaque digests and sit at the polydigest layer.

Theorem	Layer	Statement
`canonical_unique`	Spine	A structure reduces to a unique canonical normal form
`kary_inclusion_soundness`	Spine	An accepting canonical proof commits the leaf at the claimed position
`inclusion_proof_unique`	Spine	At most one accepting canonical path per statement (modulo collision)
`leaf_proof_sound`	Spine	A leaf proof soundly binds a leaf hash to its trusted parameters
`consistency_soundness`	CML	An accepting consistency proof forces the reconstructed old root to the genuine prefix root
`consistency_append_only`	CML	Lifts consistency soundness to the data-level append-only relation
`combinedRoot_binds_timeline`	polydigest	A fixed binding root pins the member-digest list and binds the timeline
`coupling_extract_sound`	polydigest	Coupling opens the binding root soundly
`binding_root_sound`	polydigest	Each algorithm's binding root folds under its own hash
`binding_proof_consistent`	polydigest	Mutually consistent trusted binding roots prove agreement on the shared structure
`snapshot_proof_sound`	polydigest	A snapshot proof soundly aggregates leaf proofs against a sealed commitment

Conformance

The log uses MMR durable witnesses: each leaf's inclusion path to its peak is permanent and extends append-only. Inclusion and binding-root verification are preserved from the prior design; the consistency proof is upgraded to the MMR prefix-form. The conformance oracle is the Lean corpus and the durability property tests (eml/tests/proptests.rs, polydigest/tests/), which assert witness permanence across appends.

Building and testing

Rust workspace:

cargo test --workspace                   # all unit, property, and fault-injection tests
cargo test --release --test complexity   # complexity regression (release profile)

Lean proofs (from proofs/lean/):

lake build

Fuzz targets (nightly Rust):

cargo +nightly fuzz run <target>
# targets: verify_inclusion, verify_consistency, rehydrate_proof,
#          proof_mutation, state_machine

License

Distributed under an interim license that permits non-commercial, personal, academic, or research use. Commercial use is prohibited. See LICENSE for the complete terms.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Merkle Spine

The stack

Key design points

Formal guarantees

Conformance

Building and testing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 510 Commits
cml		cml
cmt		cmt
docs		docs
eml		eml
emt		emt
examples		examples
fuzz		fuzz
polydigest		polydigest
proofs/lean		proofs/lean
spine		spine
storage-fjall		storage-fjall
tests		tests
.envrc		.envrc
.gitignore		.gitignore
.rustfmt.toml		.rustfmt.toml
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
rust-toolchain.toml		rust-toolchain.toml
shell.nix		shell.nix

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Merkle Spine

The stack

Key design points

Formal guarantees

Conformance

Building and testing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages