program.py: route from_bytes_backrefs through deser_auto by richardkiss · Pull Request #776 · Chia-Network/clvm_rs

richardkiss · 2026-05-04T23:49:37Z

Summary

Program.from_bytes_backrefs was added in #708 but its implementation
delegated to CLVMTree.from_bytes → deserialize_as_tuples, a
classic-format-only parser. The method therefore rejected any blob that
actually contained a 0xfe backref byte, despite its name.

This PR routes from_bytes_backrefs through the Rust deser_auto
binding (after the existing serde_2026 prefix rejection), which correctly
handles both classic and backrefs wire formats.

Test plan

New regression test that round-trips a backrefs-encoded blob (asserts
the blob contains at least one 0xfe byte) through
Program.from_bytes_backrefs.
Existing test_repr_clvm_tree updated to use
Program.from_bytes_with_cursor (the actual CLVMTree-producing
entry point).
pytest wheel/python/tests/test_serialize.py passes locally.
cargo fmt --check and cargo clippy -- -D warnings clean.

Notes

Targets serde_2026 because the broken method only exists on that
branch (introduced in 2c51d14). Should land before [CHIA-3823] 2026 serde format #708 merges to
main.
Reported by Cursor bot.

Made with Cursor

Note

Low Risk
Low risk: small deserialization routing change plus targeted regression tests; behavior is narrowed only by continuing to reject serde_2026 inputs.

Overview
Program.from_bytes_backrefs now routes through the Rust deser_auto binding (while still rejecting the serde_2026 magic prefix), so it can successfully parse both classic and actual backrefs wire format blobs.

Tests are updated/extended to reflect this: test_repr_clvm_tree uses from_bytes_with_cursor for CLVMTree-level assertions, and a new regression test round-trips a backrefs-encoded blob and asserts it contains a 0xfe backref opcode.

^{Reviewed by Cursor Bugbot for commit df894a9. Bugbot is set up for automated code reviews on this repo. Configure here.}

Implement an interning-based serialization format for CLVM trees that deduplicates atoms and pairs for better compression than backrefs, with comparable deserialization speed and much faster serialization. Format: 6-byte magic prefix (fd ff 32 30 32 36), atom table grouped by length, stack-based instruction stream with dual-cons opcodes. Uses a variable-length signed integer (varint) encoding throughout. Includes: - varint codec (encode/decode, signed, variable-length prefix scheme) - deserializer with DeserializeLimits (max_atom_len, max_input_bytes) - serializer (left-first traversal) and pair-optimized serializer (DP) - serialized_length_serde_2026: compute blob length without deserializing - node_from_bytes_auto: auto-detect classic/backrefs/serde_2026 - InternedTree::node_indices() for serialization support - format specification (docs/serde-2026.md) - benchmarks for serialize and deserialize - 654 lines of tests (41 test cases) Made-with: Cursor

Rust pyo3 bindings: - deser_legacy, deser_backrefs, deser_2026, deser_auto (bytes -> LazyNode) - ser_legacy, ser_backrefs, ser_2026 (LazyNode -> bytes) - clvm_tree_to_lazy_node (Python CLVM tree -> interned Rust LazyNode) Python layer: - clvm_rs.serde module: serialize()/deserialize() with format selection - Program.from_bytes() now auto-detects format via deser_auto - Program.to_bytes_serde_2026() / from_bytes_serde_2026() - Updated .pyi stubs (fix phantom NO_NEG_DIV, add new flags/functions) - 222 lines of Python tests Made-with: Cursor

Roundtrip fuzzer: deserialize arbitrary bytes as classic CLVM, serialize to serde_2026, deserialize back, and compare trees structurally. Uses serialize_2026 (not node_to_bytes) for comparison to handle DAG-vs-tree differences correctly. Made-with: Cursor

Simplify serde_2026 deserialization limits around byte and atom-size budgets, add strict varint decoding, and update docs/tests/fuzzing for the refined API. Made-with: Cursor

Keep the serde_2026 serializer on the left-first strategy for review while preserving the strategy hook for a future context-aware compact implementation. Made-with: Cursor

Update the fuzz target and Python tests to reflect the remaining fast strategy after compact was removed from the PR. Made-with: Cursor

…26 prefix - Add a `decode_varint` fuzz target (`serde-2026-varint`). Exercises the non-strict and strict decoders; checks that any value the non-strict path accepts re-encodes to a canonical (minimal) form which decodes identically in strict mode and is no longer than the original input. - Re-export `decode_varint` / `encode_varint` (with `#[doc(hidden)]`) so the fuzzer can call them through the public crate surface. - Add a unit test for `write_atom_table` that confirms atoms of equal length are coalesced into a single repeated-length group while atoms of different lengths are emitted as separate groups. - Fix `deser_2026`: it now strips `SERDE_2026_MAGIC_PREFIX` before calling the underlying decoder, so it round-trips with `ser_2026` (which always emits the prefix). A `ValueError` is raised if the prefix is missing. Adds a Python test for the symmetry property. Made-with: Cursor

Public serialization API now uses `level: u32` rather than the `Compression` enum. New entry points: - `serialize_2026(a, n)` / `serialize_2026_level(a, n, level)` - `serialize_2026_to_stream(a, n, w)` / `..._level(a, n, level, w)` - `node_to_bytes_serde_2026(a, n)` / `..._level(a, n, level)` Levels above the highest implemented level *saturate* down to it. This means callers can pass `u32::MAX` to mean "best available compression" without recompiling when new levels are added later. `Compression` becomes a `pub(crate)` internal vocabulary mapped from `level` by `compression_for_level`. Python `ser_2026(node, level=K)` no longer raises for `K > 0`; it produces the same bytes as level 0 today (and the best implemented level tomorrow). The "rejects unknown level" test is replaced with a saturation test that asserts levels 0, 1, 7, 1<<20 and u32::MAX all produce the same bytes today. All internal call sites (tests, benches, fuzz target, wheel API) updated. Made-with: Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>

…_count `Vec::with_capacity(instruction_count / 3)` allocated based on the declared instruction-count varint. A tiny blob (under 16 bytes) declaring instruction_count = 2^54 caused malloc to be called for ~24 PB and the process to abort with SIGABRT. `LimitReader` already bounds total bytes consumed, so the loop below cannot drive `pairs` past max_input_bytes/3 entries — geometric growth from `Vec::new()` handles that with a handful of reallocs and removes the abort surface. Adds three regression tests covering the same vector via group_count, the per-group atom count, and instruction_count: each crafts a tiny blob with a huge declared count and asserts deserialize errors instead of aborting. Made-with: Cursor

…t API Move bounds policy out of clvm_rs. The deserializer now takes (max_atom_len, strict) directly; total input bytes is bounded by the slice length (or, for the streamed variant, a caller-supplied `Read::take`). LimitReader and the DEFAULT_MAX_* constants are gone — those defaults felt consensus-flavored without actually being consensus, and they belong in the wrapper that *does* care (chia_rs). `node_from_bytes_auto` was the only Rust caller of those defaults, so it goes too. Auto-detect (sniff the magic prefix and dispatch to backrefs or serde_2026) survives as a Python-only convenience: deser_auto in the wheel sniffs the prefix in Rust and dispatches. clvm_rs's Python tools keep their "works on any blob" feel; consensus-aware callers sniff themselves and supply their own caps. Wire-up: - de.rs: drop DeserializeOptions/LimitReader/DEFAULT_MAX_*; new signatures - mod re-exports: drop DeserializeOptions, node_from_bytes_auto - debug self-check: hard-codes 1 MiB max atom (matches old default) - wheel: deser_2026 / deser_auto take max_atom_len + strict; default 1 MiB - tests: each test/fuzz/bench picks a local MAX_ATOM_LEN; Rust auto tests removed (function is gone, Python tests cover it) Made-with: Cursor

The Rust deserializer family was lopsided: serialize_2026 -> body, no prefix node_to_bytes_serde_2026 -> body + prefix deserialize_2026 -> body, no prefix (name didn't say so) deserialize_2026_from_stream -> body, no prefix (same problem; worse for streams since you can't pre-check the prefix) serialized_length_serde_2026 -> wants prefix (no node_from_bytes_serde_2026) This commit: - Renames `deserialize_2026{,_from_stream}` -> `deserialize_2026_body{,_from_stream}` so the body-only contract is visible at the call site. - Adds `node_from_bytes_serde_2026(a, blob, max_atom_len, strict)` — strips the magic prefix and delegates to the body deserializer. Pairs one-to-one with `node_to_bytes_serde_2026`. - Parametrizes `serialized_length_serde_2026(buf, max_atom_len, strict)` so it mirrors every header-time validation the body deserializer performs (closes the Ok-then-Err mismatch on `instruction_count == 0`, `atom_len > max_atom_len`, and overlong varints under `strict`). - Wheel: `deser_2026` now calls `node_from_bytes_serde_2026` directly instead of stripping the prefix in Python-binding glue. - Self-check, tests, fuzz target, benches: switch to the body name. Includes a regression test (`test_serialized_length_rejects_what_deserialize_rejects`) asserting the length helper rejects every header-time condition the body deserializer rejects. Made-with: Cursor

Cursor Bugbot caught that test_serialized_length_rejects_what_deserialize_rejects was calling `deserialize_2026_body` with a magic-prefixed blob from the `mk` helper. The body deserializer treats the leading 0xfd as a giant negative varint and rejects every input regardless of the header-time condition under test, making the first assertion vacuously pass. Switch the deserializer call to `node_from_bytes_serde_2026` so the test actually exercises the conditions it claims to (instruction_count == 0, group length == 0, multi-atom group count == 0). Made-with: Cursor

The previous implementation went through CLVMTree.from_bytes -> deserialize_as_tuples, which is a classic-format-only parser. As a result, Program.from_bytes_backrefs("...") raised ValueError on any blob that actually used the 0xfe backref opcode, despite the method's name implying backref support. After this change, from_bytes_backrefs: - still rejects the serde_2026 magic prefix up front - delegates to the Rust deser_auto binding, which handles both classic and backrefs formats Adds a regression test that round-trips a tree-with-shared-subtrees through the backrefs encoder and Program.from_bytes_backrefs (asserting the blob contains at least one 0xfe backref byte). The existing test_repr_clvm_tree case is updated to use Program.from_bytes_with_cursor, which is the actual CLVMTree-producing entry point. Reported by Cursor bot on PR #708. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

This PR fixes Program.from_bytes_backrefs in the Python wheel so it actually supports backrefs-encoded CLVM (i.e., blobs containing 0xfe backref markers), by routing deserialization through the Rust deser_auto binding after continuing to reject the serde_2026 magic prefix.

Changes:

Update Program.from_bytes_backrefs to use the Rust auto-deserializer (classic/backrefs) instead of the classic-only CLVMTree.from_bytes path.
Update test_repr_clvm_tree to use Program.from_bytes_with_cursor, which is the CLVMTree-producing entry point.
Add a regression test ensuring from_bytes_backrefs can round-trip an actually-backrefs-encoded blob (containing at least one 0xfe).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
wheel/python/clvm_rs/program.py	Fixes `from_bytes_backrefs` to correctly parse backrefs (while still rejecting `serde_2026` by prefix).
wheel/python/tests/test_serialize.py	Adjusts a CLVMTree-specific test and adds a regression test for true backrefs parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

    def from_bytes_backrefs(cls, blob: bytes) -> Program:
        """Deserialize classic or backrefs format only (rejects serde_2026)."""
        if blob.startswith(SERDE_2026_MAGIC_PREFIX):
            raise ValueError("unexpected serde_2026 format; use from_bytes() for auto-detection")
-        obj, cursor = cls.from_bytes_with_cursor(blob, 0)
-        return obj
+        return cls.wrap(deser_auto(blob))


+
+        # Serialize using backrefs format
+        blob = serialize(deserialize(bytes(p), "legacy"), "backrefs")
+
+        # Assert the blob contains at least one 0xfe backref opcode
+        self.assertIn(0xfe, blob, "backrefs-encoded blob should contain at least one 0xfe byte")
+
+        # Round-trip through from_bytes_backrefs
+        p2 = Program.from_bytes_backrefs(blob)
+


coveralls-official · 2026-05-05T00:03:08Z

Coverage Report for CI Build 25349903018

Warning

No base build found for commit 6ac4265 on serde_2026.
Coverage changes can't be calculated without a base build.
If a base build is processing, this comment will update automatically when it completes.

Coverage: 87.971%

Details

Patch coverage: 1 of 1 lines across 1 file are fully covered (100%).

Uncovered Changes

No uncovered changes found.

Coverage Regressions

Requires a base build to compare against. How to fix this →

Coverage Stats


Relevant Lines:	9394
Covered Lines:	8264
Line Coverage:	87.97%
Coverage Strength:	28088109.57 hits per line

💛 - Coveralls

richardkiss and others added 15 commits April 30, 2026 11:45

serde_2026 fuzz target

2c51d14

Roundtrip fuzzer: deserialize arbitrary bytes as classic CLVM, serialize to serde_2026, deserialize back, and compare trees structurally. Uses serialize_2026 (not node_to_bytes) for comparison to handle DAG-vs-tree differences correctly. Made-with: Cursor

serde_2026: refine deserialization options

b67e865

Simplify serde_2026 deserialization limits around byte and atom-size budgets, add strict varint decoding, and update docs/tests/fuzzing for the refined API. Made-with: Cursor

serde_2026: remove compact strategy from PR

412eaf1

Keep the serde_2026 serializer on the left-first strategy for review while preserving the strategy hook for a future context-aware compact implementation. Made-with: Cursor

serde_2026: fix compact removal CI fallout

2104989

Update the fuzz target and Python tests to reflect the remaining fast strategy after compact was removed from the PR. Made-with: Cursor

Remove unnecessary crate exports

e1f1c9c

Revert stray .gitignore newline (per Arvid review)

2ced494

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI review requested due to automatic review settings May 4, 2026 23:49

Copilot started reviewing on behalf of richardkiss May 4, 2026 23:50 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

richardkiss force-pushed the serde_2026 branch from 2f80041 to 5ddec5f Compare May 7, 2026 19:54

Base automatically changed from serde_2026 to main May 19, 2026 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

program.py: route from_bytes_backrefs through deser_auto#776

program.py: route from_bytes_backrefs through deser_auto#776
richardkiss wants to merge 15 commits into
mainfrom
fix-program-py-from-bytes-backrefs

richardkiss commented May 4, 2026 •

edited by cursor Bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coveralls-official Bot commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

richardkiss commented May 4, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coveralls-official Bot commented May 5, 2026

Coverage Report for CI Build 25349903018

Coverage: 87.971%

Details

Uncovered Changes

Coverage Regressions

Coverage Stats

💛 - Coveralls

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

richardkiss commented May 4, 2026 •

edited by cursor Bot

Loading