Skip to content

Improvements to show, and new show from disc#4911

Draft
mariaKt wants to merge 6 commits intodevelopfrom
mk/show-from-disk-2
Draft

Improvements to show, and new show from disc#4911
mariaKt wants to merge 6 commits intodevelopfrom
mk/show-from-disk-2

Conversation

@mariaKt
Copy link
Copy Markdown
Collaborator

@mariaKt mariaKt commented Apr 20, 2026

Memory-efficient proof display: streaming show and lazy loading

Motivation

Displaying large proofs with kmir show caused out-of-memory failures. A completed 1411-node proof required 23.8 GB of memory just to print its output, after the proof itself had already finished successfully.

Investigation

The initial hypothesis was that the memory cost came from joining the entire output into a single string before printing ('\n'.join(lines)). Streaming the output line-by-line (via a generator) eliminated this duplication, but peak memory remained at 23.8 GB. The real cost was loading the entire proof into memory, parsing all node CTerms and constraint data from JSON into Python K-term objects, with ~6× memory overhead from Python object representation.

Changes

This PR introduces two improvements:

1. Generator-based show_iter

KCFGShow.show_iter and APRProofShow.show_iter yield lines one at a time instead of collecting them into a list. The existing show methods are reimplemented as list(self.show_iter(...)) to avoid code duplication. Tests confirm same output.

2. Lazy-loading show_iter_from_disk

APRProofShow.show_iter_from_disk displays a proof without loading the full proof into memory:

  • LazyNode — duck-types KCFG.Node. Stores node ID, attrs, and a file path. Loads the CTerm from nodes/{id}.json only when .cterm is accessed for printing.
  • LazyCSubst — duck-types CSubst. Holds the raw JSON dict and defers parsing into K-term objects until .constraints or .subst is accessed.
  • APRProofStub — duck-types APRProof. Holds proof metadata (init, target, terminal, bounded, refutations) from proof.json and answers proof-level queries (is_init, is_target, is_terminal, etc.) without loading the KCFG.

These stubs are passed into the real KCFG data structures (edges, splits, covers, ndbranches accept them via duck typing) and the existing pretty_segments traversal works unchanged, with no reimplementation or duplication of the tree-drawing logic.

Loading is integrated into the existing code path: KCFG.from_dict accepts a lazy parameter that controls whether nodes are created as LazyNode stubs and covers use LazyCSubst wrappers. KCFGStore.read_cfg_data_lazy reads kcfg.json without loading node JSON files. KCFG.read_cfg_data_lazy provides the entry point, combining the two.

Results

Tested on a 1411-node p-token proof (burn_multisig_n1):

show_iter (full load) show_iter_from_disk (lazy)
Peak memory 23.8 GB 6.3 GB
Time 4263s (~71 min) 315s (~5 min)
Output lines 10,221 10,221

74% memory reduction, 13.5× faster, identical output.

Testing

  • show_iter_from_disk output is compared against show output in three existing test_imp.py test functions (all tests that write proofs to disk). Both default and full-printer modes are exercised.
  • Manually verified output match on external proofs with APRProofNodePrinter (proof-level attrs: init, target, terminal).
  • All 98 existing pyk show-related tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant