ZIPT Code Review: Improvements from Reference Implementation
Date: 2025-07-22
Files Reviewed: src/ast/rewriter/seq_eq_solver.cpp, src/ast/rewriter/seq_regex_bisim.cpp, src/smt/seq_eq_solver.cpp, src/smt/theory_seq.cpp
ZIPT Reference: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT
Note: The originally targeted euf_snode.h / euf_sgraph.h / euf_seq_plugin.* files do not yet exist in this repository. The review was re-scoped to the existing sequence solver implementation.
Summary
Three concrete improvements were identified by comparing Z3's sequence-equation solver with the ZIPT reference (NielsenNode.cs, NielsenGraph.cs, StrEqBase.cs). Two are in seq_eq_solver.cpp (variable shadowing and a tricky unsigned reverse-iteration idiom), and one is in seq_regex_bisim.cpp (moving step-bound accounting inside the leaf-traversal loop so deep derivative trees are properly bounded).
Improvements Applied
Improvement 1: Fix variable shadowing of eq_ptr& r output parameter in reduce_itos3 and reduce_ubv2s2
File: src/ast/rewriter/seq_eq_solver.cpp
Functions: reduce_itos3 (lines ~162–185), reduce_ubv2s2 (lines ~269–300)
Rationale: Both functions have an eq_ptr& r output parameter (used to pass back a derived equation). Inside each function, a range-for loop introduced for (expr* r : es), shadowing the output parameter. While the parameter happens not to be written inside those loops today, the shadowing creates a latent maintenance hazard: any future code that tries to set the output r inside the loop would silently write to the local copy instead. Renamed the loop variable to elem in all four loops (two functions × two loops each).
ZIPT Reference: ZIPT's StrEqBase.cs Simplify() and GetConstraints() use consistent naming (elem, ch, c) for loop variables to avoid conflicts with outer-scope names.
Improvement 2: Replace do...while(i-- > 0) with idiomatic for (unsigned i = n; i-- > 0;) in reverse-iteration helpers
File: src/ast/rewriter/seq_eq_solver.cpp
Functions: count_units_r2l (lines ~497–506), count_non_units_r2l (lines ~514–523)
Rationale: The original do { ... } while (i-- > 0) pattern relies on unsigned integer underflow semantics. When i == 0, the post-decrement evaluates 0 > 0 = false before wrapping to UINT_MAX, so the loop terminates correctly. This is safe, but subtle: readers must know that unsigned wrapping is intentional and that the > 0 check happens before the wrap. The idiomatic replacement for (unsigned i = offset + 1; i-- > 0;) has identical semantics but makes the intent explicit — it is the established C++ idiom for safe reverse-iteration over unsigned indices. Same logic, more readable.
ZIPT Reference: ZIPT's NielsenNode.cs right-to-left passes (SimplifyRight(), ReduceRight()) iterate from Count - 1 downwards using standard for (int i = count - 1; i >= 0; i--) (C# uses signed indices, so underflow is not an issue). The Z3 port used do...while to replicate this semantic in unsigned C++; the for (i = n+1; i-- > 0;) form is the standard unsigned equivalent.
Improvement 3: Count steps inside collect_leaves to respect the bisimulation step budget
File: src/ast/rewriter/seq_regex_bisim.cpp
Functions: collect_leaves (lines ~98–122), are_equivalent_core (lines ~224–242)
Rationale: are_equivalent_core maintains a step counter (m_steps) bounded by m_step_bound (50 000) to prevent non-termination on pathological inputs. Previously, m_steps was incremented once per worklist item — at the top of the outer while (!m_worklist.empty()) loop. However, collect_leaves traverses the ITE/union tree of the Brzozowski derivative, which can be exponentially larger than the worklist. A derivative tree with depth d produces up to 2^d leaves, all processed by collect_leaves with no step accounting. This meant that the step budget only bounded the number of worklist iterations, not the total work done.
The fix moves ++m_steps > m_step_bound inside collect_leaves (one increment per node visited in the derivative tree), and changes the outer-loop check to m_steps > m_step_bound (no additional increment). The outer check now acts as a fast-exit guard at the start of each iteration when collect_leaves has already exhausted the budget; collect_leaves itself returns false on step exhaustion, causing are_equivalent_core to return l_undef.
ZIPT Reference: ZIPT's NielsenGraph.cs uses a DepthBound mechanism with ItDeepeningInc that limits work per expansion step rather than per top-level node. The collect_leaves traversal is the closest analog to ZIPT's per-expansion work. Bounding it per-node is consistent with ZIPT's philosophy of bounding total expansion work.
Git Diff
The following diff can be applied with git apply:
diff --git a/src/ast/rewriter/seq_eq_solver.cpp b/src/ast/rewriter/seq_eq_solver.cpp
index e1ffae7..9eb2e7e 100644
--- a/src/ast/rewriter/seq_eq_solver.cpp
+++ b/src/ast/rewriter/seq_eq_solver.cpp
@@ -162,8 +162,8 @@ namespace seq {
return true;
}
expr* u = nullptr;
- for (expr* r : es) {
- if (seq.str.is_unit(r, u)) {
+ for (expr* elem : es) {
+ if (seq.str.is_unit(elem, u)) {
expr_ref is_digit = m_ax.is_digit(u);
if (!m.is_true(ctx.expr2rep(is_digit)))
add_consequence(is_digit);
@@ -173,8 +173,8 @@ namespace seq {
return false;
expr_ref num(m);
- for (expr* r : es) {
- VERIFY(seq.str.is_unit(r, u));
+ for (expr* elem : es) {
+ VERIFY(seq.str.is_unit(elem, u));
expr_ref digit = m_ax.sk().mk_digit2int(u);
if (!num)
num = digit;
@@ -269,8 +269,8 @@ namespace seq {
return true;
}
expr* u = nullptr;
- for (expr* r : es) {
- if (seq.str.is_unit(r, u)) {
+ for (expr* elem : es) {
+ if (seq.str.is_unit(elem, u)) {
expr_ref is_digit = m_ax.is_digit(u);
if (!m.is_true(ctx.expr2rep(is_digit)))
add_consequence(is_digit);
@@ -288,8 +288,8 @@ namespace seq {
return true;
}
- for (expr* r : es) {
- VERIFY(seq.str.is_unit(r, u));
+ for (expr* elem : es) {
+ VERIFY(seq.str.is_unit(elem, u));
expr_ref digit = m_ax.sk().mk_digit2bv(u, bv_sort);
if (!num)
num = digit;
@@ -496,12 +496,12 @@ namespace seq {
unsigned eq_solver::count_units_r2l(expr_ref_vector const& es, unsigned offset) const {
SASSERT(offset < es.size());
- unsigned i = offset, count = 0;
- do {
+ unsigned count = 0;
+ for (unsigned i = offset + 1; i-- > 0;) {
if (!seq.str.is_unit(es[i]))
break;
++count;
- } while (i-- > 0);
+ }
return count;
}
@@ -513,12 +513,12 @@ namespace seq {
unsigned eq_solver::count_non_units_r2l(expr_ref_vector const& es, unsigned offset) const {
SASSERT(offset < es.size());
- unsigned i = offset, count = 0;
- do {
+ unsigned count = 0;
+ for (unsigned i = offset + 1; i-- > 0;) {
if (seq.str.is_unit(es[i]))
break;
++count;
- } while (i-- > 0);
+ }
return count;
}
diff --git a/src/ast/rewriter/seq_regex_bisim.cpp b/src/ast/rewriter/seq_regex_bisim.cpp
index 5e84901..8a100f7 100644
--- a/src/ast/rewriter/seq_regex_bisim.cpp
+++ b/src/ast/rewriter/seq_regex_bisim.cpp
@@ -91,7 +91,8 @@ namespace seq {
(re.empty) leaves are dropped.
Returns false if we encountered an unexpected node (e.g. a free
- variable creeping in) — in that case the caller should bail out.
+ variable creeping in) or the step budget was exhausted — in that
+ case the caller should bail out with l_undef.
*/
bool regex_bisim::collect_leaves(expr* der, expr_ref_vector& leaves) {
ptr_vector<expr> work;
@@ -99,6 +100,8 @@ namespace seq {
work.push_back(der);
seen.insert(der);
while (!work.empty()) {
+ if (++m_steps > m_step_bound)
+ return false;
expr* e = work.back();
work.pop_back();
expr* c = nullptr, * t = nullptr, * f = nullptr;
@@ -221,7 +224,7 @@ namespace seq {
m_worklist.push_back(r0);
while (!m_worklist.empty()) {
- if (++m_steps > m_step_bound)
+ if (m_steps > m_step_bound)
return l_undef;
expr_ref r(m_worklist.back(), m);
To apply:
git apply - << 'EOF'
[paste diff above]
EOF
Testing
After applying this diff, build and test with:
mkdir -p build && cd build
cmake ..
make -j$(nproc)
make test-z3
./test-z3 /a
The changes are pure refactors (improvements 1 and 2) and a conservative tightening of an existing bound (improvement 3); no behavioral changes are expected on any existing test cases.
Generated by ZIPT Code Reviewer agent — comparing Z3 implementation with CEisenhofer/ZIPT@parikh
Generated by ZIPT Code Reviewer · 1.3K AIC · ⌖ 38.4 AIC · ⊞ 28.8K · ◷
ZIPT Code Review: Improvements from Reference Implementation
Date: 2025-07-22
Files Reviewed:
src/ast/rewriter/seq_eq_solver.cpp,src/ast/rewriter/seq_regex_bisim.cpp,src/smt/seq_eq_solver.cpp,src/smt/theory_seq.cppZIPT Reference: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT
Summary
Three concrete improvements were identified by comparing Z3's sequence-equation solver with the ZIPT reference (
NielsenNode.cs,NielsenGraph.cs,StrEqBase.cs). Two are inseq_eq_solver.cpp(variable shadowing and a tricky unsigned reverse-iteration idiom), and one is inseq_regex_bisim.cpp(moving step-bound accounting inside the leaf-traversal loop so deep derivative trees are properly bounded).Improvements Applied
Improvement 1: Fix variable shadowing of
eq_ptr& routput parameter inreduce_itos3andreduce_ubv2s2File:
src/ast/rewriter/seq_eq_solver.cppFunctions:
reduce_itos3(lines ~162–185),reduce_ubv2s2(lines ~269–300)Rationale: Both functions have an
eq_ptr& routput parameter (used to pass back a derived equation). Inside each function, a range-forloop introducedfor (expr* r : es), shadowing the output parameter. While the parameter happens not to be written inside those loops today, the shadowing creates a latent maintenance hazard: any future code that tries to set the outputrinside the loop would silently write to the local copy instead. Renamed the loop variable toelemin all four loops (two functions × two loops each).ZIPT Reference: ZIPT's
StrEqBase.csSimplify()andGetConstraints()use consistent naming (elem,ch,c) for loop variables to avoid conflicts with outer-scope names.Improvement 2: Replace
do...while(i-- > 0)with idiomaticfor (unsigned i = n; i-- > 0;)in reverse-iteration helpersFile:
src/ast/rewriter/seq_eq_solver.cppFunctions:
count_units_r2l(lines ~497–506),count_non_units_r2l(lines ~514–523)Rationale: The original
do { ... } while (i-- > 0)pattern relies on unsigned integer underflow semantics. Wheni == 0, the post-decrement evaluates0 > 0 = falsebefore wrapping toUINT_MAX, so the loop terminates correctly. This is safe, but subtle: readers must know that unsigned wrapping is intentional and that the> 0check happens before the wrap. The idiomatic replacementfor (unsigned i = offset + 1; i-- > 0;)has identical semantics but makes the intent explicit — it is the established C++ idiom for safe reverse-iteration over unsigned indices. Same logic, more readable.ZIPT Reference: ZIPT's
NielsenNode.csright-to-left passes (SimplifyRight(),ReduceRight()) iterate fromCount - 1downwards using standardfor (int i = count - 1; i >= 0; i--)(C# uses signed indices, so underflow is not an issue). The Z3 port useddo...whileto replicate this semantic in unsigned C++; thefor (i = n+1; i-- > 0;)form is the standard unsigned equivalent.Improvement 3: Count steps inside
collect_leavesto respect the bisimulation step budgetFile:
src/ast/rewriter/seq_regex_bisim.cppFunctions:
collect_leaves(lines ~98–122),are_equivalent_core(lines ~224–242)Rationale:
are_equivalent_coremaintains a step counter (m_steps) bounded bym_step_bound(50 000) to prevent non-termination on pathological inputs. Previously,m_stepswas incremented once per worklist item — at the top of the outerwhile (!m_worklist.empty())loop. However,collect_leavestraverses the ITE/union tree of the Brzozowski derivative, which can be exponentially larger than the worklist. A derivative tree with depth d produces up to 2^d leaves, all processed bycollect_leaveswith no step accounting. This meant that the step budget only bounded the number of worklist iterations, not the total work done.The fix moves
++m_steps > m_step_boundinsidecollect_leaves(one increment per node visited in the derivative tree), and changes the outer-loop check tom_steps > m_step_bound(no additional increment). The outer check now acts as a fast-exit guard at the start of each iteration whencollect_leaveshas already exhausted the budget;collect_leavesitself returnsfalseon step exhaustion, causingare_equivalent_coreto returnl_undef.ZIPT Reference: ZIPT's
NielsenGraph.csuses aDepthBoundmechanism withItDeepeningIncthat limits work per expansion step rather than per top-level node. Thecollect_leavestraversal is the closest analog to ZIPT's per-expansion work. Bounding it per-node is consistent with ZIPT's philosophy of bounding total expansion work.Git Diff
The following diff can be applied with
git apply:To apply:
Testing
After applying this diff, build and test with:
The changes are pure refactors (improvements 1 and 2) and a conservative tightening of an existing bound (improvement 3); no behavioral changes are expected on any existing test cases.
Generated by ZIPT Code Reviewer agent — comparing Z3 implementation with CEisenhofer/ZIPT@parikh