[zipt-review] ZIPT Code Review: seq_eq_solver and seq_regex_bisim improvements from reference implementation

## ZIPT Code Review: Improvements from Reference Implementation

**Date**: 2025-07-22
**Files Reviewed**: `src/ast/rewriter/seq_eq_solver.cpp`, `src/ast/rewriter/seq_regex_bisim.cpp`, `src/smt/seq_eq_solver.cpp`, `src/smt/theory_seq.cpp`
**ZIPT Reference**: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT

> **Note**: The originally targeted `euf_snode.h / euf_sgraph.h / euf_seq_plugin.*` files do not yet exist in this repository. The review was re-scoped to the existing sequence solver implementation.

### Summary

Three concrete improvements were identified by comparing Z3's sequence-equation solver with the ZIPT reference (`NielsenNode.cs`, `NielsenGraph.cs`, `StrEqBase.cs`). Two are in `seq_eq_solver.cpp` (variable shadowing and a tricky unsigned reverse-iteration idiom), and one is in `seq_regex_bisim.cpp` (moving step-bound accounting inside the leaf-traversal loop so deep derivative trees are properly bounded).

### Improvements Applied

#### Improvement 1: Fix variable shadowing of `eq_ptr& r` output parameter in `reduce_itos3` and `reduce_ubv2s2`

**File**: `src/ast/rewriter/seq_eq_solver.cpp`
**Functions**: `reduce_itos3` (lines ~162–185), `reduce_ubv2s2` (lines ~269–300)

**Rationale**: Both functions have an `eq_ptr& r` output parameter (used to pass back a derived equation). Inside each function, a range-`for` loop introduced `for (expr* r : es)`, shadowing the output parameter. While the parameter happens not to be written inside those loops today, the shadowing creates a latent maintenance hazard: any future code that tries to set the output `r` inside the loop would silently write to the local copy instead. Renamed the loop variable to `elem` in all four loops (two functions × two loops each).

**ZIPT Reference**: ZIPT's `StrEqBase.cs` `Simplify()` and `GetConstraints()` use consistent naming (`elem`, `ch`, `c`) for loop variables to avoid conflicts with outer-scope names.

#### Improvement 2: Replace `do...while(i-- > 0)` with idiomatic `for (unsigned i = n; i-- > 0;)` in reverse-iteration helpers

**File**: `src/ast/rewriter/seq_eq_solver.cpp`
**Functions**: `count_units_r2l` (lines ~497–506), `count_non_units_r2l` (lines ~514–523)

**Rationale**: The original `do { ... } while (i-- > 0)` pattern relies on unsigned integer underflow semantics. When `i == 0`, the post-decrement evaluates `0 > 0 = false` before wrapping to `UINT_MAX`, so the loop terminates correctly. This is safe, but subtle: readers must know that unsigned wrapping is intentional and that the `> 0` check happens before the wrap. The idiomatic replacement `for (unsigned i = offset + 1; i-- > 0;)` has identical semantics but makes the intent explicit — it is the established C++ idiom for safe reverse-iteration over unsigned indices. Same logic, more readable.

**ZIPT Reference**: ZIPT's `NielsenNode.cs` right-to-left passes (`SimplifyRight()`, `ReduceRight()`) iterate from `Count - 1` downwards using standard `for (int i = count - 1; i >= 0; i--)` (C# uses signed indices, so underflow is not an issue). The Z3 port used `do...while` to replicate this semantic in unsigned C++; the `for (i = n+1; i-- > 0;)` form is the standard unsigned equivalent.

#### Improvement 3: Count steps inside `collect_leaves` to respect the bisimulation step budget

**File**: `src/ast/rewriter/seq_regex_bisim.cpp`
**Functions**: `collect_leaves` (lines ~98–122), `are_equivalent_core` (lines ~224–242)

**Rationale**: `are_equivalent_core` maintains a step counter (`m_steps`) bounded by `m_step_bound` (50 000) to prevent non-termination on pathological inputs. Previously, `m_steps` was incremented once per *worklist item* — at the top of the outer `while (!m_worklist.empty())` loop. However, `collect_leaves` traverses the ITE/union tree of the Brzozowski derivative, which can be exponentially larger than the worklist. A derivative tree with depth *d* produces up to 2^d leaves, all processed by `collect_leaves` with no step accounting. This meant that the step budget only bounded the *number of worklist iterations*, not the total work done.

The fix moves `++m_steps > m_step_bound` inside `collect_leaves` (one increment per node visited in the derivative tree), and changes the outer-loop check to `m_steps > m_step_bound` (no additional increment). The outer check now acts as a fast-exit guard at the start of each iteration when `collect_leaves` has already exhausted the budget; `collect_leaves` itself returns `false` on step exhaustion, causing `are_equivalent_core` to return `l_undef`.

**ZIPT Reference**: ZIPT's `NielsenGraph.cs` uses a `DepthBound` mechanism with `ItDeepeningInc` that limits work per expansion step rather than per top-level node. The `collect_leaves` traversal is the closest analog to ZIPT's per-expansion work. Bounding it per-node is consistent with ZIPT's philosophy of bounding total expansion work.

### Git Diff

The following diff can be applied with `git apply`:

```diff
diff --git a/src/ast/rewriter/seq_eq_solver.cpp b/src/ast/rewriter/seq_eq_solver.cpp
index e1ffae7..9eb2e7e 100644
--- a/src/ast/rewriter/seq_eq_solver.cpp
+++ b/src/ast/rewriter/seq_eq_solver.cpp
@@ -162,8 +162,8 @@ namespace seq {
             return true;
         }
         expr* u = nullptr;
-        for (expr* r : es) {
-            if (seq.str.is_unit(r, u)) {
+        for (expr* elem : es) {
+            if (seq.str.is_unit(elem, u)) {
                 expr_ref is_digit = m_ax.is_digit(u);
                 if (!m.is_true(ctx.expr2rep(is_digit)))
                     add_consequence(is_digit);
@@ -173,8 +173,8 @@ namespace seq {
             return false;
 
         expr_ref num(m);
-        for (expr* r : es) {
-            VERIFY(seq.str.is_unit(r, u));
+        for (expr* elem : es) {
+            VERIFY(seq.str.is_unit(elem, u));
             expr_ref digit = m_ax.sk().mk_digit2int(u);
             if (!num)
                 num = digit;
@@ -269,8 +269,8 @@ namespace seq {
             return true;
         }
         expr* u = nullptr;
-        for (expr* r : es) {
-            if (seq.str.is_unit(r, u)) {
+        for (expr* elem : es) {
+            if (seq.str.is_unit(elem, u)) {
                 expr_ref is_digit = m_ax.is_digit(u);
                 if (!m.is_true(ctx.expr2rep(is_digit)))
                     add_consequence(is_digit);
@@ -288,8 +288,8 @@ namespace seq {
             return true;
         }
 
-        for (expr* r : es) {
-            VERIFY(seq.str.is_unit(r, u));
+        for (expr* elem : es) {
+            VERIFY(seq.str.is_unit(elem, u));
             expr_ref digit = m_ax.sk().mk_digit2bv(u, bv_sort);
             if (!num)
                 num = digit;
@@ -496,12 +496,12 @@ namespace seq {
 
     unsigned eq_solver::count_units_r2l(expr_ref_vector const& es, unsigned offset) const {
         SASSERT(offset < es.size());
-        unsigned i = offset, count = 0;
-        do {
+        unsigned count = 0;
+        for (unsigned i = offset + 1; i-- > 0;) {
             if (!seq.str.is_unit(es[i]))
                 break;
             ++count;
-        } while (i-- > 0);
+        }
         return count;
     }
 
@@ -513,12 +513,12 @@ namespace seq {
 
     unsigned eq_solver::count_non_units_r2l(expr_ref_vector const& es, unsigned offset) const {
         SASSERT(offset < es.size());
-        unsigned i = offset, count = 0;
-        do {
+        unsigned count = 0;
+        for (unsigned i = offset + 1; i-- > 0;) {
             if (seq.str.is_unit(es[i]))
                 break;
             ++count;
-        } while (i-- > 0);
+        }
         return count;
     }
 
diff --git a/src/ast/rewriter/seq_regex_bisim.cpp b/src/ast/rewriter/seq_regex_bisim.cpp
index 5e84901..8a100f7 100644
--- a/src/ast/rewriter/seq_regex_bisim.cpp
+++ b/src/ast/rewriter/seq_regex_bisim.cpp
@@ -91,7 +91,8 @@ namespace seq {
        (re.empty) leaves are dropped.
 
        Returns false if we encountered an unexpected node (e.g. a free
-       variable creeping in) — in that case the caller should bail out.
+       variable creeping in) or the step budget was exhausted — in that
+       case the caller should bail out with l_undef.
     */
     bool regex_bisim::collect_leaves(expr* der, expr_ref_vector& leaves) {
         ptr_vector<expr> work;
@@ -99,6 +100,8 @@ namespace seq {
         work.push_back(der);
         seen.insert(der);
         while (!work.empty()) {
+            if (++m_steps > m_step_bound)
+                return false;
             expr* e = work.back();
             work.pop_back();
             expr* c = nullptr, * t = nullptr, * f = nullptr;
@@ -221,7 +224,7 @@ namespace seq {
         m_worklist.push_back(r0);
 
         while (!m_worklist.empty()) {
-            if (++m_steps > m_step_bound)
+            if (m_steps > m_step_bound)
                 return l_undef;
 
             expr_ref r(m_worklist.back(), m);
```

To apply:
```bash
git apply - << 'EOF'
[paste diff above]
EOF
```

### Testing

After applying this diff, build and test with:
```bash
mkdir -p build && cd build
cmake ..
make -j$(nproc)
make test-z3
./test-z3 /a
```

The changes are pure refactors (improvements 1 and 2) and a conservative tightening of an existing bound (improvement 3); no behavioral changes are expected on any existing test cases.

---
*Generated by ZIPT Code Reviewer agent — comparing Z3 implementation with [CEisenhofer/ZIPT@parikh](https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT)*




> Generated by [ZIPT Code Reviewer](https://github.com/Z3Prover/z3/actions/runs/27402031717) · 1.3K AIC · ⌖ 38.4 AIC · ⊞ 28.8K · [◷](https://github.com/search?q=repo%3AZ3Prover%2Fz3+is%3Aissue+%22gh-aw-workflow-call-id%3A+Z3Prover%2Fz3%2Fzipt-code-reviewer%22&type=issues)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[zipt-review] ZIPT Code Review: seq_eq_solver and seq_regex_bisim improvements from reference implementation #9839

ZIPT Code Review: Improvements from Reference Implementation

Summary

Improvements Applied

Improvement 1: Fix variable shadowing of `eq_ptr& r` output parameter in `reduce_itos3` and `reduce_ubv2s2`

Improvement 2: Replace `do...while(i-- > 0)` with idiomatic `for (unsigned i = n; i-- > 0;)` in reverse-iteration helpers

Improvement 3: Count steps inside `collect_leaves` to respect the bisimulation step budget

Git Diff

Testing

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[zipt-review] ZIPT Code Review: seq_eq_solver and seq_regex_bisim improvements from reference implementation #9839

Description

ZIPT Code Review: Improvements from Reference Implementation

Summary

Improvements Applied

Improvement 1: Fix variable shadowing of eq_ptr& r output parameter in reduce_itos3 and reduce_ubv2s2

Improvement 2: Replace do...while(i-- > 0) with idiomatic for (unsigned i = n; i-- > 0;) in reverse-iteration helpers

Improvement 3: Count steps inside collect_leaves to respect the bisimulation step budget

Git Diff

Testing

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Improvement 1: Fix variable shadowing of `eq_ptr& r` output parameter in `reduce_itos3` and `reduce_ubv2s2`

Improvement 2: Replace `do...while(i-- > 0)` with idiomatic `for (unsigned i = n; i-- > 0;)` in reverse-iteration helpers

Improvement 3: Count steps inside `collect_leaves` to respect the bisimulation step budget