Skip to content

[zipt-review] ZIPT Code Review: seq_eq_solver and seq_regex_bisim improvements from reference implementation #9839

@github-actions

Description

@github-actions

ZIPT Code Review: Improvements from Reference Implementation

Date: 2025-07-22
Files Reviewed: src/ast/rewriter/seq_eq_solver.cpp, src/ast/rewriter/seq_regex_bisim.cpp, src/smt/seq_eq_solver.cpp, src/smt/theory_seq.cpp
ZIPT Reference: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT

Note: The originally targeted euf_snode.h / euf_sgraph.h / euf_seq_plugin.* files do not yet exist in this repository. The review was re-scoped to the existing sequence solver implementation.

Summary

Three concrete improvements were identified by comparing Z3's sequence-equation solver with the ZIPT reference (NielsenNode.cs, NielsenGraph.cs, StrEqBase.cs). Two are in seq_eq_solver.cpp (variable shadowing and a tricky unsigned reverse-iteration idiom), and one is in seq_regex_bisim.cpp (moving step-bound accounting inside the leaf-traversal loop so deep derivative trees are properly bounded).

Improvements Applied

Improvement 1: Fix variable shadowing of eq_ptr& r output parameter in reduce_itos3 and reduce_ubv2s2

File: src/ast/rewriter/seq_eq_solver.cpp
Functions: reduce_itos3 (lines ~162–185), reduce_ubv2s2 (lines ~269–300)

Rationale: Both functions have an eq_ptr& r output parameter (used to pass back a derived equation). Inside each function, a range-for loop introduced for (expr* r : es), shadowing the output parameter. While the parameter happens not to be written inside those loops today, the shadowing creates a latent maintenance hazard: any future code that tries to set the output r inside the loop would silently write to the local copy instead. Renamed the loop variable to elem in all four loops (two functions × two loops each).

ZIPT Reference: ZIPT's StrEqBase.cs Simplify() and GetConstraints() use consistent naming (elem, ch, c) for loop variables to avoid conflicts with outer-scope names.

Improvement 2: Replace do...while(i-- > 0) with idiomatic for (unsigned i = n; i-- > 0;) in reverse-iteration helpers

File: src/ast/rewriter/seq_eq_solver.cpp
Functions: count_units_r2l (lines ~497–506), count_non_units_r2l (lines ~514–523)

Rationale: The original do { ... } while (i-- > 0) pattern relies on unsigned integer underflow semantics. When i == 0, the post-decrement evaluates 0 > 0 = false before wrapping to UINT_MAX, so the loop terminates correctly. This is safe, but subtle: readers must know that unsigned wrapping is intentional and that the > 0 check happens before the wrap. The idiomatic replacement for (unsigned i = offset + 1; i-- > 0;) has identical semantics but makes the intent explicit — it is the established C++ idiom for safe reverse-iteration over unsigned indices. Same logic, more readable.

ZIPT Reference: ZIPT's NielsenNode.cs right-to-left passes (SimplifyRight(), ReduceRight()) iterate from Count - 1 downwards using standard for (int i = count - 1; i >= 0; i--) (C# uses signed indices, so underflow is not an issue). The Z3 port used do...while to replicate this semantic in unsigned C++; the for (i = n+1; i-- > 0;) form is the standard unsigned equivalent.

Improvement 3: Count steps inside collect_leaves to respect the bisimulation step budget

File: src/ast/rewriter/seq_regex_bisim.cpp
Functions: collect_leaves (lines ~98–122), are_equivalent_core (lines ~224–242)

Rationale: are_equivalent_core maintains a step counter (m_steps) bounded by m_step_bound (50 000) to prevent non-termination on pathological inputs. Previously, m_steps was incremented once per worklist item — at the top of the outer while (!m_worklist.empty()) loop. However, collect_leaves traverses the ITE/union tree of the Brzozowski derivative, which can be exponentially larger than the worklist. A derivative tree with depth d produces up to 2^d leaves, all processed by collect_leaves with no step accounting. This meant that the step budget only bounded the number of worklist iterations, not the total work done.

The fix moves ++m_steps > m_step_bound inside collect_leaves (one increment per node visited in the derivative tree), and changes the outer-loop check to m_steps > m_step_bound (no additional increment). The outer check now acts as a fast-exit guard at the start of each iteration when collect_leaves has already exhausted the budget; collect_leaves itself returns false on step exhaustion, causing are_equivalent_core to return l_undef.

ZIPT Reference: ZIPT's NielsenGraph.cs uses a DepthBound mechanism with ItDeepeningInc that limits work per expansion step rather than per top-level node. The collect_leaves traversal is the closest analog to ZIPT's per-expansion work. Bounding it per-node is consistent with ZIPT's philosophy of bounding total expansion work.

Git Diff

The following diff can be applied with git apply:

diff --git a/src/ast/rewriter/seq_eq_solver.cpp b/src/ast/rewriter/seq_eq_solver.cpp
index e1ffae7..9eb2e7e 100644
--- a/src/ast/rewriter/seq_eq_solver.cpp
+++ b/src/ast/rewriter/seq_eq_solver.cpp
@@ -162,8 +162,8 @@ namespace seq {
             return true;
         }
         expr* u = nullptr;
-        for (expr* r : es) {
-            if (seq.str.is_unit(r, u)) {
+        for (expr* elem : es) {
+            if (seq.str.is_unit(elem, u)) {
                 expr_ref is_digit = m_ax.is_digit(u);
                 if (!m.is_true(ctx.expr2rep(is_digit)))
                     add_consequence(is_digit);
@@ -173,8 +173,8 @@ namespace seq {
             return false;
 
         expr_ref num(m);
-        for (expr* r : es) {
-            VERIFY(seq.str.is_unit(r, u));
+        for (expr* elem : es) {
+            VERIFY(seq.str.is_unit(elem, u));
             expr_ref digit = m_ax.sk().mk_digit2int(u);
             if (!num)
                 num = digit;
@@ -269,8 +269,8 @@ namespace seq {
             return true;
         }
         expr* u = nullptr;
-        for (expr* r : es) {
-            if (seq.str.is_unit(r, u)) {
+        for (expr* elem : es) {
+            if (seq.str.is_unit(elem, u)) {
                 expr_ref is_digit = m_ax.is_digit(u);
                 if (!m.is_true(ctx.expr2rep(is_digit)))
                     add_consequence(is_digit);
@@ -288,8 +288,8 @@ namespace seq {
             return true;
         }
 
-        for (expr* r : es) {
-            VERIFY(seq.str.is_unit(r, u));
+        for (expr* elem : es) {
+            VERIFY(seq.str.is_unit(elem, u));
             expr_ref digit = m_ax.sk().mk_digit2bv(u, bv_sort);
             if (!num)
                 num = digit;
@@ -496,12 +496,12 @@ namespace seq {
 
     unsigned eq_solver::count_units_r2l(expr_ref_vector const& es, unsigned offset) const {
         SASSERT(offset < es.size());
-        unsigned i = offset, count = 0;
-        do {
+        unsigned count = 0;
+        for (unsigned i = offset + 1; i-- > 0;) {
             if (!seq.str.is_unit(es[i]))
                 break;
             ++count;
-        } while (i-- > 0);
+        }
         return count;
     }
 
@@ -513,12 +513,12 @@ namespace seq {
 
     unsigned eq_solver::count_non_units_r2l(expr_ref_vector const& es, unsigned offset) const {
         SASSERT(offset < es.size());
-        unsigned i = offset, count = 0;
-        do {
+        unsigned count = 0;
+        for (unsigned i = offset + 1; i-- > 0;) {
             if (seq.str.is_unit(es[i]))
                 break;
             ++count;
-        } while (i-- > 0);
+        }
         return count;
     }
 
diff --git a/src/ast/rewriter/seq_regex_bisim.cpp b/src/ast/rewriter/seq_regex_bisim.cpp
index 5e84901..8a100f7 100644
--- a/src/ast/rewriter/seq_regex_bisim.cpp
+++ b/src/ast/rewriter/seq_regex_bisim.cpp
@@ -91,7 +91,8 @@ namespace seq {
        (re.empty) leaves are dropped.
 
        Returns false if we encountered an unexpected node (e.g. a free
-       variable creeping in) — in that case the caller should bail out.
+       variable creeping in) or the step budget was exhausted — in that
+       case the caller should bail out with l_undef.
     */
     bool regex_bisim::collect_leaves(expr* der, expr_ref_vector& leaves) {
         ptr_vector<expr> work;
@@ -99,6 +100,8 @@ namespace seq {
         work.push_back(der);
         seen.insert(der);
         while (!work.empty()) {
+            if (++m_steps > m_step_bound)
+                return false;
             expr* e = work.back();
             work.pop_back();
             expr* c = nullptr, * t = nullptr, * f = nullptr;
@@ -221,7 +224,7 @@ namespace seq {
         m_worklist.push_back(r0);
 
         while (!m_worklist.empty()) {
-            if (++m_steps > m_step_bound)
+            if (m_steps > m_step_bound)
                 return l_undef;
 
             expr_ref r(m_worklist.back(), m);

To apply:

git apply - << 'EOF'
[paste diff above]
EOF

Testing

After applying this diff, build and test with:

mkdir -p build && cd build
cmake ..
make -j$(nproc)
make test-z3
./test-z3 /a

The changes are pure refactors (improvements 1 and 2) and a conservative tightening of an existing bound (improvement 3); no behavioral changes are expected on any existing test cases.


Generated by ZIPT Code Reviewer agent — comparing Z3 implementation with CEisenhofer/ZIPT@parikh

Generated by ZIPT Code Reviewer · 1.3K AIC · ⌖ 38.4 AIC · ⊞ 28.8K ·

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions