docs(reborn): subagent thread harness — foundational design + PR0/PR1 plan#5176
docs(reborn): subagent thread harness — foundational design + PR0/PR1 plan#5176henrypark133 wants to merge 1 commit into
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds two new documentation files: a foundational design spec ( ChangesDurable subagent thread harness
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/plans/2026-06-10-triggered-run-delivery-owner-scope.md`:
- Around line 398-404: The validation steps in this section don’t match the
repo’s required pre-review gate. Update the commands in this plan to mirror the
contribution checklist by using the strict fmt/clippy variants, adding the
required build step, and keeping the full test run; reference the existing “Step
2: Full test suite” block and the surrounding quality-gate commands so they
align with the mandated pre-PR checks.
- Around line 313-323: Remove the contradictory fallback claim from the commit
message text in the triggered-run delivery owner scope plan and keep the
description aligned with the stated scope. Update the wording around the
trigger/thread ownership change so it only says new first-fire bindings persist
the creator as owner and existing Direct-route bindings are not backfilled,
without asserting that any delivery-side fallback covers them. Use the commit
message block in the plan document as the target for the edit.
In `@docs/plans/2026-06-18-triggers-on-rootfilesystem.md`:
- Around line 188-194: The thread identity derivation is inconsistent: one path
includes creator_user_id while other paths already rely on
fire.identity.route_thread_id() using tenant_id, trigger_id, and fire_slot only.
Update the canonical thread_id hashing rules so every use of ensure_thread and
TriggerFireIdentity.route_thread_id() shares the same input set, and remove any
alternate derivation in the trusted-trigger fire path. Make sure all
recovery/idempotency-related lookups use that single canonical hash definition.
In `@docs/plans/2026-06-22-reborn-runtime-hang-investigation.md`:
- Around line 5-7: The incident note in the reborn-runtime-hang-investigation
document exposes a full user UUID, so redact or pseudonymize that identifier
while preserving the troubleshooting context. Update the text in the affected
incident summary to use a masked user reference instead of the raw UUID, keeping
the same details about the tenant and outage but removing the direct user
identifier.
In `@docs/reborn/2026-06-16-memory-how-it-works.md`:
- Around line 13-14: Update the memory search docs so they match the runtime
behavior in dispatch_search: the default Reborn build disables vector search
explicitly via with_vector(false), not because embeddings are missing, and the
defaults are limit 5 with pre_fusion_limit set to max(limit, 20). Adjust the
affected description in memory-how-it-works.md to reflect these actual defaults
and the fact that only FTS runs by default.
In `@docs/superpowers/plans/2026-06-16-persistent-user-context-profile.md`:
- Around line 924-933: The integration test for profile rendering is asserting
the old compact location format, but the runtime now uses sanitized
untrusted-user-data framing for location. Update the assertions in
profile_set_then_runtime_context_renders_local_time to match the current
rendered contract by checking the explicit untrusted-user-data output around
location while still verifying the Asia/Tokyo local time and locale fields,
using the same capability-dispatch and loop-driver host flow to locate the
behavior.
- Around line 758-764: The existing profile merge logic in the retry loop is
swallowing corrupt or non-JSON stored documents by using
serde_json::from_slice(...).unwrap_or_default(), which can overwrite bad data
with an empty object. Update the merge path in this block to fail closed: when
deserialization of current bytes fails, return operation_error() instead of
defaulting, and keep the normal flow only for valid JSON or missing documents.
Use the current/read_document handling and the existing content_bytes_sha256
helper as the anchor points for locating the fix.
- Around line 261-263: The plan currently mixes two profile contracts: one part
says to use only UserProfileContext.timezone with no ResolvedUserProfile, while
later steps still reference ResolvedUserProfile and user_timezone. Normalize the
entire document to a single model by updating all affected steps/snippets to use
UserProfileContext directly, removing any remaining ResolvedUserProfile
mentions, and keeping the LoopRunContext-based trait shape consistent with the
intended load_identity_candidates pattern.
In `@docs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.md`:
- Around line 278-279: The boot-recovery design is inconsistent between the
recovery driver and the spec, so reconcile it to a single canonical flow. Update
the recovery plan around list_parents_with_unclosed_edges and the boot-recovery
section to match the spec’s active-parent-run enumeration approach, or
explicitly revise the spec if the edge-store scan is intended to be the source
of truth. Make sure the description of TurnStateStore and any recovery
orchestration steps use one consistent driver so there is no split
implementation path.
In `@docs/superpowers/specs/2026-06-16-persistent-user-context-profile-design.md`:
- Around line 82-83: The profile-rendering spec is treating free-text location
as safe inline context, which weakens the prompt-injection boundary. Update the
user profile examples and guidance to frame location as untrusted user-provided
data, with explicit “not instructions” language and quoted rendering instead of
`location=...` style inline context. Keep the timezone behavior unchanged in the
existing time-rendering branch, and make sure the spec wording aligns with the
tested contract for the profile rendering logic.
In `@docs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md`:
- Around line 113-114: Update the Subagent Thread Harness design spec so the
awaited edge uses the gate-ref-compatible hyphenated format, not the
colon-separated placeholder. In the section describing the real GateRef, replace
the `gate:subagent-await:<child_run_id>` example with the same naming convention
used by the implementation plan, and keep the wording aligned with the resume
precondition behavior for blocked parents.
In `@scripts/loadtest/parse_concurrency.py`:
- Around line 174-177: The throughput calculation in the concurrency parsing
logic can divide by a zero duration when `max(finished_times) -
min(started_times)` is 0, so update the branch in `parse_concurrency.py` to
mirror the existing duration guard used later in the file. In the throughput
section that uses `n_finished`, `n_started`, `finished_times`, and
`started_times`, compute throughput only when the duration is greater than 0;
otherwise fall back to 0.0 to avoid `ZeroDivisionError`.
In `@scripts/loadtest/stub_llm_server.py`:
- Line 93: Bind the stub LLM server to loopback instead of all interfaces by
updating the ThreadingHTTPServer setup in stub_llm_server.py so the server
listens on 127.0.0.1 for args.port; keep the change localized to the server
initialization path and ensure any related startup or logging messages in
stub_llm_server reflect the loopback-only binding.
In `@scripts/loadtest/sweep.sh`:
- Around line 15-16: The prerequisites comment in sweep.sh is stale because the
load test Python tools are stdlib-only, so remove the mention of httpx and pip
install -r requirements.txt from the header. Update the prerequisite text near
the sweep.sh comment to list only the actual required tools (python3, cargo,
curl) and keep it consistent with the headers in loadgen.py and
stub_llm_server.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: e6626008-9618-4ed8-b4a3-6e32956864fc
📒 Files selected for processing (18)
docs/plans/2026-06-10-triggered-run-delivery-owner-scope.mddocs/plans/2026-06-14-trusted-trigger-classification-as-type-4851.mddocs/plans/2026-06-15-google-oauth-account-fork-fix.mddocs/plans/2026-06-18-triggers-on-rootfilesystem.mddocs/plans/2026-06-22-reborn-runtime-hang-investigation.mddocs/plans/2026-06-22-turn-state-lock-wedge-fix.mddocs/plans/2026-06-23-github-pat-auth-bugs-1-2.mddocs/reborn/2026-06-08-subagent-durability-spec.mddocs/reborn/2026-06-16-memory-how-it-works.mddocs/superpowers/plans/2026-06-16-persistent-user-context-profile.mddocs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.mddocs/superpowers/specs/2026-06-16-persistent-user-context-profile-design.mddocs/superpowers/specs/2026-06-23-subagent-thread-harness-design.mdscripts/loadtest/loadgen.pyscripts/loadtest/parse_concurrency.pyscripts/loadtest/run_serve.shscripts/loadtest/stub_llm_server.pyscripts/loadtest/sweep.sh
| git commit -m "fix(conversations): thread trigger creator into trusted binding resolution | ||
|
|
||
| Trigger fires previously resolved conversation bindings with | ||
| trusted_owner_user_id=None, so triggered-run TurnScopes carried | ||
| ActorFallback ownership. Slack delivery then reconstructed a thread | ||
| scope with owner None while the thread was stored with Some(actor), | ||
| silently skipping the outbound push. New first-fire bindings now | ||
| persist the creator as owner. Existing Direct-route bindings are | ||
| deliberately not backfilled (scope-equality compat for in-flight | ||
| runs); the delivery-side fallback covers them." | ||
| ``` |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Remove the fallback claim from the commit message.
Line 322 says “the delivery-side fallback covers them,” but Line 17 explicitly declares that fallback out of scope. This contradiction will mislead implementation/review.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/2026-06-10-triggered-run-delivery-owner-scope.md` around lines 313
- 323, Remove the contradictory fallback claim from the commit message text in
the triggered-run delivery owner scope plan and keep the description aligned
with the stated scope. Update the wording around the trigger/thread ownership
change so it only says new first-fire bindings persist the creator as owner and
existing Direct-route bindings are not backfilled, without asserting that any
delivery-side fallback covers them. Use the commit message block in the plan
document as the target for the edit.
| Run: `cargo fmt && cargo clippy --all --benches --tests --examples --all-features 2>&1 | tail -5 && cargo test -p ironclaw_conversations -p ironclaw_reborn_composition -p ironclaw_turns -p ironclaw_triggers --features slack-v2-host-beta` | ||
| Expected: fmt clean, clippy zero warnings, all tests pass (`ironclaw_turns` included because the fix consumes `TurnScope::new_with_owner`) | ||
|
|
||
| - [ ] **Step 2: Full test suite** | ||
|
|
||
| Run: `cargo test` | ||
| Expected: PASS (catches any workspace-wide fallout from the `TrustedInboundTurnRequest` signature change — the type is `pub(crate)`, so fallout should be zero outside `ironclaw_conversations`) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Align the quality gate commands with repo-required pre-PR checks.
This block skips the required strict flags and cargo build listed in repo contribution rules; it should mirror the mandated pre-review gate command set.
As per coding guidelines: “Run the local validation checks required before requesting a review… cargo fmt --all -- --check, cargo clippy ... -- -D warnings, cargo build, cargo test.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/2026-06-10-triggered-run-delivery-owner-scope.md` around lines 398
- 404, The validation steps in this section don’t match the repo’s required
pre-review gate. Update the commands in this plan to mirror the contribution
checklist by using the strict fmt/clippy variants, adding the required build
step, and keeping the full test run; reference the existing “Step 2: Full test
suite” block and the surrounding quality-gate commands so they align with the
mandated pre-PR checks.
Source: Coding guidelines
| The trusted-trigger fire path supplies a deterministic `thread_id` derived from | ||
| `TriggerFireIdentity` to `ensure_thread` | ||
| (`crates/ironclaw_threads/src/contract.rs:300`, | ||
| `filesystem_service.rs:697-724`, `CasExpectation::Absent`), instead of the binding | ||
| layer minting a random UUID. The canonical `thread_id` for a fire is | ||
| `H(tenant_id, creator_user_id, trigger_id, fire_slot)`. | ||
|
|
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Use one canonical hash input set for thread identity.
Line 193 adds creator_user_id to thread_id derivation, but Lines 84-89 and 385-387 rely on fire.identity.route_thread_id() (tenant/trigger/fire_slot). Mixed derivation rules will break idempotent lookup/recovery.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/2026-06-18-triggers-on-rootfilesystem.md` around lines 188 - 194,
The thread identity derivation is inconsistent: one path includes
creator_user_id while other paths already rely on
fire.identity.route_thread_id() using tenant_id, trigger_id, and fire_slot only.
Update the canonical thread_id hashing rules so every use of ensure_thread and
TriggerFireIdentity.route_thread_id() shares the same input set, and remove any
alternate derivation in the trusted-trigger fire path. Make sure all
recovery/idempotency-related lookups use that single canonical hash definition.
| A local-dev `ironclaw-reborn serve` instance (tenant `reborn-cli`, single user | ||
| `e3aa23b0-e23e-4e1f-9b14-b8954e543200`) became **completely unresponsive to all | ||
| users for ~30 minutes**. The user has no triggers of their own, but other users' |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Redact direct user identifiers in incident docs.
Line 6 publishes a full user UUID. Keep incident utility, but mask or pseudonymize user identifiers to avoid persisting sensitive identifiers in repo history.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/plans/2026-06-22-reborn-runtime-hang-investigation.md` around lines 5 -
7, The incident note in the reborn-runtime-hang-investigation document exposes a
full user UUID, so redact or pseudonymize that identifier while preserving the
troubleshooting context. Update the text in the affected incident summary to use
a masked user reference instead of the raw UUID, keeping the same details about
the tenant and outage but removing the direct user identifier.
| - Search is **hybrid FTS + vector with RRF fusion**, but in the default Reborn build **no embedding provider is wired**, so vector search is inactive and only FTS runs. | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Search behavior docs are out of sync with runtime implementation.
Line 13 and Lines 179-182 currently describe vector as disabled due to missing embeddings and list defaults (limit 20, pre_fusion_limit 50, both branches on). In crates/ironclaw_host_runtime/src/first_party_tools/memory.rs (dispatch_search), vector is explicitly disabled via .with_vector(false), default limit is 5, and pre_fusion_limit is max(limit, 20).
Suggested doc correction
-- Search is **hybrid FTS + vector with RRF fusion**, but in the default Reborn build **no embedding provider is wired**, so vector search is inactive and only FTS runs.
+- Search infrastructure supports **hybrid FTS + vector with RRF fusion**, but the current runtime path (`dispatch_search`) explicitly sets `.with_vector(false)`, so only FTS executes today.
...
-`MemorySearchResult` = `{ path, score, snippet, full_text_rank, vector_rank }`. Search defaults: `limit` 20 (capped per capability schema to ≤20), `pre_fusion_limit` 50, both branches on, RRF k=60.
+`MemorySearchResult` = `{ path, score, snippet, full_text_rank, vector_rank }`. In the current runtime capability path, default `limit` is 5 (clamped to 1..20), `pre_fusion_limit` is `max(limit, 20)`, and vector is disabled for this call path.
-**Caveat:** the default Reborn `build_backend` wiring passes **no embedding provider**, so the vector branch is inert; only FTS contributes. Hybrid search is built but runs single-modality until an embedding provider is wired at the host boundary.
+**Caveat:** hybrid search exists in the backend, but this capability path currently forces FTS-only by request construction (`with_vector(false)`).Also applies to: 179-182
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/reborn/2026-06-16-memory-how-it-works.md` around lines 13 - 14, Update
the memory search docs so they match the runtime behavior in dispatch_search:
the default Reborn build disables vector search explicitly via
with_vector(false), not because embeddings are missing, and the defaults are
limit 5 with pre_fusion_limit set to max(limit, 20). Adjust the affected
description in memory-how-it-works.md to reflect these actual defaults and the
fact that only FTS runs by default.
| **real** `GateRef` that names the edge (e.g. `gate:subagent-await:<child_run_id>`) | ||
| — not a synthetic placeholder, but a genuine reference to the awaited edge — so |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
GateRef format conflicts with the PR plan and can break resume preconditions.
Line 113 uses gate:subagent-await:<child_run_id>, but the implementation plan requires gate:subagent-await-<child_run_id> (hyphen suffix) for gate-ref compatibility. This mismatch can cause blocked-parent resume to miss its expected gate key.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md` around
lines 113 - 114, Update the Subagent Thread Harness design spec so the awaited
edge uses the gate-ref-compatible hyphenated format, not the colon-separated
placeholder. In the section describing the real GateRef, replace the
`gate:subagent-await:<child_run_id>` example with the same naming convention
used by the implementation plan, and keep the wording aligned with the resume
precondition behavior for blocked parents.
| **Path identity & access patterns.** The edge is keyed by | ||
| `(parent_run_id, child_run_id)`. Two access patterns, both cheap: | ||
| - **Live settle:** a child knows its own `parent_run_id` (lineage), so the edge | ||
| is a direct path lookup — no by-child index. | ||
| - **Parent drain:** the parent lists its own directory | ||
| `/await-edges/<parent_run_id>/` (single-level `list_dir`). | ||
|
|
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Await-edge directory path is inconsistent within this spec.
Line 219 defines /turns/subagent-await-edges/..., but Line 249 says parent drain lists /await-edges/<parent_run_id>/. Please normalize to one canonical path; otherwise store/list implementations will diverge.
| if n_finished > 1 and n_started > 0: | ||
| throughput = n_finished / (max(finished_times) - min(started_times)) | ||
| else: | ||
| throughput = 0.0 |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Guard against zero-duration division.
max(finished_times) - min(started_times) can be 0 (single instantaneous run, or all timestamps colliding at parser resolution), raising ZeroDivisionError. The bucket path at Line 185 already guards duration > 0; mirror it here.
Proposed fix
- if n_finished > 1 and n_started > 0:
- throughput = n_finished / (max(finished_times) - min(started_times))
- else:
- throughput = 0.0
+ span = max(finished_times) - min(started_times) if (n_finished > 1 and n_started > 0) else 0.0
+ throughput = n_finished / span if span > 0 else 0.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if n_finished > 1 and n_started > 0: | |
| throughput = n_finished / (max(finished_times) - min(started_times)) | |
| else: | |
| throughput = 0.0 | |
| span = max(finished_times) - min(started_times) if (n_finished > 1 and n_started > 0) else 0.0 | |
| throughput = n_finished / span if span > 0 else 0.0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/loadtest/parse_concurrency.py` around lines 174 - 177, The throughput
calculation in the concurrency parsing logic can divide by a zero duration when
`max(finished_times) - min(started_times)` is 0, so update the branch in
`parse_concurrency.py` to mirror the existing duration guard used later in the
file. In the throughput section that uses `n_finished`, `n_started`,
`finished_times`, and `started_times`, compute throughput only when the duration
is greater than 0; otherwise fall back to 0.0 to avoid `ZeroDivisionError`.
| latency_s = args.latency_ms / 1000.0 | ||
| handler = make_handler(latency_s) | ||
|
|
||
| server = ThreadingHTTPServer(("0.0.0.0", args.port), handler) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win
Bind the stub to loopback, not 0.0.0.0.
The harness already pins the real server to 127.0.0.1 (run_serve.sh Line 120) and the stub URL is http://127.0.0.1:9999/v1. Binding to all interfaces needlessly exposes the canned-response endpoint on the LAN with no auth.
Proposed fix
- server = ThreadingHTTPServer(("0.0.0.0", args.port), handler)
+ server = ThreadingHTTPServer(("127.0.0.1", args.port), handler)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| server = ThreadingHTTPServer(("0.0.0.0", args.port), handler) | |
| server = ThreadingHTTPServer(("127.0.0.1", args.port), handler) |
🧰 Tools
🪛 Ruff (0.15.18)
[error] 93-93: Possible binding to all interfaces
(S104)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/loadtest/stub_llm_server.py` at line 93, Bind the stub LLM server to
loopback instead of all interfaces by updating the ThreadingHTTPServer setup in
stub_llm_server.py so the server listens on 127.0.0.1 for args.port; keep the
change localized to the server initialization path and ensure any related
startup or logging messages in stub_llm_server reflect the loopback-only
binding.
| # Prerequisites: python3, httpx installed (pip install -r requirements.txt), | ||
| # cargo in PATH, curl in PATH. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Stale prerequisite: no httpx/requirements.txt needed.
Both Python tools state stdlib-only (loadgen.py Line 14, stub_llm_server.py header), so httpx installed (pip install -r requirements.txt) is wrong and sends users on a dead-end setup. Drop it.
Proposed fix
-# Prerequisites: python3, httpx installed (pip install -r requirements.txt),
-# cargo in PATH, curl in PATH.
+# Prerequisites: python3 (stdlib only), cargo in PATH, curl in PATH.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Prerequisites: python3, httpx installed (pip install -r requirements.txt), | |
| # cargo in PATH, curl in PATH. | |
| # Prerequisites: python3 (stdlib only), cargo in PATH, curl in PATH. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/loadtest/sweep.sh` around lines 15 - 16, The prerequisites comment in
sweep.sh is stale because the load test Python tools are stdlib-only, so remove
the mention of httpx and pip install -r requirements.txt from the header. Update
the prerequisite text near the sweep.sh comment to list only the actual required
tools (python3, cargo, curl) and keep it consistent with the headers in
loadgen.py and stub_llm_server.py.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0780accb7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if ! curl -sf "http://127.0.0.1:9999/v1/models" >/dev/null 2>&1; then | ||
| echo "ERROR: stub LLM server did not start" >&2 | ||
| exit 1 |
There was a problem hiding this comment.
Fail when the stub port is already occupied
When port 9999 is already held by a previous stub or another OpenAI-compatible service that answers /v1/models, the newly spawned stub_llm_server.py exits with EADDRINUSE but this readiness curl succeeds against the old process. The sweep then records a dead STUB_PID and runs against whatever latency/model behavior the stale server has, corrupting the throughput results; check that the spawned PID is still alive and/or fail on an occupied port before accepting readiness.
Useful? React with 👍 / 👎.
|
🚅 Deployed to the ironclaw-pr-5176 environment in ironclaw-ci-preview
|
Foundational design + bite-sized TDD plan for the Reborn subagent thread harness: subagents become first-class, addressable, resumable threads (parent and human can attach), with durable delivery via a single ScopedFilesystem await-edge per parent-waits-on-child (no SQL gate stack). Design hardened over 3 adversarial review rounds; PR0+PR1 plan converged over 7. Removes the superseded 2026-06-08 subagent-durability sub-spec (replaced by this design). Other subagent docs (spawn mechanics, compaction, planner) describe live shipped code and are intentionally kept. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f0780ac to
613a307
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.md`:
- Around line 684-689: The await-edge abstraction is being defined in the
higher-level reborn crate while SubagentSpawnDeps lives in loop_support, which
creates a reverse dependency risk or cycle. Move AwaitEdgeWriter into a
lower/shared crate alongside SubagentSpawnDeps, then have
FilesystemAwaitEdgeStore in await_edge.rs implement that trait and keep the
optional await_edge_store field typed against the shared trait. Update the
SubagentSpawnDeps wiring and any finish_spawn call sites to use the relocated
trait without introducing a reborn dependency back into loop_support.
- Around line 690-691: The runtime path reference in the Task 1.6c note is
inconsistent with the rest of the plan and points to the wrong crate. Update the
production-site reference from ironclaw_reborn/src/runtime.rs to
ironclaw_reborn_composition/src/runtime.rs, and keep the surrounding
SubagentSpawnDeps guidance unchanged so contributors implement the wiring in the
correct runtime module.
In `@docs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md`:
- Around line 257-260: The boot recovery description conflicts with the PR1
await-edge store contract by relying on active parent runs instead of the
store-driven recovery entrypoint. Update the recovery spec around the boot
algorithm to use list_parents_with_unclosed_edges() as the source of truth, then
for each returned parent inspect its edge records and reconcile terminal or
abandoned children via get_run_state(child) and edge closure logic. Keep the
wording aligned with the existing await-edge store symbols so the boot flow is
clearly derived from the store contract rather than a separate active-run query.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: f075e771-bba5-4c30-8d34-24b0c63d4cc2
📒 Files selected for processing (3)
docs/reborn/2026-06-08-subagent-durability-spec.mddocs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.mddocs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md
| - Consumes: `FilesystemAwaitEdgeStore::write_open` + `AwaitEdge` (Tasks 1.1/1.2), `await_edge_gate_ref` (Task 1.1), `RebornRuntimeInput.subagent_v2_enabled` threaded into `SubagentSpawnDeps` (add a `subagent_v2_enabled: bool` field + optional `await_edge_store: Option<Arc<dyn AwaitEdgeWriter>>` where `AwaitEdgeWriter` is a tiny object-safe trait wrapping `write_open` — keeps `SubagentSpawnDeps` non-generic). | ||
| - Produces: when `subagent_v2_enabled`, `finish_spawn` writes an `open` await-edge (in addition to the existing `record_awaited_child`) and parks the parent in `BlockedDependentRun` with `BlockedReason::AwaitDependentRun { gate_ref: await_edge_gate_ref(child_run_id) }`. | ||
|
|
||
| - [ ] **Step 1: Read** `finish_spawn` (~847) and how the parent is currently blocked (`completion_observer.rs:3050` sets `BlockedDependentRun`). Note the exact block-construction call. | ||
| - [ ] **Step 2: Define `AwaitEdgeWriter`** (object-safe) in `await_edge.rs`: `#[async_trait] pub trait AwaitEdgeWriter: Send + Sync { async fn write_open(&self, scope: &TurnScope, parent: &TurnRunId, child: &TurnRunId, edge: &AwaitEdge) -> Result<(), AwaitEdgeError>; }` and impl it for `FilesystemAwaitEdgeStore<F>`. (This thin erasure trait exists solely to keep `SubagentSpawnDeps` non-generic — add a one-line comment saying so.) Add `subagent_v2_enabled: bool` and `await_edge_store: Option<Arc<dyn AwaitEdgeWriter>>` to `SubagentSpawnDeps`. | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | 🏗️ Heavy lift
Avoid a reverse crate dependency for AwaitEdgeWriter.
SubagentSpawnDeps is in crates/ironclaw_loop_support, but Line 688 defines AwaitEdgeWriter in crates/ironclaw_reborn/src/subagent/await_edge.rs and then threads it into SubagentSpawnDeps. That implies loop_support depends on reborn (or creates a cycle). Put the trait in loop_support (or another lower/shared crate) and implement it in reborn.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.md` around
lines 684 - 689, The await-edge abstraction is being defined in the higher-level
reborn crate while SubagentSpawnDeps lives in loop_support, which creates a
reverse dependency risk or cycle. Move AwaitEdgeWriter into a lower/shared crate
alongside SubagentSpawnDeps, then have FilesystemAwaitEdgeStore in await_edge.rs
implement that trait and keep the optional await_edge_store field typed against
the shared trait. Update the SubagentSpawnDeps wiring and any finish_spawn call
sites to use the relocated trait without introducing a reborn dependency back
into loop_support.
| **Architecture-rule compliance (B2):** `Option<Arc<dyn AwaitEdgeWriter>>` is a flag-gated optional dependency, which `.claude/rules/architecture.md` rule 2 flags. Annotate the field: `// arch-exempt: optional_arc — flag-gated v2 path, set iff subagent.v2_enabled; removed when v2 becomes default (plan 2026-06-23 cleanup PR)`. **Ripple:** `SubagentSpawnDeps` is a struct-literal with ~15 test construction sites + 1 production site (`ironclaw_reborn/src/runtime.rs:556`). Every site must add `subagent_v2_enabled: false, await_edge_store: None,`. Enumerate them first: `rg -ln "SubagentSpawnDeps" crates/` then grep each file for the struct-literal construction (rustfmt may break `SubagentSpawnDeps {` across lines, so don't rely on a brace-adjacent pattern); update each in this task (mechanical two-field additions: `subagent_v2_enabled: false, await_edge_store: None,`). The production site (`ironclaw_reborn/src/runtime.rs:556`) is wired for real in Task 1.6c. | ||
| - [ ] **Step 3: Write the failing test** — spawn with `subagent_v2_enabled=true` + a captured in-memory `AwaitEdgeWriter`; assert an `open` edge is written for the child. With `false`, assert none. |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fix the runtime path inconsistency in the task instructions.
Line 690 points to crates/ironclaw_reborn/src/runtime.rs:556, but this plan’s file map and Task 1.6c both target crates/ironclaw_reborn_composition/src/runtime.rs. This mismatch can send implementation to the wrong crate.
Suggested doc fix
-- The production site (`ironclaw_reborn/src/runtime.rs:556`) is wired for real in Task 1.6c.
+- The production site (`ironclaw_reborn_composition/src/runtime.rs`) is wired for real in Task 1.6c.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| **Architecture-rule compliance (B2):** `Option<Arc<dyn AwaitEdgeWriter>>` is a flag-gated optional dependency, which `.claude/rules/architecture.md` rule 2 flags. Annotate the field: `// arch-exempt: optional_arc — flag-gated v2 path, set iff subagent.v2_enabled; removed when v2 becomes default (plan 2026-06-23 cleanup PR)`. **Ripple:** `SubagentSpawnDeps` is a struct-literal with ~15 test construction sites + 1 production site (`ironclaw_reborn/src/runtime.rs:556`). Every site must add `subagent_v2_enabled: false, await_edge_store: None,`. Enumerate them first: `rg -ln "SubagentSpawnDeps" crates/` then grep each file for the struct-literal construction (rustfmt may break `SubagentSpawnDeps {` across lines, so don't rely on a brace-adjacent pattern); update each in this task (mechanical two-field additions: `subagent_v2_enabled: false, await_edge_store: None,`). The production site (`ironclaw_reborn/src/runtime.rs:556`) is wired for real in Task 1.6c. | |
| - [ ] **Step 3: Write the failing test** — spawn with `subagent_v2_enabled=true` + a captured in-memory `AwaitEdgeWriter`; assert an `open` edge is written for the child. With `false`, assert none. | |
| **Architecture-rule compliance (B2):** `Option<Arc<dyn AwaitEdgeWriter>>` is a flag-gated optional dependency, which `.claude/rules/architecture.md` rule 2 flags. Annotate the field: `// arch-exempt: optional_arc — flag-gated v2 path, set iff subagent.v2_enabled; removed when v2 becomes default (plan 2026-06-23 cleanup PR)`. **Ripple:** `SubagentSpawnDeps` is a struct-literal with ~15 test construction sites + 1 production site (`ironclaw_reborn/src/runtime.rs:556`). Every site must add `subagent_v2_enabled: false, await_edge_store: None,`. Enumerate them first: `rg -ln "SubagentSpawnDeps" crates/` then grep each file for the struct-literal construction (rustfmt may break `SubagentSpawnDeps {` across lines, so don't rely on a brace-adjacent pattern); update each in this task (mechanical two-field additions: `subagent_v2_enabled: false, await_edge_store: None,`). The production site (`ironclaw_reborn_composition/src/runtime.rs`) is wired for real in Task 1.6c. | |
| - [ ] **Step 3: Write the failing test** — spawn with `subagent_v2_enabled=true` + a captured in-memory `AwaitEdgeWriter`; assert an `open` edge is written for the child. With `false`, assert none. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.md` around
lines 690 - 691, The runtime path reference in the Task 1.6c note is
inconsistent with the rest of the plan and points to the wrong crate. Update the
production-site reference from ironclaw_reborn/src/runtime.rs to
ironclaw_reborn_composition/src/runtime.rs, and keep the surrounding
SubagentSpawnDeps guidance unchanged so contributors implement the wiring in the
correct runtime module.
| - *Boot:* enumerate active (non-terminal) parent runs; for each, `list_dir` its | ||
| edge directory; for each open edge, `get_run_state(child)`; settle the terminal | ||
| ones, close `abandoned` ones whose parent run is terminal. This is driven by | ||
| active runs — **no global recursive scan**. |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Boot recovery algorithm conflicts with the PR1 store contract.
Line 257 says boot recovery enumerates active parent runs, but the PR1 contract defines recovery from list_parents_with_unclosed_edges() in the await-edge store (no active-run query dependency). This mismatch can cause implementation drift and missed recovery cases.
Suggested doc fix
- - *Boot:* enumerate active (non-terminal) parent runs; for each, `list_dir` its
- edge directory; for each open edge, `get_run_state(child)`; settle the terminal
- ones, close `abandoned` ones whose parent run is terminal. This is driven by
- active runs — **no global recursive scan**.
+ - *Boot:* enumerate parents with unclosed await-edges from the await-edge store
+ (e.g., `list_parents_with_unclosed_edges`), then `list_dir` each parent edge
+ directory; for each open edge, `get_run_state(child)`; settle terminal ones,
+ and close `abandoned` when parent is terminal. This is driven by the edge
+ store — **no global recursive scan**.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - *Boot:* enumerate active (non-terminal) parent runs; for each, `list_dir` its | |
| edge directory; for each open edge, `get_run_state(child)`; settle the terminal | |
| ones, close `abandoned` ones whose parent run is terminal. This is driven by | |
| active runs — **no global recursive scan**. | |
| - *Boot:* enumerate parents with unclosed await-edges from the await-edge store | |
| (e.g., `list_parents_with_unclosed_edges`), then `list_dir` each parent edge | |
| directory; for each open edge, `get_run_state(child)`; settle terminal ones, | |
| and close `abandoned` when parent is terminal. This is driven by the edge | |
| store — **no global recursive scan**. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md` around
lines 257 - 260, The boot recovery description conflicts with the PR1 await-edge
store contract by relying on active parent runs instead of the store-driven
recovery entrypoint. Update the recovery spec around the boot algorithm to use
list_parents_with_unclosed_edges() as the source of truth, then for each
returned parent inspect its edge records and reconcile terminal or abandoned
children via get_run_state(child) and edge closure logic. Keep the wording
aligned with the existing await-edge store symbols so the boot flow is clearly
derived from the store contract rather than a separate active-run query.
Summary
Docs-only. Foundational design + implementation plan for the subagent thread harness (Reborn). Reframes subagents from one-shot computations into first-class, addressable, resumable threads that both the parent agent and a human can attach to — driven by four product requirements:
Supersedes the 2026-06-08 subagent-durability sub-spec (removed here) and closes #4656: the SQL gate-resolution stack (3 tables + settlement log + 2-phase ledger + bucketed counter + replay reconciler, ~2,700 LOC) is replaced by a single ScopedFilesystem await-edge file per parent-waits-on-child, with the child's own run record as the source of truth and CAS-based idempotent delivery (no ledger).
terminalbecomes per-run; threads park and re-activate.What's here
docs/superpowers/specs/2026-06-23-subagent-thread-harness-design.md— the single source of truth for the durable delivery/thread design (hardened over 3 adversarial review rounds).docs/superpowers/plans/2026-06-23-subagent-thread-harness-pr0-pr1.md— bite-sized TDD implementation plan for PR 0 (prereqs) + PR 1 (durable await-edge walking skeleton, blocking-only), converged over 7 thermo-nuclear review rounds (clean: no BLOCKER/MAJOR).Key design decisions
activate(thread, input, provenance)primitive.open → settled → drained(+abandonedfor parent-dies-first), CAS transitions = idempotency (no ledger).TurnStatus::BlockedDependentRunwith a real await-edgeGateRef(no new status variant) — a subagent wait reads as an ordinary dependent-run wait.subagent.v2_enabled(default off).Rollout
Vertical walking skeleton: PR 0 (flag, helper visibility, config) → PR 1 (await-edge store + resolver + crash-recovery proof + flag-gated blocking delivery) → later PRs add inspect / extend / human-console / cancel.
No code changes in this PR.
🤖 Generated with Claude Code