feat: add conditional rule logic via <code>when:</code> block (issue #73, PR 1/3)#75
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a structured Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Event Source
participant Enricher as PR Enricher
participant GitHub as GitHub API
participant Engine as Rule Engine
participant Evaluator as When Evaluator
Client->>Enricher: enrich_event_data(event)
Enricher->>GitHub: search_merged_pr_count(repo, author)
GitHub-->>Enricher: merged_pr_count (or None)
Enricher->>Enricher: attach contributor_context to event_data
Enricher-->>Client: enriched event_data
Client->>Engine: analyze_rule_descriptions(state with event_data)
Engine->>Engine: filter rules by event_type
Engine->>Evaluator: should_apply_rule(rule.when, event_data)
Evaluator->>Evaluator: evaluate contributor, pr_count_below, files_match
Evaluator-->>Engine: (applies: bool, reason: str)
alt applies == true
Engine->>Engine: add rule to applicable_rules
else applies == false
Engine->>Engine: log "Rule \"<desc>\" skipped: <reason>" (DEBUG)
Engine->>Engine: append analysis_steps entry
end
Engine-->>Client: applicable_rules
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report❌ Patch coverage is ❌ Your project status has failed because the head coverage (73.8%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage. @@ Coverage Diff @@
## main #75 +/- ##
=======================================
+ Coverage 73.0% 73.8% +0.8%
=======================================
Files 181 184 +3
Lines 13481 13831 +350
=======================================
+ Hits 9851 10221 +370
+ Misses 3630 3610 -20 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/rules/models.py (1)
33-52: Consider forbidding extra fields onRuleWhento catch typos and unsupported predicates.Pydantic v2's default
extra="ignore"means misspellings (e.g.files_matches:) or v2/v3 predicates not yet implemented (e.g.risk.level,contributor.role) will be silently dropped byRuleWhen(**when_data)in the loader and the rule will behave as if unrestricted — a potentially unsafe default (stricter checks silently disabled). Given the loader already has atry/exceptaround construction that logs a warning, switching toextra="forbid"would surface these misconfigurations to users on load.🔧 Proposed change
class RuleWhen(BaseModel): """ Structured predicate block controlling whether a rule is applied to an event. When all predicates evaluate true, the rule runs; otherwise it is skipped. An absent or empty block means the rule always runs. """ + model_config = ConfigDict(extra="forbid") + contributor: str | None = Field( default=None, description="Contributor predicate: 'first_time' (no prior merged PRs) or 'trusted' (has merged PRs).", )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rules/models.py` around lines 33 - 52, The RuleWhen Pydantic model currently allows unknown fields to be ignored; change its configuration to forbid extra fields so typos or unsupported predicates raise errors at construction. In the RuleWhen class add the Pydantic v2 model config (e.g. model_config = {"extra": "forbid"}) so RuleWhen(...) will raise on unexpected keys, referencing the RuleWhen class and its existing fields (contributor, pr_count_below, files_match) when making the change.src/rules/when_evaluator.py (1)
55-63: Minor:pr_count_belowbranch ignorescontributor_ctxpresence but notmerged_pr_countabsence symmetry.When only
pr_count_belowis set andcontributor_contextis present but lacksmerged_pr_count(e.g., legacy/custom enrichers),contributor_ctx.get("merged_pr_count")returnsNoneand the branch correctly fail-opens. Behavior is fine; just suggesting to keep this invariant documented so future predicates (pr_count_above, etc.) follow the same convention: predicate present + data unknown ⇒ apply rule.Also,
reasonfield in the skip string would benefit from naming the subject (login) for downstream log clarity:- return False, f"contributor has {merged_count} merged PRs (threshold: {when.pr_count_below})" + login = contributor_ctx.get("login", "contributor") + return False, f"{login} has {merged_count} merged PRs (threshold: {when.pr_count_below})"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rules/when_evaluator.py` around lines 55 - 63, Ensure predicates like pr_count_below follow the invariant "predicate present + data unknown ⇒ apply rule" consistently across other predicates (e.g., pr_count_above) by keeping the same fail-open behavior when contributor_ctx exists but merged_pr_count is None; also change the skip/reason string returned by the pr_count_below check in when_evaluator.py (the branch that currently returns False, f"contributor has {merged_count} merged PRs (threshold: {when.pr_count_below})") to include the contributor identifier, e.g. use contributor_ctx.get("login") in the message so it reads something like "contributor {login} has {merged_count} merged PRs (threshold: ...)" to improve downstream log clarity.src/integrations/github/api.py (1)
1108-1142: Consider Search API rate-limit awareness and structured logging.Two small observations on
search_merged_pr_count:
- The GitHub Search API has a much stricter secondary rate limit (~30 req/min even when authenticated) than the core API. Since this is invoked on every PR event, a busy repo could hit secondary limits and silently return
Nonefor all contributors, which flips every newcomer rule to fail-open. Consider caching per(repo, username)for the life of an event (or short TTL) and/or distinguishing 403/429 from other errors so they can be surfaced/alerted.- The warning log uses ad-hoc fields; per the structured-logging guideline, prefer
operation,subject_ids,decision,latency_msat external-call boundaries for consistency with the rest of this module (e.g.,get_repository_tree).♻️ Suggested logging alignment
- logger.warning( - "search_merged_pr_count failed", - repo=repo, - username=username, - status=response.status, - response=error_text[:200], - ) + logger.warning( + "search_merged_pr_count", + operation="search_merged_pr_count", + subject_ids={"repo": repo, "username": username}, + decision=f"http_error_{response.status}", + response=error_text[:200], + )As per coding guidelines: "Use structured logging at boundaries with fields:
operation,subject_ids,decision,latency_ms".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/integrations/github/api.py` around lines 1108 - 1142, The search_merged_pr_count function should handle Search API secondary rate limits and use structured logging: detect 403/429 responses from session.get (in search_merged_pr_count) and return None but log them distinctly (include operation="search_merged_pr_count", subject_ids={"repo": repo, "username": username}, decision="rate_limited" or other decision, and latency_ms), while other errors use decision="error" or "no_data"; also implement a short-lived cache keyed by (repo, username) (or event-scoped cache) to avoid calling get_installation_access_token/_get_session repeatedly for the same pair during an event and to reduce hitting the ~30 req/min secondary limit. Ensure you still return int when status 200, preserve existing None behavior for failures, and add structured logger calls referencing logger used in this module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/rules/when_evaluator.py`:
- Around line 65-70: Replace fnmatch-based matching with gitignore-style
matching using pathspec: keep the existing conversion of when.files_match into
patterns and the filenames list from changed_files, then import pathspec and
build a PathSpec from the patterns via PathSpec.from_lines("gitwildmatch",
patterns) and use spec.match_file(name) (or spec.match_files) to test whether
any filename matches; return False with the same message if none match. Update
the matching expression that currently uses fnmatch.fnmatch(name, pat) to use
the pathspec spec, and ensure imports and error message (patterns variable)
remain correct.
---
Nitpick comments:
In `@src/integrations/github/api.py`:
- Around line 1108-1142: The search_merged_pr_count function should handle
Search API secondary rate limits and use structured logging: detect 403/429
responses from session.get (in search_merged_pr_count) and return None but log
them distinctly (include operation="search_merged_pr_count",
subject_ids={"repo": repo, "username": username}, decision="rate_limited" or
other decision, and latency_ms), while other errors use decision="error" or
"no_data"; also implement a short-lived cache keyed by (repo, username) (or
event-scoped cache) to avoid calling get_installation_access_token/_get_session
repeatedly for the same pair during an event and to reduce hitting the ~30
req/min secondary limit. Ensure you still return int when status 200, preserve
existing None behavior for failures, and add structured logger calls referencing
logger used in this module.
In `@src/rules/models.py`:
- Around line 33-52: The RuleWhen Pydantic model currently allows unknown fields
to be ignored; change its configuration to forbid extra fields so typos or
unsupported predicates raise errors at construction. In the RuleWhen class add
the Pydantic v2 model config (e.g. model_config = {"extra": "forbid"}) so
RuleWhen(...) will raise on unexpected keys, referencing the RuleWhen class and
its existing fields (contributor, pr_count_below, files_match) when making the
change.
In `@src/rules/when_evaluator.py`:
- Around line 55-63: Ensure predicates like pr_count_below follow the invariant
"predicate present + data unknown ⇒ apply rule" consistently across other
predicates (e.g., pr_count_above) by keeping the same fail-open behavior when
contributor_ctx exists but merged_pr_count is None; also change the skip/reason
string returned by the pr_count_below check in when_evaluator.py (the branch
that currently returns False, f"contributor has {merged_count} merged PRs
(threshold: {when.pr_count_below})") to include the contributor identifier, e.g.
use contributor_ctx.get("login") in the message so it reads something like
"contributor {login} has {merged_count} merged PRs (threshold: ...)" to improve
downstream log clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6b66420f-4781-41eb-becb-c610d1d47f1a
📒 Files selected for processing (14)
CHANGELOG.mdsrc/agents/engine_agent/agent.pysrc/agents/engine_agent/models.pysrc/agents/engine_agent/nodes.pysrc/event_processors/pull_request/enricher.pysrc/integrations/github/api.pysrc/rules/loaders/github_loader.pysrc/rules/models.pysrc/rules/when_evaluator.pytests/unit/agents/test_engine_agent.pytests/unit/event_processors/pull_request/test_enricher.pytests/unit/integrations/github/test_api.pytests/unit/rules/test_loader_when_block.pytests/unit/rules/test_when_evaluator.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)
**/*.py: Use modern typing only:dict[str, Any],list[str],str | None(noDict,List,Optional)
GitHub/HTTP/DB calls must beasync def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validatedBaseModelfrom Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields:operation,subject_ids,decision,latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include:decision,confidence (0..1), shortreasoning,recommendations,strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out withasyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g.,AgentError) witherror_type,message,context,timestamp,retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict,List,Optional) in code review
Reject blocking calls in async code; reject bareexcept:clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...
Files:
src/agents/engine_agent/agent.pysrc/agents/engine_agent/nodes.pysrc/rules/models.pysrc/rules/loaders/github_loader.pysrc/agents/engine_agent/models.pytests/unit/integrations/github/test_api.pytests/unit/agents/test_engine_agent.pytests/unit/event_processors/pull_request/test_enricher.pysrc/integrations/github/api.pysrc/event_processors/pull_request/enricher.pytests/unit/rules/test_loader_when_block.pysrc/rules/when_evaluator.pytests/unit/rules/test_when_evaluator.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)
tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Usepytest.mark.asynciofor async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green
Files:
tests/unit/integrations/github/test_api.pytests/unit/agents/test_engine_agent.pytests/unit/event_processors/pull_request/test_enricher.pytests/unit/rules/test_loader_when_block.pytests/unit/rules/test_when_evaluator.py
🧠 Learnings (2)
📚 Learning: 2026-03-27T12:52:44.067Z
Learnt from: oleksii-quinta
Repo: warestack/watchflow PR: 67
File: src/webhooks/handlers/issue_comment.py:153-159
Timestamp: 2026-03-27T12:52:44.067Z
Learning: When enqueuing processor tasks using `task_queue`, follow the documented pre-built-task pattern: (1) build the task with `pre_built_task = task_queue.build_task(event_type, payload, processor.process, delivery_id=...)`; (2) call `task_queue.enqueue(processor.process, event_type, payload, pre_built_task, delivery_id=...)` by passing the pre-built task as a single positional `*args` element; (3) ensure the worker ultimately calls `await processor.process(pre_built_task)` (i.e., the processor `process(self, task: Task)` receives the `Task` instance). This matches the expectation that `enqueue` stores the pre-built task in the wrapper Task’s `args` as described by `build_task`’s docstring (“pass as single arg to enqueue”).
Applied to files:
src/agents/engine_agent/agent.pysrc/agents/engine_agent/nodes.pysrc/rules/models.pysrc/rules/loaders/github_loader.pysrc/agents/engine_agent/models.pysrc/integrations/github/api.pysrc/event_processors/pull_request/enricher.pysrc/rules/when_evaluator.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Applied to files:
tests/unit/agents/test_engine_agent.pytests/unit/rules/test_loader_when_block.pytests/unit/rules/test_when_evaluator.py
🔇 Additional comments (12)
src/agents/engine_agent/agent.py (1)
206-232: LGTM —whenpropagation is consistent across Rule objects and legacy dicts.Explicitly defaulting
when = Nonefor the dict branch keepsRuleDescription.whenuniformly present regardless of input shape.src/agents/engine_agent/nodes.py (1)
40-56: LGTM — correct short-circuit ordering and skip logging.Event-type filter runs first (cheapest), then
should_apply_ruleis invoked withrule_desc.when(safe forNone, returns(True, "")). Skip reason is both logged at debug and recorded inanalysis_stepsfor observability.CHANGELOG.md (1)
9-23: LGTM.Entry accurately describes the supported predicates, AND semantics, skip/log behavior, and defers expression parsing/extended predicates to follow-up PRs.
src/rules/models.py (1)
85-88: LGTM.New optional
whenfield is properly typed and defaulted, preserving backward compatibility with existing rules.src/rules/loaders/github_loader.py (1)
116-131: LGTM — defensive parsing with clear, contextual warnings.Both the non-mapping and validation-error paths log the offending rule's description and leave
when_block = None, so a malformedwhen:block degrades gracefully to "rule always runs" rather than failing the whole load. If you adoptextra="forbid"onRuleWhen(see suggestion onsrc/rules/models.py), thistry/exceptwill also catch typo'd predicate keys.tests/unit/integrations/github/test_api.py (1)
324-398: LGTM — solid coverage ofsearch_merged_pr_countbranches.Happy-path total extraction, zero result, 403 rate-limit, 5xx, missing installation token (with
get.assert_not_called()), and generic exception all covered. The URL-encoding assertions (is%3Apr,repo%3Aowner/repo,author%3Aalice) pin the query shape thatwhen_evaluatorrelies on.src/agents/engine_agent/models.py (1)
14-14: LGTM.
RuleWhenimport and newwhenfield onRuleDescriptionmirrorRule.whenand cleanly plumb the predicate through tonodes.should_apply_rule.Also applies to: 114-116
tests/unit/agents/test_engine_agent.py (1)
158-226: LGTM — tests assert both behavioral skip and observability.
test_engine_skips_rule_when_when_block_does_not_matchproves conditions on a gated rule are not evaluated while an unconditional rule still runs, andtest_engine_logs_rule_skip_with_description_and_reasonpins the exact skip-log format. Consider also adding a positive-gating test (e.g.is_first_time: Trueso the first-time rule does fire) to guard against an inverted predicate regression, but not required.tests/unit/event_processors/pull_request/test_enricher.py (1)
53-142: Good coverage of enrichment fail-open paths.Tests correctly exercise the success, zero-count (first-time),
Nonereturn, exception, and legacy-client-without-method branches of_build_contributor_context, matching the fail-open contract expected bywhen_evaluator.src/event_processors/pull_request/enricher.py (1)
99-145: Contributor context enrichment looks correct.
hasattrguard for legacy clients, try/except narrowed around the external call, and the derived booleans correctly treatmerged_count=Noneas bothis_first_time=Falseandtrusted=False(via short-circuit inbool(merged_count and ...)), which lines up with the evaluator's fail-open semantics.One minor note: if
author_loginis absent (bot-authored PRs with nouser.login, or deleted users),contributor_contextwill be missing fromevent_dataentirely. The evaluator already fail-opens for missing context, so this is fine — just worth flagging that bot PRs and rules gated oncontributor: first_timewill apply rather than skip. Confirm that matches intent.tests/unit/rules/test_loader_when_block.py (1)
1-120: Thorough loader coverage.Cases for absent/null/non-mapping/invalid-type
when:plus valid predicate shapes (string, list, combined) line up with_parse_rule's behavior of settingwhen=Noneon any parse failure and logging a warning.tests/unit/rules/test_when_evaluator.py (1)
1-167: Comprehensive predicate coverage.Good mix of positive/negative matches, boundary at
pr_count_below=3withmerged=3, fail-open for missing/Nonecontext, combined-predicate AND semantics, and non-dictchanged_filesrobustness. Matches the evaluator contract.
🛡️ Watchflow Governance ChecksStatus: ❌ 1 Violations Found 🟡 Medium Severity (1)Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.Pull request exceeds maximum lines changed (751 > 500) 💡 Reply with Thanks for using Watchflow! It's completely free for OSS and private repositories. You can also self-host it easily. |
🛡️ Watchflow Governance ChecksStatus: ❌ 1 Violations Found 🟡 Medium Severity (1)Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.Pull request exceeds maximum lines changed (753 > 500) 💡 Reply with Thanks for using Watchflow! It's completely free for OSS and private repositories. You can also self-host it easily. |
|
| Case | PR | Author state | Result |
|---|---|---|---|
| Green baseline, small first-time edit | #4 | ||
| first-time | pass, 0 violations — 3 of 7 rules applicable, rest correctly skipped | ||
| Clean trusted-flip PR (merged) | #5 | ||
| first-time → trusted | pass, 0 violations, merged | ||
| "Break everything" from owner | #6 | trusted | |
| fail, 4 violations — first-time rules skipped, trusted CHANGELOG rule fired | |||
| "Break everything" from burner | #7 | ||
| first-time | fail, 5 violations — first-time rules fired, trusted rule skipped | ||
Large clean PR after merged_pr_count = 3 |
|||
| #10 | trusted, 3 merged | pass, 0 violations — | |
max_pr_loc rule skipped by when: pr_count_below: 3 despite 600 LOC |
Side-by-side proof
PRs #6 and
#7 have identical change surfaces (600-line
src/big.py, non-conventional title, 3-char body, .github/workflows/* + docs/* additions). The only
difference is the author's per-repo merged_pr_count. Violations diverge precisely where when: contributor
gates it. PR #10 adds a third column showing
the pr_count_below: 3 gate closing once the author has 3 merged PRs.
| Rule | PR #7 (first-time, 0 merged) | PR #6 (trusted, 1 merged) | PR #10 (trusted, 3 merged) |
|---|---|---|---|
require_linked_issue (no when:) |
fired | fired | applied, passed |
First-time title + desc (when: contributor: first_time) |
fired | skipped | skipped |
Trusted CHANGELOG (when: contributor: trusted) |
skipped | fired | applied, passed |
max_pr_loc (when: pr_count_below: 3) |
fired | fired | skipped |
docs/* title (when: files_match: docs/*) |
fired | fired | skipped (no docs/ changes) |
src/** codeowner (when: files_match: [...]) |
applied, passed | applied, passed | applied, passed |
Combined first_time + src/** |
applied, passed | skipped | skipped |
The diagonal inversion (rows 2 and 3) is the direct observable signal that contributor: gating works
end-to-end through the enricher → when_evaluator → condition pipeline. Row 4 proves the same for
pr_count_below: — server logs for PR #10 show exactly 3 rules marked applicable, with max_pr_loc absent from
the applicability list before condition evaluation ever ran.
Verdict
when: block on feat/first-time-contributor-rules is functionally correct. All three predicate types and
their AND composition behave exactly as documented in src/rules/when_evaluator.py.
feat: add conditional rule logic via
when:block (issue #73, PR 1/3)Part of #73
Summary
Rules in
.watchflow/rules.yamlcan now declare awhen:block that gates evaluation. If the predicates don't match, the rule is skipped before any validator or LLM work runs (skip reason is logged).Supported Predicates (v1)
All optional. Multiple predicates combine with AND.
Expression parser (
and/or/ comparisons) and extended predicates (risk.level,contributor.role, …) are deferred to PRs 2/3 and 3/3.Summary by CodeRabbit