feat: add conditional rule logic via `when:` block (issue #73, PR 1/3) by codesensei-tushar · Pull Request #75 · warestack/watchflow

codesensei-tushar · 2026-04-17T14:40:59Z

feat: add conditional rule logic via `when:` block (issue #73, PR 1/3)

Part of #73

Summary

Rules in .watchflow/rules.yaml can now declare a when: block that gates evaluation. If the predicates don't match, the rule is skipped before any validator or LLM work runs (skip reason is logged).

rules:
  - description: Require Changelog (first-time contributors only)
    event_types: [pull_request]
    parameters:
      changelog_required: true
    when:
      contributor: first_time
      files_match: "src/auth/**"

Supported Predicates (v1)

All optional. Multiple predicates combine with AND.

Predicate	Example	Semantics
contributor: first_time	when: { contributor: first_time }	Zero prior merged PRs
contributor: trusted	when: { contributor: trusted }	≥1 prior merged PR
pr_count_below: N	when: { pr_count_below: 3 }	Fewer than N merged PRs
files_match	when: { files_match: "src/auth/**" }	Changed file matches glob (string or list)

Expression parser (and / or / comparisons) and extended predicates (risk.level, contributor.role, …) are deferred to PRs 2/3 and 3/3.
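The predicate semantics in the table above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/rules/when_evaluator.py`: the dataclass stands in for the Pydantic `RuleWhen` model, the `changed_files` entries are assumed to be dicts with a `filename` key, and treating a missing `changed_files` list as "unknown, so apply" is an assumption made here to mirror the fail-open behaviour described for contributor data.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any


@dataclass(frozen=True)
class RuleWhen:
    """Stand-in for the PR's Pydantic model; every predicate is optional."""

    contributor: str | None = None  # "first_time" or "trusted"
    pr_count_below: int | None = None
    files_match: str | list[str] | None = None


def should_apply_rule(when: RuleWhen | None, event_data: dict[str, Any]) -> tuple[bool, str]:
    """Return (applies, skip_reason). Predicates AND together; unknown data fails open."""
    if when is None:
        return True, ""

    ctx = event_data.get("contributor_context") or {}
    merged = ctx.get("merged_pr_count")  # None means the count is unknown

    if when.contributor is not None and merged is not None:
        if when.contributor == "first_time" and merged > 0:
            return False, f"contributor has {merged} merged PRs (not first-time)"
        if when.contributor == "trusted" and merged == 0:
            return False, "contributor has no merged PRs (not trusted)"

    if when.pr_count_below is not None and merged is not None:
        if merged >= when.pr_count_below:
            return False, f"contributor has {merged} merged PRs (threshold: {when.pr_count_below})"

    changed = event_data.get("changed_files")
    if when.files_match is not None and changed is not None:
        patterns = [when.files_match] if isinstance(when.files_match, str) else list(when.files_match)
        # Ignore malformed entries instead of raising on them.
        names = [f["filename"] for f in changed if isinstance(f, dict) and "filename" in f]
        if not any(fnmatchcase(name, pat) for name in names for pat in patterns):
            return False, f"no changed files match {patterns}"

    return True, ""
```

With this sketch, a rule gated on `pr_count_below: 3` is skipped for an author with exactly 3 merged PRs, and any contributor predicate applies when no `contributor_context` is present.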

Summary by CodeRabbit

New Features
- Rules can be conditionally applied using a top-level when: block with predicates: contributor status (first_time | trusted), pr_count_below, and files_match globs/lists. All predicates must hold to apply a rule; skipped rules are logged at debug. Contributor context (login, merged PR count, is_first_time, trusted) is enriched via the GitHub Search API; missing data defaults to permissive (fail-open).
Documentation
- Changelog entry added describing when: support and known future enhancements.

trunk-io · 2026-04-17T14:41:04Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

coderabbitai · 2026-04-17T14:41:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eecbfea0-aad6-42cd-8359-88e6f2ab1a73

📥 Commits

Reviewing files that changed from the base of the PR and between d58ae8f and 8251dc8.

📒 Files selected for processing (1)

src/rules/when_evaluator.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/rules/when_evaluator.py

📝 Walkthrough

Walkthrough

Adds a structured when: predicate block to rules, enriches PR events with contributor context via the GitHub Search API, introduces predicate evaluation (contributor, pr_count_below, files_match) with fail-open semantics, integrates predicate checks into the rule engine to skip non-matching rules and log skip reasons at debug level, and includes unit tests and changelog entry.

Changes

Cohort / File(s)	Summary
Core models `src/rules/models.py`, `src/agents/engine_agent/models.py`	Added `RuleWhen` Pydantic model and an optional `when` field (typed `RuleWhen`) on `Rule` and `RuleDescription`.
Rule loader `src/rules/loaders/github_loader.py`	Parse optional top-level `when` block into `RuleWhen` when present; log and ignore invalid `when` content.
When evaluator `src/rules/when_evaluator.py`	New `should_apply_rule(when, event_data)` implementing contributor, pr_count_below, and files_match checks with fail-open behavior and explanatory skip reasons.
Engine integration `src/agents/engine_agent/agent.py`, `src/agents/engine_agent/nodes.py`	Propagate `when` into `RuleDescription`; `analyze_rule_descriptions` calls `should_apply_rule` and skips rules that do not apply, logging reason at DEBUG and appending analysis steps.
PR enrichment & GitHub client `src/event_processors/pull_request/enricher.py`, `src/integrations/github/api.py`	Enricher builds `contributor_context` (login, merged_pr_count, is_first_time, trusted) using new `GitHubClient.search_merged_pr_count`; added search method that queries GitHub Search API and returns `total_count` or `None`.
Tests `tests/unit/rules/test_loader_when_block.py`, `tests/unit/rules/test_when_evaluator.py`, `tests/unit/agents/test_engine_agent.py`, `tests/unit/event_processors/pull_request/test_enricher.py`, `tests/unit/integrations/github/test_api.py`	Added comprehensive tests for loader parsing, evaluator semantics (including fail-open cases), engine skipping/logging, enricher behavior, and GitHub Search API interactions.
Docs / Changelog `CHANGELOG.md`	Added Unreleased → Added entry documenting conditional `when:` block, supported predicates, and planned future extensions.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Event Source
    participant Enricher as PR Enricher
    participant GitHub as GitHub API
    participant Engine as Rule Engine
    participant Evaluator as When Evaluator

    Client->>Enricher: enrich_event_data(event)
    Enricher->>GitHub: search_merged_pr_count(repo, author)
    GitHub-->>Enricher: merged_pr_count (or None)
    Enricher->>Enricher: attach contributor_context to event_data
    Enricher-->>Client: enriched event_data

    Client->>Engine: analyze_rule_descriptions(state with event_data)
    Engine->>Engine: filter rules by event_type
    Engine->>Evaluator: should_apply_rule(rule.when, event_data)
    Evaluator->>Evaluator: evaluate contributor, pr_count_below, files_match
    Evaluator-->>Engine: (applies: bool, reason: str)

    alt applies == true
        Engine->>Engine: add rule to applicable_rules
    else applies == false
        Engine->>Engine: log "Rule \"<desc>\" skipped: <reason>" (DEBUG)
        Engine->>Engine: append analysis_steps entry
    end

    Engine-->>Client: applicable_rules
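The engine half of the diagram above could look roughly like the sketch below. It is illustrative only: `select_applicable_rules` and the simplified `RuleDescription` dataclass are hypothetical names, and the evaluator is passed in as a callable so the block stays self-contained. The log message follows the `Rule "<desc>" skipped: <reason>` format shown in the diagram.

```python
from __future__ import annotations

import logging
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger("engine_sketch")


@dataclass
class RuleDescription:
    """Simplified stand-in for the engine's rule description model."""

    description: str
    event_types: list[str]
    when: Any = None


def select_applicable_rules(
    rules: list[RuleDescription],
    event_type: str,
    event_data: dict[str, Any],
    should_apply_rule: Callable[[Any, dict[str, Any]], tuple[bool, str]],
    analysis_steps: list[str],
) -> list[RuleDescription]:
    """Filter by event type first (cheapest), then by the when: predicates."""
    applicable: list[RuleDescription] = []
    for rule in rules:
        if event_type not in rule.event_types:
            continue
        applies, reason = should_apply_rule(rule.when, event_data)
        if not applies:
            # Record the skip both in the log and in the analysis trail.
            logger.debug('Rule "%s" skipped: %s', rule.description, reason)
            analysis_steps.append(f'Skipped "{rule.description}": {reason}')
            continue
        applicable.append(rule)
    return applicable
```

Because skipped rules never reach `applicable`, no validator or LLM work is spent on them.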

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Add conditional rule logic for first-time contributors #73 — Matches implemented conditional rule-gating, contributor lookup, pr_count checks, and file-glob gating described in the issue.

Possibly related PRs

feat: issue 24 diff and codeowners #59 — Touches GitHub API and PR enrichment codepaths; related by overlapping changes to PR enrichment and file fetching.
Matas/fix/fix actions #43 — Overlaps on rule model and engine evaluation pipeline changes (Rule/RuleDescription/engine nodes integration).

Suggested reviewers

dkargatzis

Poem

🐰
I hop through rules with a curious twitch,
When blocks now guide which checks to stitch.
Contributor tales and globs in play,
I skip or run with a joyous sway—
Watchflow clearer with every twitch.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding conditional rule logic via a `when:` block, and correctly references the issue (`#73`) and PR sequence (1/3).

codecov-commenter · 2026-04-19T14:28:42Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 99.15730% with 3 lines in your changes missing coverage. Please review.

❌ Your project status has failed because the head coverage (73.8%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@           Coverage Diff           @@
##            main     #75     +/-   ##
=======================================
+ Coverage   73.0%   73.8%   +0.8%     
=======================================
  Files        181     184      +3     
  Lines      13481   13831    +350     
=======================================
+ Hits        9851   10221    +370     
+ Misses      3630    3610     -20

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60ae336...8251dc8. Read the comment docs.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

src/rules/models.py (1)
33-52: Consider forbidding extra fields on RuleWhen to catch typos and unsupported predicates.

Pydantic v2's default extra="ignore" means misspellings (e.g. files_matches:) or v2/v3 predicates not yet implemented (e.g. risk.level, contributor.role) will be silently dropped by RuleWhen(**when_data) in the loader and the rule will behave as if unrestricted — a potentially unsafe default (stricter checks silently disabled). Given the loader already has a try/except around construction that logs a warning, switching to extra="forbid" would surface these misconfigurations to users on load.
🔧 Proposed change
 class RuleWhen(BaseModel):
     """
     Structured predicate block controlling whether a rule is applied to an event.

     When all predicates evaluate true, the rule runs; otherwise it is skipped.
     An absent or empty block means the rule always runs.
     """

+    model_config = ConfigDict(extra="forbid")
+
     contributor: str | None = Field(
         default=None,
         description="Contributor predicate: 'first_time' (no prior merged PRs) or 'trusted' (has merged PRs).",
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/models.py` around lines 33 - 52, The RuleWhen Pydantic model
currently allows unknown fields to be ignored; change its configuration to
forbid extra fields so typos or unsupported predicates raise errors at
construction. In the RuleWhen class add the Pydantic v2 model config (e.g.
model_config = {"extra": "forbid"}) so RuleWhen(...) will raise on unexpected
keys, referencing the RuleWhen class and its existing fields (contributor,
pr_count_below, files_match) when making the change.
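To make the failure mode concrete, here is a standard-library sketch of the difference between ignoring and forbidding unknown keys. It deliberately does not use Pydantic: `parse_when_lenient` and `parse_when_strict` are hypothetical helpers that only imitate the `extra="ignore"` and `extra="forbid"` behaviours discussed above.

```python
from __future__ import annotations

from typing import Any

# The three predicates supported in v1 of the when: block.
ALLOWED_WHEN_KEYS = frozenset({"contributor", "pr_count_below", "files_match"})


def parse_when_lenient(data: dict[str, Any]) -> dict[str, Any]:
    """Imitates extra="ignore": unknown keys are silently dropped."""
    return {key: value for key, value in data.items() if key in ALLOWED_WHEN_KEYS}


def parse_when_strict(data: dict[str, Any]) -> dict[str, Any]:
    """Imitates extra="forbid": unknown keys raise so the loader can warn."""
    unknown = sorted(set(data) - ALLOWED_WHEN_KEYS)
    if unknown:
        raise ValueError(f"unsupported when: predicates: {unknown}")
    return dict(data)
```

With the lenient variant, a typo such as `files_matches` yields an empty predicate set, so the rule would be treated as unrestricted; the strict variant surfaces the typo at load time.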
src/rules/when_evaluator.py (1)
55-63: Minor: document the fail-open invariant that pr_count_below follows when merged_pr_count is missing.

When only pr_count_below is set and contributor_context is present but lacks merged_pr_count (e.g., legacy/custom enrichers), contributor_ctx.get("merged_pr_count") returns None and the branch correctly fail-opens. Behavior is fine; just suggesting to keep this invariant documented so future predicates (pr_count_above, etc.) follow the same convention: predicate present + data unknown ⇒ apply rule.

Also, reason field in the skip string would benefit from naming the subject (login) for downstream log clarity:
-                return False, f"contributor has {merged_count} merged PRs (threshold: {when.pr_count_below})"
+                login = contributor_ctx.get("login", "contributor")
+                return False, f"{login} has {merged_count} merged PRs (threshold: {when.pr_count_below})"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/when_evaluator.py` around lines 55 - 63, Ensure predicates like
pr_count_below follow the invariant "predicate present + data unknown ⇒ apply
rule" consistently across other predicates (e.g., pr_count_above) by keeping the
same fail-open behavior when contributor_ctx exists but merged_pr_count is None;
also change the skip/reason string returned by the pr_count_below check in
when_evaluator.py (the branch that currently returns False, f"contributor has
{merged_count} merged PRs (threshold: {when.pr_count_below})") to include the
contributor identifier, e.g. use contributor_ctx.get("login") in the message so
it reads something like "contributor {login} has {merged_count} merged PRs
(threshold: ...)" to improve downstream log clarity.
src/integrations/github/api.py (1)
1108-1142: Consider Search API rate-limit awareness and structured logging.

Two small observations on search_merged_pr_count:

The GitHub Search API has its own, much stricter rate limit (30 req/min for authenticated requests) than the core API. Since this is invoked on every PR event, a busy repo could hit that limit and silently return None for all contributors, which flips every newcomer rule to fail-open. Consider caching per (repo, username) for the life of an event (or short TTL) and/or distinguishing 403/429 from other errors so they can be surfaced/alerted.

The warning log uses ad-hoc fields; per the structured-logging guideline, prefer operation, subject_ids, decision, latency_ms at external-call boundaries for consistency with the rest of this module (e.g., get_repository_tree).
♻️ Suggested logging alignment
-                logger.warning(
-                    "search_merged_pr_count failed",
-                    repo=repo,
-                    username=username,
-                    status=response.status,
-                    response=error_text[:200],
-                )
+                logger.warning(
+                    "search_merged_pr_count",
+                    operation="search_merged_pr_count",
+                    subject_ids={"repo": repo, "username": username},
+                    decision=f"http_error_{response.status}",
+                    response=error_text[:200],
+                )
As per coding guidelines: "Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/integrations/github/api.py` around lines 1108 - 1142, The
search_merged_pr_count function should handle Search API rate limits
and use structured logging: detect 403/429 responses from session.get (in
search_merged_pr_count) and return None but log them distinctly (include
operation="search_merged_pr_count", subject_ids={"repo": repo, "username":
username}, decision="rate_limited" or other decision, and latency_ms), while
other errors use decision="error" or "no_data"; also implement a short-lived
cache keyed by (repo, username) (or event-scoped cache) to avoid calling
get_installation_access_token/_get_session repeatedly for the same pair during
an event and to reduce hitting the ~30 req/min search rate limit. Ensure you still
return int when status 200, preserve existing None behavior for failures, and
add structured logger calls referencing logger used in this module.
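One possible shape for the caching and status handling suggested above, as a sketch under stated assumptions: `MergedPrCountCache`, `classify_search_status`, and the `fetch` callable are hypothetical names, not part of `GitHubClient`. Failures are deliberately not cached here so that a transient rate limit does not pin `None` for the whole TTL; that is a design choice of this sketch, not something the PR specifies.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable

# Hypothetical signature of the underlying lookup: a count, or None on failure.
FetchCount = Callable[[str, str], Awaitable["int | None"]]


def classify_search_status(status: int) -> str:
    """Map an HTTP status to a structured-log decision value."""
    if status == 200:
        return "ok"
    if status in (403, 429):
        # Simplification: 403 can also mean a permission problem.
        return "rate_limited"
    return "error"


class MergedPrCountCache:
    """Short-lived cache keyed by (repo, username)."""

    def __init__(self, fetch: FetchCount, ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._entries: dict[tuple[str, str], tuple[float, int]] = {}

    async def get(self, repo: str, username: str) -> int | None:
        key = (repo, username)
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        count = await self._fetch(repo, username)
        if count is not None:  # only successful lookups are cached
            self._entries[key] = (now, count)
        return count


async def demo() -> tuple[list[int | None], int]:
    """Look up the same pair twice; the stub fetcher should run only once."""
    calls: list[tuple[str, str]] = []

    async def fake_fetch(repo: str, username: str) -> int | None:
        calls.append((repo, username))
        return 2

    cache = MergedPrCountCache(fake_fetch)
    results = [await cache.get("owner/repo", "alice"), await cache.get("owner/repo", "alice")]
    return results, len(calls)
```

Running `asyncio.run(demo())` exercises the cache with a stub instead of a live network call.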

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rules/when_evaluator.py`:
- Around line 65-70: Replace fnmatch-based matching with gitignore-style
matching using pathspec: keep the existing conversion of when.files_match into
patterns and the filenames list from changed_files, then import pathspec and
build a PathSpec from the patterns via PathSpec.from_lines("gitwildmatch",
patterns) and use spec.match_file(name) (or spec.match_files) to test whether
any filename matches; return False with the same message if none match. Update
the matching expression that currently uses fnmatch.fnmatch(name, pat) to use
the pathspec spec, and ensure imports and error message (patterns variable)
remain correct.
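The motivation for the swap can be seen from how `fnmatch` treats path separators. The sketch below uses only the standard library to show the current behaviour; `any_file_matches` is a hypothetical helper, and the comments about gitignore-style matching describe the expected semantics of `pathspec`'s `gitwildmatch` rather than something executed here.

```python
from fnmatch import fnmatchcase


def any_file_matches(filenames: list[str], patterns: list[str]) -> bool:
    """fnmatch-based matching, as the evaluator is described as doing today."""
    return any(fnmatchcase(name, pat) for name in filenames for pat in patterns)


# In fnmatch, "*" also matches "/", so a single-level pattern reaches nested files.
# Gitignore-style matching would not let "*" cross a directory boundary.
nested = any_file_matches(["src/a/b.py"], ["src/*.py"])

# A trailing-slash directory pattern only matches that exact string in fnmatch.
# Gitignore-style matching would treat "docs/" as "everything under docs".
directory = any_file_matches(["docs/readme.md"], ["docs/"])

# "**" collapses to a single wildcard in fnmatch, which still matches nested paths.
recursive = any_file_matches(["src/auth/login.py"], ["src/auth/**"])
```

Here `nested` and `recursive` are `True` while `directory` is `False`, which is where the two matching styles diverge.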


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b66420f-4781-41eb-becb-c610d1d47f1a

📥 Commits

Reviewing files that changed from the base of the PR and between 60ae336 and d58ae8f.

📒 Files selected for processing (14)

CHANGELOG.md
src/agents/engine_agent/agent.py
src/agents/engine_agent/models.py
src/agents/engine_agent/nodes.py
src/event_processors/pull_request/enricher.py
src/integrations/github/api.py
src/rules/loaders/github_loader.py
src/rules/models.py
src/rules/when_evaluator.py
tests/unit/agents/test_engine_agent.py
tests/unit/event_processors/pull_request/test_enricher.py
tests/unit/integrations/github/test_api.py
tests/unit/rules/test_loader_when_block.py
tests/unit/rules/test_when_evaluator.py

📜 Review details

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

src/agents/engine_agent/agent.py
src/agents/engine_agent/nodes.py
src/rules/models.py
src/rules/loaders/github_loader.py
src/agents/engine_agent/models.py
tests/unit/integrations/github/test_api.py
tests/unit/agents/test_engine_agent.py
tests/unit/event_processors/pull_request/test_enricher.py
src/integrations/github/api.py
src/event_processors/pull_request/enricher.py
tests/unit/rules/test_loader_when_block.py
src/rules/when_evaluator.py
tests/unit/rules/test_when_evaluator.py

tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

tests/unit/integrations/github/test_api.py
tests/unit/agents/test_engine_agent.py
tests/unit/event_processors/pull_request/test_enricher.py
tests/unit/rules/test_loader_when_block.py
tests/unit/rules/test_when_evaluator.py

🧠 Learnings (2)

📚 Learning: 2026-03-27T12:52:44.067Z

Learnt from: oleksii-quinta
Repo: warestack/watchflow PR: 67
File: src/webhooks/handlers/issue_comment.py:153-159
Timestamp: 2026-03-27T12:52:44.067Z
Learning: When enqueuing processor tasks using `task_queue`, follow the documented pre-built-task pattern: (1) build the task with `pre_built_task = task_queue.build_task(event_type, payload, processor.process, delivery_id=...)`; (2) call `task_queue.enqueue(processor.process, event_type, payload, pre_built_task, delivery_id=...)` by passing the pre-built task as a single positional `*args` element; (3) ensure the worker ultimately calls `await processor.process(pre_built_task)` (i.e., the processor `process(self, task: Task)` receives the `Task` instance). This matches the expectation that `enqueue` stores the pre-built task in the wrapper Task’s `args` as described by `build_task`’s docstring (“pass as single arg to enqueue”).

Applied to files:

src/agents/engine_agent/agent.py
src/agents/engine_agent/nodes.py
src/rules/models.py
src/rules/loaders/github_loader.py
src/agents/engine_agent/models.py
src/integrations/github/api.py
src/event_processors/pull_request/enricher.py
src/rules/when_evaluator.py

📚 Learning: 2026-01-31T19:35:22.504Z

Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

tests/unit/agents/test_engine_agent.py
tests/unit/rules/test_loader_when_block.py
tests/unit/rules/test_when_evaluator.py

🔇 Additional comments (12)

src/agents/engine_agent/agent.py (1)

206-232: LGTM — when propagation is consistent across Rule objects and legacy dicts.

Explicitly defaulting when = None for the dict branch keeps RuleDescription.when uniformly present regardless of input shape.

src/agents/engine_agent/nodes.py (1)

40-56: LGTM — correct short-circuit ordering and skip logging.

Event-type filter runs first (cheapest), then should_apply_rule is invoked with rule_desc.when (safe for None, returns (True, "")). Skip reason is both logged at debug and recorded in analysis_steps for observability.

CHANGELOG.md (1)

9-23: LGTM.

Entry accurately describes the supported predicates, AND semantics, skip/log behavior, and defers expression parsing/extended predicates to follow-up PRs.

src/rules/models.py (1)

85-88: LGTM.

New optional when field is properly typed and defaulted, preserving backward compatibility with existing rules.

src/rules/loaders/github_loader.py (1)

116-131: LGTM — defensive parsing with clear, contextual warnings.

Both the non-mapping and validation-error paths log the offending rule's description and leave when_block = None, so a malformed when: block degrades gracefully to "rule always runs" rather than failing the whole load. If you adopt extra="forbid" on RuleWhen (see suggestion on src/rules/models.py), this try/except will also catch typo'd predicate keys.

tests/unit/integrations/github/test_api.py (1)

324-398: LGTM — solid coverage of search_merged_pr_count branches.

Happy-path total extraction, zero result, 403 rate-limit, 5xx, missing installation token (with get.assert_not_called()), and generic exception all covered. The URL-encoding assertions (is%3Apr, repo%3Aowner/repo, author%3Aalice) pin the query shape that when_evaluator relies on.
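For reference, a query URL with the encoded fragments named above could be built as in the sketch below. The `https://api.github.com/search/issues` endpoint and its `total_count` response field are part of GitHub's REST API; the exact query string (in particular the `is:merged` qualifier and `per_page=1`) and the function name `build_merged_pr_search_url` are assumptions, since the review only pins the three encoded fragments.

```python
from urllib.parse import quote

SEARCH_ISSUES_URL = "https://api.github.com/search/issues"


def build_merged_pr_search_url(repo: str, username: str) -> str:
    """Build a search URL whose total_count would be the author's merged PR count."""
    # Assumed query shape; only is:pr, repo:, and author: are confirmed by the tests.
    query = f"is:pr is:merged repo:{repo} author:{username}"
    # quote() keeps "/" by default, and encodes ":" as %3A and spaces as %20.
    return f"{SEARCH_ISSUES_URL}?q={quote(query, safe='/')}&per_page=1"


url = build_merged_pr_search_url("owner/repo", "alice")
```

Keeping `/` unescaped is what produces `repo%3Aowner/repo` rather than `repo%3Aowner%2Frepo`.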

src/agents/engine_agent/models.py (1)

14-14: LGTM.

RuleWhen import and new when field on RuleDescription mirror Rule.when and cleanly plumb the predicate through to nodes.should_apply_rule.

Also applies to: 114-116

tests/unit/agents/test_engine_agent.py (1)

158-226: LGTM — tests assert both behavioral skip and observability.

test_engine_skips_rule_when_when_block_does_not_match proves conditions on a gated rule are not evaluated while an unconditional rule still runs, and test_engine_logs_rule_skip_with_description_and_reason pins the exact skip-log format. Consider also adding a positive-gating test (e.g. is_first_time: True so the first-time rule does fire) to guard against an inverted predicate regression, but not required.

tests/unit/event_processors/pull_request/test_enricher.py (1)

53-142: Good coverage of enrichment fail-open paths.

Tests correctly exercise the success, zero-count (first-time), None return, exception, and legacy-client-without-method branches of _build_contributor_context, matching the fail-open contract expected by when_evaluator.

src/event_processors/pull_request/enricher.py (1)

99-145: Contributor context enrichment looks correct.

hasattr guard for legacy clients, try/except narrowed around the external call, and the derived booleans correctly treat merged_count=None as both is_first_time=False and trusted=False (via short-circuit in bool(merged_count and ...)), which lines up with the evaluator's fail-open semantics.

One minor note: if author_login is absent (bot-authored PRs with no user.login, or deleted users), contributor_context will be missing from event_data entirely. The evaluator already fails open on missing context, so this is fine; it does mean that rules gated on contributor: first_time will apply to bot PRs rather than be skipped. Confirm that matches intent.
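The derived booleans described above can be illustrated with a small sketch. `build_contributor_context` is a hypothetical stand-in for the enricher's `_build_contributor_context`; the field names come from the walkthrough, while the exact expressions are assumptions based on the `bool(merged_count and ...)` hint.

```python
from __future__ import annotations

from typing import Any


def build_contributor_context(login: str, merged_count: int | None) -> dict[str, Any]:
    """Derive contributor flags; an unknown count (None) sets both flags to False."""
    return {
        "login": login,
        "merged_pr_count": merged_count,
        # None == 0 is False, so an unknown count is not treated as first-time.
        "is_first_time": merged_count == 0,
        # Short-circuit: None and 0 are falsy, so neither counts as trusted.
        "trusted": bool(merged_count and merged_count >= 1),
    }
```

An unknown count therefore produces a context where neither flag is set, leaving the decision to the evaluator's fail-open handling.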

tests/unit/rules/test_loader_when_block.py (1)

1-120: Thorough loader coverage.

Cases for absent/null/non-mapping/invalid-type when: plus valid predicate shapes (string, list, combined) line up with _parse_rule's behavior of setting when=None on any parse failure and logging a warning.

tests/unit/rules/test_when_evaluator.py (1)

1-167: Comprehensive predicate coverage.

Good mix of positive/negative matches, boundary at pr_count_below=3 with merged=3, fail-open for missing/None context, combined-predicate AND semantics, and non-dict changed_files robustness. Matches the evaluator contract.

watchflow · 2026-04-19T14:34:38Z

🛡️ Watchflow Governance Checks

Status: ❌ 1 Violations Found

🟡 Medium Severity (1)

Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.

Pull request exceeds maximum lines changed (751 > 500)
How to fix: Reduce the size of this PR to at most 500 lines changed (additions + deletions).

💡 Reply with @watchflow ack [reason] to override these rules, or @watchflow help for commands.

Thanks for using Watchflow! It's completely free for OSS and private repositories. You can also self-host it easily.

watchflow · 2026-04-20T04:38:51Z

🛡️ Watchflow Governance Checks

Status: ❌ 1 Violations Found

🟡 Medium Severity (1)

Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.

Pull request exceeds maximum lines changed (753 > 500)
How to fix: Reduce the size of this PR to at most 500 lines changed (additions + deletions).

💡 Reply with @watchflow ack [reason] to override these rules, or @watchflow help for commands.

Thanks for using Watchflow! It's completely free for OSS and private repositories. You can also self-host it easily.

codesensei-tushar · 2026-04-20T12:58:20Z

`when:` block — end-to-end sandbox validation

Validated conditional rule applicability on this branch against a live GitHub App installation. All three
predicate families (contributor, pr_count_below, files_match) behave as designed, including the AND
composition of multiple predicates.

Test repo: codesensei-tushar/watchflow-sandbox

Setup

A 7-rule .watchflow/rules.yaml covering every predicate permutation (first-time, trusted, pr_count_below,
single glob, list of globs, combined AND). Two contributors differing only in their per-repo merged-PR count:

tushar-u (burner, PR from fork): merged_pr_count = 0 → is_first_time = true
codesensei-tushar (repo owner): flipped 0 → 1 mid-test via a clean merge (PR #5) to become trusted, then
further bumped 1 → 3 via PR #8 and PR #9 to cross the pr_count_below: 3 threshold

Test matrix

Case	PR	Author state	Result
Green baseline, small first-time edit	#4	first-time	pass, 0 violations; 3 of 7 rules applicable, rest correctly skipped
Clean trusted-flip PR (merged)	#5	first-time → trusted	pass, 0 violations, merged
"Break everything" from owner	#6	trusted	fail, 4 violations; first-time rules skipped, trusted CHANGELOG rule fired
"Break everything" from burner	#7	first-time	fail, 5 violations; first-time rules fired, trusted rule skipped
Large clean PR after `merged_pr_count = 3`	#10	trusted, 3 merged	pass, 0 violations; `max_pr_loc` rule skipped by `when: pr_count_below: 3` despite 600 LOC

Side-by-side proof

PRs #6 and #7 have identical change surfaces (600-line src/big.py, non-conventional title, 3-char body,
.github/workflows/* + docs/* additions). The only difference is the author's per-repo merged_pr_count.
Violations diverge precisely where when: contributor gates them. PR #10 adds a third column showing
the pr_count_below: 3 gate closing once the author has 3 merged PRs.

Rule	PR #7 (first-time, 0 merged)	PR #6 (trusted, 1 merged)	PR #10 (trusted, 3 merged)
`require_linked_issue` (no `when:`)	fired	fired	applied, passed
First-time title + desc (`when: contributor: first_time`)	fired	skipped	skipped
Trusted CHANGELOG (`when: contributor: trusted`)	skipped	fired	applied, passed
`max_pr_loc` (`when: pr_count_below: 3`)	fired	fired	skipped
`docs/` title (`when: files_match: docs/`)	fired	fired	skipped (no docs/ changes)
`src/**` codeowner (`when: files_match: [...]`)	applied, passed	applied, passed	applied, passed
Combined `first_time + src/**`	applied, passed	skipped	skipped

The diagonal inversion (rows 2 and 3) is the direct observable signal that contributor: gating works
end-to-end through the enricher → when_evaluator → condition pipeline. Row 4 proves the same for
pr_count_below: — server logs for PR #10 show exactly 3 rules marked applicable, with max_pr_loc absent from
the applicability list before condition evaluation ever ran.

Verdict

when: block on feat/first-time-contributor-rules is functionally correct. All three predicate types and
their AND composition behave exactly as documented in src/rules/when_evaluator.py.

codesensei-tushar added 5 commits April 17, 2026 20:05

feat(rules): add when: block to gate rule applicability

e701776

feat(enricher): fetch contributor PR history for when: predicates

56f9d5d

feat(engine): skip rules whose when: predicates do not hold

5ccf6a7

test: cover when: predicate evaluation and contributor context

526f23c

docs: changelog entry for conditional when: rules

8fc4a24

codesensei-tushar added 3 commits April 19, 2026 19:27

fix(rules): fail-open contributor predicate when merged_pr_count is unknown

c859d5b

test: edge cases for when: evaluator and contributor-context failures

2f40743

test: broaden when: coverage and search_merged_pr_count HTTP branches

d58ae8f

codesensei-tushar marked this pull request as ready for review April 19, 2026 14:28

codesensei-tushar requested a review from dkargatzis as a code owner April 19, 2026 14:28

coderabbitai Bot reviewed Apr 19, 2026

Comment thread src/rules/when_evaluator.py

chore(rules): TODO to swap fnmatch for pathspec gitwildmatch

8251dc8
