
feat: add conditional rule logic via when: block (issue #73, PR 1/3) #75

Open
codesensei-tushar wants to merge 9 commits into warestack:main from codesensei-tushar:feat/first-time-contributor-rules

Conversation

@codesensei-tushar
Contributor

@codesensei-tushar codesensei-tushar commented Apr 17, 2026

feat: add conditional rule logic via when: block (issue #73, PR 1/3)

Part of #73

Summary

Rules in .watchflow/rules.yaml can now declare a when: block that gates evaluation. If the predicates don't match, the rule is skipped before any validator or LLM work runs, and the skip reason is logged at DEBUG level.

```yaml
rules:
  - description: Require Changelog (first-time contributors only)
    event_types: [pull_request]
    parameters:
      changelog_required: true
    when:
      contributor: first_time
      files_match: "src/auth/**"
```

Supported Predicates (v1)

All optional. Multiple predicates combine with AND.

| Predicate | Example | Semantics |
|---|---|---|
| `contributor: first_time` | `when: { contributor: first_time }` | Zero prior merged PRs |
| `contributor: trusted` | `when: { contributor: trusted }` | ≥1 prior merged PR |
| `pr_count_below: N` | `when: { pr_count_below: 3 }` | Fewer than N merged PRs |
| `files_match` | `when: { files_match: "src/auth/**" }` | A changed file matches the glob (string or list) |

Expression parser (and / or / comparisons) and extended predicates (risk.level, contributor.role, …) are deferred to PRs 2/3 and 3/3.

Summary by CodeRabbit

  • New Features
    • Rules can be conditionally applied using a top-level when: block with predicates: contributor status (first_time | trusted), pr_count_below, and files_match globs/lists. All predicates must hold to apply a rule; skipped rules are logged at debug. Contributor context (login, merged PR count, is_first_time, trusted) is enriched via the GitHub Search API; missing data defaults to permissive (fail-open).
  • Documentation
    • Changelog entry added describing when: support and known future enhancements.

@trunk-io

trunk-io Bot commented Apr 17, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eecbfea0-aad6-42cd-8359-88e6f2ab1a73

📥 Commits

Reviewing files that changed from the base of the PR and between d58ae8f and 8251dc8.

📒 Files selected for processing (1)
  • src/rules/when_evaluator.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/rules/when_evaluator.py

📝 Walkthrough

Walkthrough

Adds a structured when: predicate block to rules, enriches PR events with contributor context via the GitHub Search API, introduces predicate evaluation (contributor, pr_count_below, files_match) with fail-open semantics, integrates predicate checks into the rule engine to skip non-matching rules and log skip reasons at debug level, and includes unit tests and changelog entry.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core models | src/rules/models.py, src/agents/engine_agent/models.py | Added RuleWhen Pydantic model and an optional `when: RuleWhen`… |
| Rule loader | src/rules/loaders/github_loader.py | Parse optional top-level when block into RuleWhen when present; log and ignore invalid when content. |
| When evaluator | src/rules/when_evaluator.py | New should_apply_rule(when, event_data) implementing contributor, pr_count_below, and files_match checks with fail-open behavior and explanatory skip reasons. |
| Engine integration | src/agents/engine_agent/agent.py, src/agents/engine_agent/nodes.py | Propagate when into RuleDescription; analyze_rule_descriptions calls should_apply_rule and skips rules that do not apply, logging the reason at DEBUG and appending analysis steps. |
| PR enrichment & GitHub client | src/event_processors/pull_request/enricher.py, src/integrations/github/api.py | Enricher builds contributor_context (login, merged_pr_count, is_first_time, trusted) using the new GitHubClient.search_merged_pr_count; the added search method queries the GitHub Search API and returns total_count or None. |
| Tests | tests/unit/rules/test_loader_when_block.py, tests/unit/rules/test_when_evaluator.py, tests/unit/agents/test_engine_agent.py, tests/unit/event_processors/pull_request/test_enricher.py, tests/unit/integrations/github/test_api.py | Added comprehensive tests for loader parsing, evaluator semantics (including fail-open cases), engine skipping/logging, enricher behavior, and GitHub Search API interactions. |
| Docs / Changelog | CHANGELOG.md | Added Unreleased → Added entry documenting the conditional when: block, supported predicates, and planned future extensions. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Event Source
    participant Enricher as PR Enricher
    participant GitHub as GitHub API
    participant Engine as Rule Engine
    participant Evaluator as When Evaluator

    Client->>Enricher: enrich_event_data(event)
    Enricher->>GitHub: search_merged_pr_count(repo, author)
    GitHub-->>Enricher: merged_pr_count (or None)
    Enricher->>Enricher: attach contributor_context to event_data
    Enricher-->>Client: enriched event_data

    Client->>Engine: analyze_rule_descriptions(state with event_data)
    Engine->>Engine: filter rules by event_type
    Engine->>Evaluator: should_apply_rule(rule.when, event_data)
    Evaluator->>Evaluator: evaluate contributor, pr_count_below, files_match
    Evaluator-->>Engine: (applies: bool, reason: str)

    alt applies == true
        Engine->>Engine: add rule to applicable_rules
    else applies == false
        Engine->>Engine: log "Rule \"<desc>\" skipped: <reason>" (DEBUG)
        Engine->>Engine: append analysis_steps entry
    end

    Engine-->>Client: applicable_rules
```
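The engine-side half of the diagram has a fixed order. The event-type filter runs first, then the when: check, and each skip produces a DEBUG log line plus an analysis step. The following is a hypothetical sketch of that order: `RuleDesc` and `select_applicable` are illustrative names, and the predicate is passed in as a callable so the sketch does not depend on any particular evaluator. The log format copies the diagram's `Rule "<desc>" skipped: <reason>`.

```python
import logging
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger("engine_sketch")

# (when, event_data) -> (applies, reason)
Predicate = Callable[[Any, dict[str, Any]], tuple[bool, str]]


@dataclass
class RuleDesc:
    """Hypothetical stand-in for RuleDescription."""
    description: str
    event_types: list[str]
    when: Any = None


def select_applicable(
    rules: list[RuleDesc],
    event_type: str,
    event_data: dict[str, Any],
    should_apply: Predicate,
    analysis_steps: list[str],
) -> list[RuleDesc]:
    applicable: list[RuleDesc] = []
    for rule in rules:
        if event_type not in rule.event_types:
            continue  # cheapest filter runs first
        applies, reason = should_apply(rule.when, event_data)
        if not applies:
            message = f'Rule "{rule.description}" skipped: {reason}'
            logger.debug(message)
            analysis_steps.append(message)  # keep the skip visible in the analysis trace
            continue
        applicable.append(rule)
    return applicable
```

Gated rules never reach condition evaluation, which is the behavior the sandbox run later observes in the server logs for PR #10.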

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • dkargatzis

Poem

🐰
I hop through rules with a curious twitch,
When blocks now guide which checks to stitch.
Contributor tales and globs in play,
I skip or run with a joyous sway—
Watchflow clearer with every twitch.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding conditional rule logic via a when: block, and correctly references the issue (#73) and PR sequence (1/3). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter

codecov-commenter commented Apr 19, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 99.15730% with 3 lines in your changes missing coverage. Please review.

❌ Your project status has failed because the head coverage (73.8%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

```diff
@@           Coverage Diff           @@
##            main     #75     +/-   ##
=======================================
+ Coverage   73.0%   73.8%   +0.8%
=======================================
  Files        181     184      +3
  Lines      13481   13831    +350
=======================================
+ Hits        9851   10221    +370
+ Misses      3630    3610     -20
```

Powered by Codecov. Last update 60ae336...8251dc8. Read the comment docs.


@codesensei-tushar codesensei-tushar marked this pull request as ready for review April 19, 2026 14:28

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
src/rules/models.py (1)

33-52: Consider forbidding extra fields on RuleWhen to catch typos and unsupported predicates.

Pydantic v2's default extra="ignore" means misspellings (e.g. files_matches:) or v2/v3 predicates not yet implemented (e.g. risk.level, contributor.role) will be silently dropped by RuleWhen(**when_data) in the loader and the rule will behave as if unrestricted — a potentially unsafe default (stricter checks silently disabled). Given the loader already has a try/except around construction that logs a warning, switching to extra="forbid" would surface these misconfigurations to users on load.

🔧 Proposed change
```diff
 class RuleWhen(BaseModel):
     """
     Structured predicate block controlling whether a rule is applied to an event.

     When all predicates evaluate true, the rule runs; otherwise it is skipped.
     An absent or empty block means the rule always runs.
     """

+    model_config = ConfigDict(extra="forbid")
+
     contributor: str | None = Field(
         default=None,
         description="Contributor predicate: 'first_time' (no prior merged PRs) or 'trusted' (has merged PRs).",
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/models.py` around lines 33 - 52, The RuleWhen Pydantic model
currently allows unknown fields to be ignored; change its configuration to
forbid extra fields so typos or unsupported predicates raise errors at
construction. In the RuleWhen class add the Pydantic v2 model config (e.g.
model_config = {"extra": "forbid"}) so RuleWhen(...) will raise on unexpected
keys, referencing the RuleWhen class and its existing fields (contributor,
pr_count_below, files_match) when making the change.
src/rules/when_evaluator.py (1)

55-63: Minor: the pr_count_below branch fails open when contributor_context is present but lacks merged_pr_count; worth documenting this as an invariant.

When only pr_count_below is set and contributor_context is present but lacks merged_pr_count (e.g., legacy/custom enrichers), contributor_ctx.get("merged_pr_count") returns None and the branch correctly fail-opens. Behavior is fine; just suggesting to keep this invariant documented so future predicates (pr_count_above, etc.) follow the same convention: predicate present + data unknown ⇒ apply rule.

Also, the skip reason string would benefit from naming the subject (login) for downstream log clarity:

```diff
-                return False, f"contributor has {merged_count} merged PRs (threshold: {when.pr_count_below})"
+                login = contributor_ctx.get("login", "contributor")
+                return False, f"{login} has {merged_count} merged PRs (threshold: {when.pr_count_below})"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/when_evaluator.py` around lines 55 - 63, Ensure predicates like
pr_count_below follow the invariant "predicate present + data unknown ⇒ apply
rule" consistently across other predicates (e.g., pr_count_above) by keeping the
same fail-open behavior when contributor_ctx exists but merged_pr_count is None;
also change the skip/reason string returned by the pr_count_below check in
when_evaluator.py (the branch that currently returns False, f"contributor has
{merged_count} merged PRs (threshold: {when.pr_count_below})") to include the
contributor identifier, e.g. use contributor_ctx.get("login") in the message so
it reads something like "contributor {login} has {merged_count} merged PRs
(threshold: ...)" to improve downstream log clarity.
src/integrations/github/api.py (1)

1108-1142: Consider Search API rate-limit awareness and structured logging.

Two small observations on search_merged_pr_count:

  1. The GitHub Search API has a much stricter secondary rate limit (~30 req/min even when authenticated) than the core API. Since this is invoked on every PR event, a busy repo could hit secondary limits and silently return None for all contributors, which flips every newcomer rule to fail-open. Consider caching per (repo, username) for the life of an event (or short TTL) and/or distinguishing 403/429 from other errors so they can be surfaced/alerted.
  2. The warning log uses ad-hoc fields; per the structured-logging guideline, prefer operation, subject_ids, decision, latency_ms at external-call boundaries for consistency with the rest of this module (e.g., get_repository_tree).
♻️ Suggested logging alignment
```diff
-                logger.warning(
-                    "search_merged_pr_count failed",
-                    repo=repo,
-                    username=username,
-                    status=response.status,
-                    response=error_text[:200],
-                )
+                logger.warning(
+                    "search_merged_pr_count",
+                    operation="search_merged_pr_count",
+                    subject_ids={"repo": repo, "username": username},
+                    decision=f"http_error_{response.status}",
+                    response=error_text[:200],
+                )
```

As per coding guidelines: "Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/integrations/github/api.py` around lines 1108 - 1142, The
search_merged_pr_count function should handle Search API secondary rate limits
and use structured logging: detect 403/429 responses from session.get (in
search_merged_pr_count) and return None but log them distinctly (include
operation="search_merged_pr_count", subject_ids={"repo": repo, "username":
username}, decision="rate_limited" or other decision, and latency_ms), while
other errors use decision="error" or "no_data"; also implement a short-lived
cache keyed by (repo, username) (or event-scoped cache) to avoid calling
get_installation_access_token/_get_session repeatedly for the same pair during
an event and to reduce hitting the ~30 req/min secondary limit. Ensure you still
return int when status 200, preserve existing None behavior for failures, and
add structured logger calls referencing logger used in this module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rules/when_evaluator.py`:
- Around line 65-70: Replace fnmatch-based matching with gitignore-style
matching using pathspec: keep the existing conversion of when.files_match into
patterns and the filenames list from changed_files, then import pathspec and
build a PathSpec from the patterns via PathSpec.from_lines("gitwildmatch",
patterns) and use spec.match_file(name) (or spec.match_files) to test whether
any filename matches; return False with the same message if none match. Update
the matching expression that currently uses fnmatch.fnmatch(name, pat) to use
the pathspec spec, and ensure imports and error message (patterns variable)
remain correct.
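The standard library alone shows the difference this comment points at. `fnmatch` treats `/` as an ordinary character, so `*` can span directories and `**` is simply two stars. Under gitignore-style matching, which pathspec's `gitwildmatch` implements, `*` stops at `/` and `**/` can match zero directories. The pattern and path pairs below are illustrative only.

```python
from fnmatch import fnmatchcase

# fnmatch: "*" matches any characters, including "/".
print(fnmatchcase("src/a/b.py", "src/*.py"))         # True  (gitignore-style: no match)
print(fnmatchcase("src/auth/x/y.py", "src/auth/**"))  # True  (gitignore-style: match)
# "**/" has no special meaning, so at least one "/" must follow "src/".
print(fnmatchcase("src/auth.py", "src/**/*.py"))      # False (gitignore-style: match)
```

Switching to pathspec would therefore make `*` stricter and `**/` more permissive. Existing rules that rely on `*` crossing directories may need `**` instead.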

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b66420f-4781-41eb-becb-c610d1d47f1a

📥 Commits

Reviewing files that changed from the base of the PR and between 60ae336 and d58ae8f.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • src/agents/engine_agent/agent.py
  • src/agents/engine_agent/models.py
  • src/agents/engine_agent/nodes.py
  • src/event_processors/pull_request/enricher.py
  • src/integrations/github/api.py
  • src/rules/loaders/github_loader.py
  • src/rules/models.py
  • src/rules/when_evaluator.py
  • tests/unit/agents/test_engine_agent.py
  • tests/unit/event_processors/pull_request/test_enricher.py
  • tests/unit/integrations/github/test_api.py
  • tests/unit/rules/test_loader_when_block.py
  • tests/unit/rules/test_when_evaluator.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/agents/engine_agent/agent.py
  • src/agents/engine_agent/nodes.py
  • src/rules/models.py
  • src/rules/loaders/github_loader.py
  • src/agents/engine_agent/models.py
  • tests/unit/integrations/github/test_api.py
  • tests/unit/agents/test_engine_agent.py
  • tests/unit/event_processors/pull_request/test_enricher.py
  • src/integrations/github/api.py
  • src/event_processors/pull_request/enricher.py
  • tests/unit/rules/test_loader_when_block.py
  • src/rules/when_evaluator.py
  • tests/unit/rules/test_when_evaluator.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/integrations/github/test_api.py
  • tests/unit/agents/test_engine_agent.py
  • tests/unit/event_processors/pull_request/test_enricher.py
  • tests/unit/rules/test_loader_when_block.py
  • tests/unit/rules/test_when_evaluator.py
🧠 Learnings (2)
📚 Learning: 2026-03-27T12:52:44.067Z
Learnt from: oleksii-quinta
Repo: warestack/watchflow PR: 67
File: src/webhooks/handlers/issue_comment.py:153-159
Timestamp: 2026-03-27T12:52:44.067Z
Learning: When enqueuing processor tasks using `task_queue`, follow the documented pre-built-task pattern: (1) build the task with `pre_built_task = task_queue.build_task(event_type, payload, processor.process, delivery_id=...)`; (2) call `task_queue.enqueue(processor.process, event_type, payload, pre_built_task, delivery_id=...)` by passing the pre-built task as a single positional `*args` element; (3) ensure the worker ultimately calls `await processor.process(pre_built_task)` (i.e., the processor `process(self, task: Task)` receives the `Task` instance). This matches the expectation that `enqueue` stores the pre-built task in the wrapper Task’s `args` as described by `build_task`’s docstring (“pass as single arg to enqueue”).

Applied to files:

  • src/agents/engine_agent/agent.py
  • src/agents/engine_agent/nodes.py
  • src/rules/models.py
  • src/rules/loaders/github_loader.py
  • src/agents/engine_agent/models.py
  • src/integrations/github/api.py
  • src/event_processors/pull_request/enricher.py
  • src/rules/when_evaluator.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

  • tests/unit/agents/test_engine_agent.py
  • tests/unit/rules/test_loader_when_block.py
  • tests/unit/rules/test_when_evaluator.py
🔇 Additional comments (12)
src/agents/engine_agent/agent.py (1)

206-232: LGTM — when propagation is consistent across Rule objects and legacy dicts.

Explicitly defaulting when = None for the dict branch keeps RuleDescription.when uniformly present regardless of input shape.

src/agents/engine_agent/nodes.py (1)

40-56: LGTM — correct short-circuit ordering and skip logging.

Event-type filter runs first (cheapest), then should_apply_rule is invoked with rule_desc.when (safe for None, returns (True, "")). Skip reason is both logged at debug and recorded in analysis_steps for observability.

CHANGELOG.md (1)

9-23: LGTM.

Entry accurately describes the supported predicates, AND semantics, skip/log behavior, and defers expression parsing/extended predicates to follow-up PRs.

src/rules/models.py (1)

85-88: LGTM.

New optional when field is properly typed and defaulted, preserving backward compatibility with existing rules.

src/rules/loaders/github_loader.py (1)

116-131: LGTM — defensive parsing with clear, contextual warnings.

Both the non-mapping and validation-error paths log the offending rule's description and leave when_block = None, so a malformed when: block degrades gracefully to "rule always runs" rather than failing the whole load. If you adopt extra="forbid" on RuleWhen (see suggestion on src/rules/models.py), this try/except will also catch typo'd predicate keys.

tests/unit/integrations/github/test_api.py (1)

324-398: LGTM — solid coverage of search_merged_pr_count branches.

Happy-path total extraction, zero result, 403 rate-limit, 5xx, missing installation token (with get.assert_not_called()), and generic exception all covered. The URL-encoding assertions (is%3Apr, repo%3Aowner/repo, author%3Aalice) pin the query shape that when_evaluator relies on.

src/agents/engine_agent/models.py (1)

14-14: LGTM.

RuleWhen import and new when field on RuleDescription mirror Rule.when and cleanly plumb the predicate through to nodes.should_apply_rule.

Also applies to: 114-116

tests/unit/agents/test_engine_agent.py (1)

158-226: LGTM — tests assert both behavioral skip and observability.

test_engine_skips_rule_when_when_block_does_not_match proves conditions on a gated rule are not evaluated while an unconditional rule still runs, and test_engine_logs_rule_skip_with_description_and_reason pins the exact skip-log format. Consider also adding a positive-gating test (e.g. is_first_time: True so the first-time rule does fire) to guard against an inverted predicate regression, but not required.

tests/unit/event_processors/pull_request/test_enricher.py (1)

53-142: Good coverage of enrichment fail-open paths.

Tests correctly exercise the success, zero-count (first-time), None return, exception, and legacy-client-without-method branches of _build_contributor_context, matching the fail-open contract expected by when_evaluator.

src/event_processors/pull_request/enricher.py (1)

99-145: Contributor context enrichment looks correct.

hasattr guard for legacy clients, try/except narrowed around the external call, and the derived booleans correctly treat merged_count=None as both is_first_time=False and trusted=False (via short-circuit in bool(merged_count and ...)), which lines up with the evaluator's fail-open semantics.

One minor note: if author_login is absent (bot-authored PRs with no user.login, or deleted users), contributor_context will be missing from event_data entirely. The evaluator already fail-opens for missing context, so this is fine — just worth flagging that bot PRs and rules gated on contributor: first_time will apply rather than skip. Confirm that matches intent.

tests/unit/rules/test_loader_when_block.py (1)

1-120: Thorough loader coverage.

Cases for absent/null/non-mapping/invalid-type when: plus valid predicate shapes (string, list, combined) line up with _parse_rule's behavior of setting when=None on any parse failure and logging a warning.

tests/unit/rules/test_when_evaluator.py (1)

1-167: Comprehensive predicate coverage.

Good mix of positive/negative matches, boundary at pr_count_below=3 with merged=3, fail-open for missing/None context, combined-predicate AND semantics, and non-dict changed_files robustness. Matches the evaluator contract.

Comment thread src/rules/when_evaluator.py
@watchflow

watchflow Bot commented Apr 19, 2026

🛡️ Watchflow Governance Checks

Status: ❌ 1 Violations Found

🟡 Medium Severity (1)

Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.

Pull request exceeds maximum lines changed (751 > 500)
How to fix: Reduce the size of this PR to at most 500 lines changed (additions + deletions).


💡 Reply with @watchflow ack [reason] to override these rules, or @watchflow help for commands.

Thanks for using Watchflow! It's completely free for OSS and private repositories. You can also self-host it easily.

@watchflow

watchflow Bot commented Apr 20, 2026

🛡️ Watchflow Governance Checks

Status: ❌ 1 Violations Found

🟡 Medium Severity (1)

Validates that total lines changed (additions + deletions) in a PR do not exceed a maximum; enforces a maximum LOC per pull request.

Pull request exceeds maximum lines changed (753 > 500)
How to fix: Reduce the size of this PR to at most 500 lines changed (additions + deletions).



@codesensei-tushar
Contributor Author

codesensei-tushar commented Apr 20, 2026

when: block — end-to-end sandbox validation

Validated conditional rule applicability on this branch against a live GitHub App installation. All three
predicate families (contributor, pr_count_below, files_match) behave as designed, including the AND
composition of multiple predicates.

Test repo: codesensei-tushar/watchflow-sandbox

Setup

A 7-rule .watchflow/rules.yaml covering every predicate permutation (first-time, trusted, pr_count_below,
single glob, list of globs, combined AND). Two contributors differing only in their per-repo merged-PR count:

  • tushar-u (burner account, PR from a fork): merged_pr_count = 0 → is_first_time = true
  • codesensei-tushar (repo owner): flipped 0 → 1 mid-test via a clean merge (PR #5) to become trusted, then bumped 1 → 3 via PR #8 and PR #9 to cross the pr_count_below: 3 threshold

Test matrix

| Case | PR | Author state | Result |
|---|---|---|---|
| Green baseline, small first-time edit | #4 | first-time | pass, 0 violations; 3 of 7 rules applicable, rest correctly skipped |
| Clean trusted-flip PR (merged) | #5 | first-time → trusted | pass, 0 violations, merged |
| "Break everything" from owner | #6 | trusted | fail, 4 violations; first-time rules skipped, trusted CHANGELOG rule fired |
| "Break everything" from burner | #7 | first-time | fail, 5 violations; first-time rules fired, trusted rule skipped |
| Large clean PR after merged_pr_count = 3 | #10 | trusted, 3 merged | pass, 0 violations; max_pr_loc rule skipped by `when: pr_count_below: 3` despite 600 LOC |

Side-by-side proof

PRs #6 and #7 have identical change surfaces: a 600-line src/big.py, a non-conventional title, a 3-character body, and additions under .github/workflows/* and docs/*. The only difference is the author's per-repo merged_pr_count. Violations diverge precisely on the rules gated by when: contributor. PR #10 adds a third column showing the pr_count_below: 3 gate closing once the author has 3 merged PRs.

| Rule | PR #7 (first-time, 0 merged) | PR #6 (trusted, 1 merged) | PR #10 (trusted, 3 merged) |
|---|---|---|---|
| require_linked_issue (no `when:`) | fired | fired | applied, passed |
| First-time title + desc (`when: contributor: first_time`) | fired | skipped | skipped |
| Trusted CHANGELOG (`when: contributor: trusted`) | skipped | fired | applied, passed |
| max_pr_loc (`when: pr_count_below: 3`) | fired | fired | skipped |
| `docs/*` title (`when: files_match: docs/*`) | fired | fired | skipped (no docs/ changes) |
| `src/**` codeowner (`when: files_match: [...]`) | applied, passed | applied, passed | applied, passed |
| Combined `first_time` + `src/**` | applied, passed | skipped | skipped |

The diagonal inversion (rows 2 and 3) is the direct observable signal that contributor: gating works end-to-end through the enricher → when_evaluator → condition pipeline. Row 4 shows the same for pr_count_below. Server logs for PR #10 show exactly 3 rules marked applicable, with max_pr_loc absent from the applicability list before condition evaluation ever ran.

Verdict

when: block on feat/first-time-contributor-rules is functionally correct. All three predicate types and
their AND composition behave exactly as documented in src/rules/when_evaluator.py.
