Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
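The `update-type: version-update:semver-major` field records which semantic-versioning component changed between the old and new version (here 5 to 6). As a rough illustration, and not Dependabot's actual implementation, the classification can be sketched by comparing dotted numeric versions component by component. The function name `classify_update` and the `no-change` result are hypothetical; pre-release tags are not handled.

```python
def classify_update(old: str, new: str) -> str:
    """Classify a version bump by the most significant component that changed.

    Sketch only: assumes plain dotted numeric versions such as "5", "5.1" or
    "v5.1.2"; missing components are treated as 0.
    """
    def parts(version: str) -> list[int]:
        # Strip a leading "v", split on dots, pad to (major, minor, patch).
        nums = [int(p) for p in version.lstrip("v").split(".")]
        return (nums + [0, 0, 0])[:3]

    old_parts, new_parts = parts(old), parts(new)
    for label, a, b in zip(("major", "minor", "patch"), old_parts, new_parts):
        if a != b:
            return f"version-update:semver-{label}"
    return "no-change"  # hypothetical label for identical versions
```

For this PR, `classify_update("5", "6")` compares `[5, 0, 0]` with `[6, 0, 0]` and reports a major update, matching the metadata above.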
The following labels could not be found: … Please fix the above issues or remove invalid values from …
GareBear99 added a commit that referenced this pull request (Apr 22, 2026)
…ique corpus

Mirror of the evidence the operator captured on its first non-toy run. This doc is the audit trail that pairs each row in data/critique/operator_reviews.jsonl with its Portfolio issue, its verdict, and its ingest manifest, so Gate v2 training can cite specific live-deployment events as the cause of any critique-slice improvement (not just 'the model got better').

docs/OPERATOR_EVIDENCE.md
- Header with links back to LIVE_DEPLOYMENT_LEARNING.md and the operator-side FIRST_LIVE_RUN.md for the same run.
- Entry #1: FreeEQ8 (Portfolio issue #1, 2026-04-22).
  * Target, depth, focus.
  * Verdict: yellow / address feedback.
  * Observation: 51 files, 1.79 MB, full root file + dir listing, top extensions.
  * Finding verbatim: sparse symbols in three JUCE headers.
  * Exact JSONL shape that lands in data/critique/operator_reviews.jsonl.
  * Ingest manifest from scripts/ingest_operator_reviews.py --strict (exit 0, 1 record accepted).
  * Phase mapping (Phase 0 PROVED on FreeEQ8; Phases 2-5 blocked only by billing hold + three PATs, not by code).
- 'How to read this doc going forward' explains that new entries are appended chronologically and form the evidence chain.

README.md
- Adds 'Operator evidence log' to the Table of contents between the Live-deployment anchor and 'What this is'.
- Adds a 'Live-run evidence' line at the end of the Live-deployment section pointing at OPERATOR_EVIDENCE.md.

Co-Authored-By: Oz <oz-agent@warp.dev>
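The commit reports that scripts/ingest_operator_reviews.py --strict exited 0 with one record accepted. The script itself is not shown, so the following is only a minimal sketch of what a strict JSONL ingest pass could look like, assuming strict mode means "any rejected line fails the run". The function `ingest` and the two required fields (`target_url`, `verdict`, borrowed from the field names mentioned in the acceptance stages) are hypothetical placeholders, not the real schema.

```python
import json
from collections.abc import Iterable

# HYPOTHETICAL schema: the real required fields of operator_reviews.jsonl are
# not listed in this commit message; these two are placeholders.
REQUIRED_FIELDS = ("target_url", "verdict")


def ingest(lines: Iterable[str], strict: bool) -> tuple[list[dict], list[str], int]:
    """Parse JSONL lines and return (accepted_records, errors, exit_code).

    With strict=True any rejected line makes the exit code 1; otherwise bad
    lines are reported but the run still exits 0.
    """
    accepted: list[dict] = []
    errors: list[str] = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines (e.g. a trailing newline)
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            errors.append(f"line {lineno}: missing field(s) {', '.join(missing)}")
            continue
        accepted.append(record)
    exit_code = 1 if strict and errors else 0
    return accepted, errors, exit_code
```

Fed a file containing a single well-formed record with `strict=True`, this sketch returns one accepted record and exit code 0, the same outcome the commit reports for entry #1; a CLI wrapper would map a `--strict` flag onto the `strict` argument.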
GareBear99 added a commit that referenced this pull request (Apr 22, 2026)
Defines the six acceptance stages LLMBuilder uses to validate that live-
deployment data from gh-ai-operator actually improves the critique
capability. Professional engineering-validation framing throughout.
Stages:
1. Contract correctness (schema) -- PASSED, automated via round-trip CI.
2. Sample representativeness -- diversity requirements across the first 50
records (difficulty range, distinct verdicts, distinct target_urls).
3. Provenance auditability -- every record traceable to a public event
(PASSED for entry #1: FreeEQ8, Portfolio issue #1).
4. Self-consistency -- two runs against the same target at the same
commit must produce an identical verdict and a Jaccard similarity >= 0.7 on their findings.
5. A/B proof of learning on the critique slice -- two identical
candidates, only training data differs. Pass iff delta_critique > 0
AND max_regression <= 0.5 pp on any other slice.
6. Blind evaluation against a human reviewer -- 10 held-out repos, 20
randomized critiques, 3 axes (specificity, usefulness, invention
absence). Pass iff the enriched candidate wins >=6/10 on at least two
of the three axes.
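The pass criteria for stages 4 to 6 reduce to simple numeric predicates. The Python sketch below encodes them as stated, under several assumptions: findings are compared as sets of normalized strings, per-slice scores are percentages keyed by slice name with the critique slice under a hypothetical key `"critique"`, and stage 6 receives per-axis win counts over the 10 held-out repos. The function names are illustrative, not LLMBuilder's actual evaluation code.

```python
from collections.abc import Mapping


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stage4_self_consistent(verdict_a: str, findings_a: set[str],
                           verdict_b: str, findings_b: set[str],
                           threshold: float = 0.7) -> bool:
    """Stage 4: identical verdict AND Jaccard >= threshold on the findings."""
    return verdict_a == verdict_b and jaccard(findings_a, findings_b) >= threshold


def stage5_ab_pass(baseline: Mapping[str, float], enriched: Mapping[str, float],
                   max_regression_pp: float = 0.5) -> bool:
    """Stage 5: delta_critique > 0 AND no other slice drops by more than 0.5 pp.

    Both mappings are assumed to hold the same slice names, scored in percent.
    """
    delta_critique = enriched["critique"] - baseline["critique"]
    # A positive value is a regression (the enriched candidate scored lower).
    regressions = [baseline[s] - enriched[s] for s in baseline if s != "critique"]
    max_regression = max(regressions, default=0.0)
    return delta_critique > 0 and max_regression <= max_regression_pp


def stage6_blind_pass(wins_per_axis: Mapping[str, int],
                      min_wins: int = 6, min_axes: int = 2) -> bool:
    """Stage 6: the enriched candidate wins >= 6 of 10 repos on >= 2 of 3 axes."""
    return sum(1 for wins in wins_per_axis.values() if wins >= min_wins) >= min_axes
```

For example, an enriched candidate that gains 2 pp on critique while another slice drops from 70.0 to 69.5 sits exactly at the 0.5 pp limit and passes stage 5; a drop to 69.0 would fail it.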
Each stage has explicit pass criteria and explicit failure modes. No stage
passes by assertion. Results get appended to docs/OPERATOR_EVIDENCE.md
with the date, the Portfolio issues involved, and the commands/run IDs
that produced the measurement.
Stages 1 and 3 pass today. Stages 2 and 4-6 require the live-deployment
secrets to be set and enough ingested records to accumulate.
Co-Authored-By: Oz <oz-agent@warp.dev>
Bumps actions/setup-python from 5 to 6.
Release notes
Sourced from actions/setup-python's releases.
... (truncated)
Commits
- a309ff8 Bump urllib3 from 2.6.0 to 2.6.3 in /tests/data (#1264)
- bfe8cc5 Upgrade @actions dependencies to Node 24 compatible versions (#1259)
- 4f41a90 Bump urllib3 from 2.5.0 to 2.6.0 in /tests/data (#1253)
- 83679a8 Bump @types/node from 24.1.0 to 24.9.1 and update macos-13 to macos-15-intel ...
- bfc4944 Bump prettier from 3.5.3 to 3.6.2 (#1234)
- 97aeb3e Bump requests from 2.32.2 to 2.32.4 in /tests/data (#1130)
- 443da59 Bump actions/publish-action from 0.3.0 to 0.4.0 & Documentation update for pi...
- cfd55ca graalpy: add graalpy early-access and windows builds (#880)
- bba65e5 Bump typescript from 5.4.2 to 5.9.3 and update docs/advanced-usage.md (#1094)
- 18566f8 Improve wording and "fix example" (remove 3.13) on testing against pre-releas...

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)