ci(skills-eval): wire schedule to MANUAL_FULL_SWEEP + sync dispatch picker #701
…icker

Two independent fixes in one file:

1. Schedule trigger now exports MANUAL_FULL_SWEEP=1 (like workflow_dispatch already does). Before this, the schedule event fell through to the PR-mode code path in skills_eval_agent.py and died with "FATAL: PR_NUMBER not set in environment". Also default MANUAL_SKILLS_FILTER to "*" when INPUT_SKILLS is empty (schedule events don't populate `inputs`).

2. workflow_dispatch picker now lists the 3 skills that actually exist on disk (rag-blueprint, rag-eval, rag-perf) instead of 10 phantom names (rag-deploy-blueprint, rag-ingest-documents, etc.) that were left over from an old skill taxonomy. The LLM agent already enumerates skills/*/eval/*.json itself, so the stale picker was a UX trap only: anyone picking one of the old names got a silent BLOCKED result. ONBOARDING.md:487-492 documents this as a known gotcha; this PR closes the gotcha.

Discovered while diagnosing GHA run 28214488174 (first nightly after PR #680). The Brev auth step and cleanup step from #680 both passed cleanly; this is a separate, pre-existing bug surfaced by them running on schedule for the first time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: richa <ricsingh@nvidia.com>
CodeRabbit review: No actionable comments were generated in the recent review. Configuration used: `.coderabbit.yaml`; review profile: ASSERTIVE; plan: Enterprise; files selected for processing: 1.
Estimated code review effort: 1 (Trivial), ~3 minutes. Pre-merge checks: 5 passed.
Summary
Two independent fixes in `.github/workflows/skills-eval.yml`:

1. Schedule trigger now exports `MANUAL_FULL_SWEEP=1` (same as `workflow_dispatch` already does). Before this, nightly runs crashed with `FATAL: PR_NUMBER not set in environment`; see GHA run 28214488174.
2. `workflow_dispatch` picker now lists the 3 skills that actually exist on disk (`rag-blueprint`, `rag-eval`, `rag-perf`) instead of 10 phantom names left over from an old taxonomy. The LLM already enumerates `skills/*/eval/*.json` itself, so the stale picker was a UX trap: anyone picking an old name got a silent `BLOCKED`. Closes the gotcha documented at `ONBOARDING.md:487-492`.

Local verification (all green)
- … (`*` + 3 real skills).
- `event_name=schedule` now sets `MANUAL_FULL_SWEEP=1`, `MANUAL_SKILLS_FILTER=*`; `event_name=push` still leaves them unset.
- … `FATAL: PR_NUMBER` without the fix; with the fix the agent gets past the `_require` gate.

Why this is independent of #680
PR #680 fixed Brev auth + cleanup. Both passed cleanly on run 28214488174 (no GPU leak). This PR fixes the next failure exposed once cleanup stopped silently swallowing errors.
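The schedule fix from the Summary can be illustrated with a small Python model. This is only a sketch under stated assumptions: the real logic lives in the workflow step and in `skills_eval_agent.py`, neither of which is shown here, so the function names `sweep_env` and `select_mode` and the mode labels are hypothetical. Only the environment variable names and the `FATAL` message come from this PR.

```python
from typing import Dict, Mapping

# Events that should run the full sweep rather than the PR-mode path.
# Assumption: only these two events opt in; push stays untouched.
SWEEP_EVENTS = {"schedule", "workflow_dispatch"}


def sweep_env(event_name: str, input_skills: str = "") -> Dict[str, str]:
    """Hypothetical model of the env vars the workflow exports per event."""
    if event_name not in SWEEP_EVENTS:
        return {}  # e.g. push: both variables stay unset
    return {
        "MANUAL_FULL_SWEEP": "1",
        # Schedule events do not populate `inputs`, so fall back to "*".
        "MANUAL_SKILLS_FILTER": input_skills or "*",
    }


def select_mode(env: Mapping[str, str]) -> str:
    """Hypothetical stand-in for the agent's mode gate (not the real code)."""
    if env.get("MANUAL_FULL_SWEEP") == "1":
        return "full-sweep"
    if not env.get("PR_NUMBER"):
        # Mirrors the failure seen on the nightly run before this PR.
        raise SystemExit("FATAL: PR_NUMBER not set in environment")
    return "pr"
```

Under this model, a schedule event without the exported variables reaches the `PR_NUMBER` check and exits, which matches the nightly failure; with them it takes the sweep branch first.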
Test plan
- `workflow_dispatch` on this branch: confirm picker only shows `*`, `rag-blueprint`, `rag-eval`, `rag-perf`, and the agent reaches `[agent] starting` instead of `FATAL: PR_NUMBER`.
- … `FATAL: PR_NUMBER` in logs.
- `brev ls` for any leaked `rag-eval-gpu-*` instances (still expected: zero).

🤖 Generated with Claude Code
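One way to keep the picker from drifting again would be a small parity check between the dispatch options and the skills on disk. The sketch below is not part of this PR: `skills_on_disk` and `picker_drift` are hypothetical helper names, and it assumes a skill "exists" when at least one file matches `skills/*/eval/*.json`, the same glob the agent is described as enumerating.

```python
from pathlib import Path
from typing import Iterable, Set


def skills_on_disk(repo_root: Path) -> Set[str]:
    """Names of skills that have at least one skills/<name>/eval/*.json file."""
    return {p.parent.parent.name for p in repo_root.glob("skills/*/eval/*.json")}


def picker_drift(picker_options: Iterable[str], repo_root: Path) -> Set[str]:
    """Picker entries (ignoring the "*" wildcard) with no eval files on disk."""
    return set(picker_options) - {"*"} - skills_on_disk(repo_root)
```

An empty result means every picker entry maps to a real skill; any name returned would be a phantom option of the kind this PR removes.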