Skip to content

ci: split PR workflow trigger-only jobs#3710

Merged
joeyorlando merged 20 commits intomainfrom
codex/fix-pr-ci-skip-noise
Apr 10, 2026
Merged

ci: split PR workflow trigger-only jobs#3710
joeyorlando merged 20 commits intomainfrom
codex/fix-pr-ci-skip-noise

Conversation

@joeyorlando
Copy link
Copy Markdown
Contributor

@joeyorlando joeyorlando commented Apr 10, 2026

Summary

  • keep normal first-class PR checks together in On Pull Requests, and run that workflow on merge queue so required checks are produced for merge-group SHAs
  • add Docker Image Builds to PR pushes and merge queue so both platform and MCP server base images must build successfully
  • preserve release-please auto-commit behavior for generated docs/ and platform/ changes such as OpenAPI/codegen updates
  • move PR title linting into its own workflow so title edits do not create skipped main CI checks, while posting a no-op merge-queue status so it can remain required
  • inline full E2E into Platform E2E Tests, triggered by the run-e2e label, manual dispatch, and automatically on merge queue
  • split Docker Scout scanning into Docker Image Scanning, triggered automatically on merge queue or on demand with the run-docker-scan label
  • delete the old Platform Linting and Tests (Required) reusable workflow

Context

The example run for PR #3704 was triggered by a PR title edit. Because title edits and labels shared the main PR workflow, GitHub created skipped check runs for unrelated jobs. The previous required reusable workflow also created fast-pass placeholder checks for Helm chart linting, Docker image scanning, and E2E jobs on normal PRs.

After this change, ordinary PR runs include real checks in On Pull Requests: release freeze, license compliance, supply-chain policy, dependency review, platform lint/unit plus codegen/drizzle/migration validation, Docker image builds, and Helm chart linting/tests. The same workflow also listens to merge_group because merge queue requires required-check statuses on the temporary merge-group SHA. Dependency Review is PR-context-only, so that check passes explicitly on merge queue after having validated the PR before queueing.

PR title linting remains separate because it must run on title edits. It also listens to merge_group with a no-op pass step so PR Title Linter can remain a required check when merge queue is enabled. Full E2E is its own first-class workflow that runs by label, by manual dispatch, or automatically through merge queue. Docker Scout scanning is also first-class but not paid on every PR push: it runs on merge queue or when a maintainer applies run-docker-scan; fork PR label events are filtered at the job level so secrets-backed scan work is not scheduled.

Release-please PRs keep the old generated-file behavior: if pnpm codegen && pnpm lint:fix changes checked-in docs/ or platform/ files, CI generates a GitHub App token, waits for drizzle-kit check, commits the generated updates back to the release-please branch, and exits successfully so the follow-up push re-runs CI.

Follow-up hardening

  • restored the pnpm db:generate timeout/prompt detection so interactive Drizzle rename prompts fail fast instead of hanging CI
  • scoped generated-file diff checks to docs/ and platform/
  • made E2E test jobs depend on a successful image build instead of trying to run after build failure
  • retained full Playwright report artifacts for 30 days if any E2E shard, quickstart, or Vault K8s job fails
  • kept the docs/openapi.json version bump in this branch intentionally

Validation

  • ruby YAML parse for changed workflow files
  • pnpm dlx github-actionlint -ignore custom Blacksmith runner labels for changed workflow files
  • git diff --check

Docs

  • audited docs/pages; no docs changes needed for CI workflow internals

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 10, 2026

Claude finished @joeyorlando's task in 3m 19s —— View job


PR Review

Overall this is a well-structured refactor. The approach of splitting trigger-only jobs into their own workflows is correct, and the merge_group handling is consistent across files. A few findings below, mostly minor.

Findings

1. Hardcoded auto-commit message may misrepresent the diff.github/workflows/on-pull-requests.yml:260

The commit message is hardcoded "chore: update docs/openapi.json version for release", but the add covers the whole of docs/ and platform/. If a release-please PR also includes drizzle migration codegen or other generated files under platform/, the message misleads. Consider either a more generic message (e.g. "chore: apply codegen updates") or listing affected paths in the message body.

2. Path-scoped codegen diff is a behavior tighteningon-pull-requests.yml:251,266

The old workflow used git diff --exit-code (all paths). The new one limits to docs/ and platform/. If codegen ever writes to shared/, the .github/ scripts dir, or a new workspace, those changes will be silently accepted. The PR description calls this out as intentional, but it's worth either (a) widening the check back to all tracked files or (b) adding an explicit git diff --exit-code -- shared/ clause so the scope is deliberate rather than incidental.

3. Loss of explicit drizzle-kit failure messageon-pull-requests.yml:253-254,272-273

The old workflow printed ❌ drizzle-kit check failed on wait failure. In the new script set -e is active, so a failed wait $DRIZZLE_PID will exit silently with no contextual error in the logs. Consider wrapping the waits:

if ! wait "$DRIZZLE_PID"; then
  echo "::error::drizzle-kit check failed. Run 'pnpm drizzle-kit check' locally."
  exit 1
fi

Fix this →

4. Missing workflow_dispatch on the new Docker scan workflow.github/workflows/docker-image-scanning.yml:6-11

platform-e2e-tests.yml exposes workflow_dispatch so maintainers can re-run E2E manually from the Actions UI. The new docker-image-scanning.yml only supports pull_request: labeled and merge_group, so re-scanning after a Dockerfile fix requires remove-and-re-add on the label. Adding workflow_dispatch: is cheap and keeps the two workflows consistent.
Fix this →

5. PR title linter triggers on edited but not synchronizepr-title-linter.yml:4-8

Correct choice (title doesn't change on code pushes), but worth verifying that required-check resolution doesn't get stuck when a PR is opened with a bad title, then fixed via edited: GitHub does re-post the status on the latest edited event, so this is fine — just flagging for awareness.

6. Docker Scout rebuilds images independently of docker-image-buildsdocker-image-scanning.yml:47-78 vs on-pull-requests.yml:305-363

On merge queue, both Docker Image Builds and Docker Image Scanning build the same two images from scratch on separate runners. That's ~double the merge queue build time. Reusing a registry-pushed image (as the E2E workflow does for the platform image) would be faster but requires secrets wiring. Acceptable trade-off given Scout needs a local daemon image, but worth revisiting if merge queue latency becomes an issue.

7. docker-image-scanning job-level if duplicates workflow-level gatingdocker-image-scanning.yml:21

The workflow-level on: is already pull_request: labeled + merge_group, and the job if re-checks label.name == 'run-docker-scan' plus the fork guard. The fork guard is load-bearing (forks can't access Docker Hub secrets), but the label name check could arguably move to the workflow level via a simple if on the single job. Not a bug — current shape is clear.

8. Release-please auto-commit path still validates nothing after exit 0on-pull-requests.yml:263

By design, migrations/drizzle validation are skipped on auto-commit runs and re-validated on the follow-up push. Worth a one-line comment above exit 0 clarifying "Workflow re-triggers on the pushed commit and re-runs all validation" so future readers don't assume this is a gap.

9. Concurrency on merge_group correctly scopedon-pull-requests.yml:17, docker-image-scanning.yml:15, platform-e2e-tests.yml:16, pr-title-linter.yml:16

Group key falls back to merge_group.head_sha, so different merge queue entries don't cancel each other. Good.

10. Dependency Review name sharing across PR and merge_group jobson-pull-requests.yml:143,166

Both jobs use name: Dependency Review, which is the right trick for letting one required check satisfy both contexts. Intentional and correct.

11. docs/openapi.json version bump — benign codegen artifact, consistent with the PR's stated intent.

Blocking issues

None — all findings above are comments/suggestions.

Summary

Approving intent. The only items I'd recommend addressing before merge are (1) the misleading hardcoded auto-commit message and (3) the silent wait failures. (2) is a deliberate tightening per the PR description and can stay if documented, and (4) is a small DX improvement.


Review Tasks

  • Read repository review guidelines
  • Fetch PR diff against origin/main
  • Review .github/workflows/on-pull-requests.yml
  • Review .github/workflows/docker-image-scanning.yml
  • Review .github/workflows/platform-e2e-tests.yml
  • Review .github/workflows/pr-title-linter.yml
  • Review docs/openapi.json change
  • Verify no lingering references to removed platform-linting-and-tests.yml
  • Post review findings
    • Branch: codex/fix-pr-ci-skip-noise

@joeyorlando joeyorlando enabled auto-merge April 10, 2026 18:28
@joeyorlando joeyorlando disabled auto-merge April 10, 2026 18:32
@joeyorlando joeyorlando added this pull request to the merge queue Apr 10, 2026
Merged via the queue into main with commit 774d2bf Apr 10, 2026
21 checks passed
@joeyorlando joeyorlando deleted the codex/fix-pr-ci-skip-noise branch April 10, 2026 19:46
joeyorlando added a commit that referenced this pull request Apr 11, 2026
## Summary
- restore the GAR auth secret names used by `platform-e2e-tests.yml` on
trusted runs
- fix merge queue E2E image builds failing before any tests start

## Context
PR #3710 inlined the old reusable platform workflow into first-class
workflows. During that refactor, the E2E image-build workflow switched
from the repo's working secret names:

- `DEVELOPMENT_OAUTH_PROXY_RELEASER_GCP_SERVICE_ACCOUNT_NAME`
-
`DEVELOPMENT_OAUTH_PROXY_RELEASER_GCP_WORKLOAD_IDENTITY_PROVIDER_IDENTIFIER`

to generic `GCP_SERVICE_ACCOUNT_NAME` /
`GCP_WORKLOAD_IDENTITY_PROVIDER_IDENTIFIER`.

Those generic secrets are not populated for this workflow in merge
queue, so `Authenticate to Google Artifact Registry` fails with empty
inputs and the merge queue never reaches the E2E jobs.

This change restores the original secret names in both GAR auth steps
inside `.github/workflows/platform-e2e-tests.yml`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant