Skip to content

ci: Auto-correct mismatched PR numbers in news fragments#11215

Open
rapsealk wants to merge 4 commits intomainfrom
fix/validate-news-fragment-pr-number
Open

ci: Auto-correct mismatched PR numbers in news fragments#11215
rapsealk wants to merge 4 commits intomainfrom
fix/validate-news-fragment-pr-number

Conversation

@rapsealk
Copy link
Copy Markdown
Member

@rapsealk rapsealk commented Apr 22, 2026

Summary

  • Extend scripts/assign-pr-number.py so the assign-pr-number workflow also rewrites news fragments whose filename prefix does not match the current PR number (e.g., 9999.fix.md written with an issue number instead of a PR number), not only unnumbered placeholders like .fix.md.
  • Scope the rewrite to fragments added or modified by this PR (computed via git diff --diff-filter=AM origin/$GITHUB_BASE_REF...HEAD), so historical fragments accumulated on the base branch from past merged PRs are never touched.
  • Bump the workflow's fetch-depth from 2 to 0 so the base ref is reachable for the diff.
  • Document the filename convention in changes/README.md: the leading number must be the pull request number, and unnumbered drafts are acceptable because CI will fill it in on push.

Resolves #11214. Builds on the numbering automation originally introduced in #989.

Behavior

For files in changes/, running python scripts/assign-pr-number.py <PR> under CI (with GITHUB_BASE_REF set):

File Before After
.fix.md (added by this PR, unnumbered) rename → <PR>.fix.md rename → <PR>.fix.md (unchanged)
<PR>.fix.md (added by this PR, correct) no-op no-op (unchanged)
9999.fix.md (added by this PR, wrong number) no-op (silently merged wrong) rename → <PR>.fix.md (new)
11170.fix.md (from a past merged PR, on base branch) no-op no-op (explicitly preserved by the diff-based scope)

Local runs without GITHUB_BASE_REF fall back to the pre-existing behavior: rename unnumbered placeholders, leave all numbered fragments alone.

Test plan

  • Simulated historical fragment (11170.fix.md) present on base + this-PR's .doc.md, 12345.feature.md, 9999.fix.md — only the unnumbered and the mismatched fragments are renamed; the historical one is left alone.
  • Simulated the local-run fallback (no GITHUB_BASE_REF) — mismatched fragment is preserved; only unnumbered is renamed.
  • Simulated the no-op path (matching fragment only) — prints already exists, emits has_renamed_pairs=false.
  • pants --changed-since=HEAD --changed-dependents=transitive fmt lint check passes.
  • CI's assign-pr-number job ran on this PR's first push and correctly renamed changes/.misc.mdchanges/11215.misc.md.

Notes for reviewers

One open question: backport PRs. A backport to a release branch typically lands with the fragment already numbered after the original main PR (e.g., copying 11170.fix.md into the backport branch). Because backport PRs do touch changes/11170.fix.md in their diff, this change would rewrite that prefix to the backport PR's number. If the project convention is to preserve the original PR number in release-branch fragments, the rewrite should be gated further (e.g., only when GITHUB_BASE_REF == "main"). Current behavior on main is unaffected.

Copilot AI review requested due to automatic review settings April 22, 2026 04:07
@github-actions github-actions Bot added the size:S 10~30 LoC label Apr 22, 2026
@github-actions github-actions Bot added size:M 30~100 LoC and removed size:S 10~30 LoC labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the existing CI/news-fragment automation so that scripts/assign-pr-number.py not only fills in missing PR numbers for placeholders (e.g., .fix.md), but also auto-corrects mismatched numbered fragments (e.g., 9999.fix.md) by renaming them to the current PR number. It also documents the expected changes/ filename convention.

Changes:

  • Update scripts/assign-pr-number.py to detect numbered fragments whose prefix doesn’t match the current PR number and rename them to <PR>.<type>.md.
  • Document the changes/ filename rule (PR number prefix) and CI behavior in changes/README.md.
  • Add a .misc.md news fragment describing the CI behavior change.

Reviewed changes

Copilot reviewed 15 out of 19 changed files in this pull request and generated 2 comments.

File Description
scripts/assign-pr-number.py Adds “mismatched PR number” detection and renaming to enforce fragment naming consistency.
changes/README.md Documents the fragment filename convention and CI auto-renaming behavior.
changes/.misc.md News fragment entry describing the new auto-correction behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/assign-pr-number.py Outdated
Comment on lines +56 to +60
for original_filename, news_type in mismatched_fragments:
numbered_filename = f"{pr_number}.{news_type}.md"
file_path = base_path / original_filename
file_path.rename(base_path / numbered_filename)
print(
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the unnumbered and mismatched rename paths call Path.rename() without checking whether the destination filename already exists (e.g., changes/<PR>.fix.md already present). On POSIX this can overwrite the existing fragment (data loss); on some platforms it can raise and fail the workflow. Please guard against collisions (including two fragments of the same type in one PR) by checking for an existing destination and exiting non-zero with a clear message (or choosing a deterministic non-colliding filename).

Copilot uses AI. Check for mistakes.
Comment thread changes/README.md Outdated

### Filename convention

News fragments live in `changes/` and must be named `<PR-number>.<type>.md`, where `<type>` is one of the towncrier types declared under `[tool.towncrier]` in `pyproject.toml` (e.g., `fix`, `feature`, `breaking`, `misc`).
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README points to types being declared under [tool.towncrier] in pyproject.toml, but the actual convention used by the script is the [[tool.towncrier.type]] entries (and it uses each entry’s directory field). Consider updating this wording to match the real config structure so contributors know exactly where the allowed values come from.

Suggested change
News fragments live in `changes/` and must be named `<PR-number>.<type>.md`, where `<type>` is one of the towncrier types declared under `[tool.towncrier]` in `pyproject.toml` (e.g., `fix`, `feature`, `breaking`, `misc`).
News fragments live in `changes/` and must be named `<PR-number>.<type>.md`, where `<type>` matches one of the `directory` values defined by the `[[tool.towncrier.type]]` entries in `pyproject.toml` (e.g., `fix`, `feature`, `breaking`, `misc`).

Copilot uses AI. Check for mistakes.
When an author (or an AI coding agent) writes `changes/<N>.<type>.md`
with an issue number or any other wrong identifier in place of the
pull request number, the `assign-pr-number` workflow previously
treated it as already-numbered and left it untouched, so the wrong
number leaked into the release notes.

Extend `scripts/assign-pr-number.py` to rename a numbered news
fragment whose prefix does not match the current PR number, reusing
the same rename path as unnumbered fragments. Scope the rewrite to
fragments added or modified by this PR (computed from
`git diff --diff-filter=AM origin/$GITHUB_BASE_REF...HEAD`) so
historical fragments accumulated on the base branch from past merged
PRs are never touched. Fall back to the original behavior when
`GITHUB_BASE_REF` is absent (local runs): rename only unnumbered
fragments, leave every numbered fragment alone.

Bump `fetch-depth` in the workflow from 2 to 0 so the base ref is
reachable for the diff. Document the filename convention explicitly
in `changes/README.md`.

Related: #11214
@rapsealk rapsealk force-pushed the fix/validate-news-fragment-pr-number branch from 5ef9173 to 429a347 Compare April 22, 2026 04:20
@rapsealk rapsealk changed the title ci: auto-correct mismatched PR numbers in news fragments ci: Auto-correct mismatched PR numbers in news fragments Apr 22, 2026
.misc.md -> 11215.misc.md

Co-authored-by: octodog <mu001@lablup.com>
@rapsealk
Copy link
Copy Markdown
Member Author

Review

The scoping via git diff ... --diff-filter=AM successfully protects historical fragments (verified: git diff --name-status origin/main...HEAD on this branch shows zero deletions, only 4 A/M entries) and the new self-healing path closes a real authoring footgun. However, the mismatched-rename loop in scripts/assign-pr-number.py silently overwrites on collision, which is a production blocker I cannot wave through.

Strengths

  • Scoping is the right design — origin/$GITHUB_BASE_REF...HEAD (three-dot, against the merge-base) is the correct git incantation, and restricting the new rename-mismatch path to files inside pr_touched_fragments keeps the blast radius small.
  • Local-run fallback (base_ref = os.getenv(...) or ""pr_touched_fragments = None) preserves the prior behavior exactly: unnumbered files get renamed, mismatched loop is skipped. Good conservatism.
  • The two-loop split (existing unnumbered loop untouched, new mismatched loop gated on pr_touched_fragments is not None) means the original unnumbered path has no regression risk.
  • changes/README.md update is accurate and matches the new behavior.
  • CI loop termination is safe: after the auto-commit renames <x>.type.md<PR>.type.md, the re-triggered workflow sees it as correctly numbered and no-ops. No infinite commit loop.

Findings

Blocker — silent data loss on rename collision (scripts/assign-pr-number.py:91-100).
Path.rename() on POSIX silently overwrites an existing target. The new mismatched loop does not check for target existence:

for original_filename, news_type in mismatched_fragments:
    numbered_filename = f"{pr_number}.{news_type}.md"
    file_path = base_path / original_filename
    file_path.rename(base_path / numbered_filename)

Three concrete collision cases, all producing silent content loss:

  1. PR contains both 11215.fix.md (correct, lands in existing_fragments) and 9999.fix.md (mismatched). The mismatched loop renames 9999.fix.md11215.fix.md, overwriting the correct fragment.
  2. PR contains two mismatched fragments of the same type, e.g. 9999.fix.md and 8888.fix.md. Both want to become 11215.fix.md; the second rename silently clobbers the first.
  3. PR contains .fix.md (unnumbered) and 9999.fix.md (mismatched). The unnumbered loop creates 11215.fix.md; then the mismatched loop overwrites it.

I just verified the overwrite behavior locally (Path.rename on an existing target on Darwin/Linux drops the destination without error). Please either (a) use os.rename with an explicit exists() pre-check and sys.exit(1) with a clear message, or (b) fold both loops into one and coalesce by news_type before renaming, keeping the first and warning about duplicates. Given this script silently deleted 16 historical fragments in an earlier iteration of the PR, an explicit guard here is warranted.

Should-fix — follow-up PRs that modify a historical fragment get renumbered (scripts/assign-pr-number.py:54-65).
--diff-filter=AM includes M (modified). If a later PR legitimately edits changes/11170.fix.md (e.g., to fix a typo in a merged fragment before release), pr_touched_fragments will contain it, the prefix 11170 won't match that PR's number, and the script will rename it to <new-PR>.fix.md — destroying the original PR-number link. Two reasonable mitigations: (a) only rewrite when the diff status is A (added), not M; or (b) only rewrite when the file did not exist at the base commit (git cat-file -e origin/$BASE:changes/<file> → fail). Option (a) is simpler and matches the stated intent ("authors (or AI agents) writing a wrong number on a new fragment").

Should-fix — unguarded subprocess.run(..., check=True) in get_pr_touched_fragments (scripts/assign-pr-number.py:26-41).
If origin/$GITHUB_BASE_REF is not fetched (shallow clone regression, actions/checkout config change, someone running the script locally after git fetch --depth=1), check=True will raise CalledProcessError with a stderr dump and no actionable message for the user. Catch CalledProcessError, log a clear "base ref not fetched — did you set fetch-depth: 0?", and either fall back to None (local-style behavior) or sys.exit(1) deliberately.

Should-fix — fetch-depth: 0 on a ~1 GB-history repo is a meaningful cost (.github/workflows/assign-pr-number.yml:18).
This workflow runs on every PR push and only needs the merge base. A more targeted git fetch --no-tags --prune --depth=50 origin $GITHUB_BASE_REF (or actions/checkout with fetch-depth: 2 plus an explicit deepen step) would usually suffice, and --deepen can be used if the merge base is further back. Not a blocker, but worth a line of justification in the PR.

Nit — file[0 : file.find(".")] (scripts/assign-pr-number.py:71).
Pre-existing, but since this PR reuses it: prefer file.split(".", 1)[0] or, since you already have the regex match, add a named group (?P<number>\d+) to rx_numbered_fragment and use match.group("number"). More readable and doesn't silently return "" on a pathological input.

Nit — --diff-filter=AM misses R (scripts/assign-pr-number.py:33).
If a reviewer locally runs git mv 9999.fix.md 8888.fix.md (still wrong, but renamed), git reports it as R, not A/M, so it will not be in pr_touched_fragments and the mismatch healing will skip it. Adding R to the filter (--diff-filter=AMR) covers this. Edge case, but cheap.

Nit — type hint shadowing (scripts/assign-pr-number.py:54-56).
pr_touched_fragments: set[str] | None = (get_pr_touched_fragments(...) if base_ref else None) is fine, but you can simplify with pr_touched_fragments = get_pr_touched_fragments(...) if base_ref else None — the annotation isn't adding information here.

Nit — no unit tests (scripts/).
No tests exist under tests/ for scripts/assign-pr-number.py. For CI glue this is defensible, but the rename-collision scenarios above are exactly the kind of logic that pays off a small pytest file with a tmp changes/ directory. Consider adding coverage for: collision between mismatched + existing, collision between two mismatched same-type, unnumbered-wins-over-mismatched ordering, and local-fallback (no GITHUB_BASE_REF) leaving numbered fragments alone.

Open questions for the author

  • Backport branches (flagged in the PR body): current behavior will rewrite 11170.fix.md to the backport PR's number when the backport lands on a release branch. What's the team's convention — do release-branch fragments keep the original main PR number, or take the backport PR number? If the former, gate the mismatched-loop behind if base_ref == "main": (or a configurable list of trunk branches). Worth resolving before merge since the 25.* release branches will hit this on the first backport after this PR merges.
  • Why fetch-depth: 0 rather than a deepen-as-needed step? If there's a pragmatic reason (simplicity, avoids merge-base-not-found flakes), a one-line comment in the workflow would help future maintainers.
  • Is exiting non-zero on a mismatch (the "fail loudly" alternative from Validate news-fragment PR number matches current PR in assign-pr-number workflow #11214) still off the table? The auto-heal is friendly, but a hard failure has the property that humans notice the mistake rather than the script papering over it — and it eliminates the overwrite-collision risk entirely.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +29 to +36
"git",
"diff",
"--name-only",
"--diff-filter=AM",
f"origin/{base_ref}...HEAD",
"--",
str(base_path),
],
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git diff --diff-filter=AM excludes renamed fragments (status R). If a contributor renames a fragment in the PR (or a fragment is detected as renamed by git), it won't be included in pr_touched_fragments and the script will skip validating/renaming it in CI. Consider including renames (e.g., --diff-filter=AMR) or using --name-status to handle R entries.

Copilot uses AI. Check for mistakes.
Comment thread changes/README.md Outdated

News fragments live in `changes/` and must be named `<PR-number>.<type>.md`, where `<type>` is one of the towncrier types declared under `[tool.towncrier]` in `pyproject.toml` (e.g., `fix`, `feature`, `breaking`, `misc`).

The number in the filename must be the **pull request number** — not an issue number, Linear/Jira ticket, or any other identifier. When you don't yet know the PR number, leave it empty and commit a placeholder such as `.fix.md` or `fix.md`; the `assign-pr-number` CI workflow will fill in the PR number on push. The same workflow also rewrites any numbered fragment whose prefix does not match the current PR number, so mistakenly using an issue number is recoverable.
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description of the CI behavior is broader than what the script now does: mismatched numbered fragments are only rewritten when GITHUB_BASE_REF is set and only for fragments added/modified by the current PR (per the diff-based scoping). Consider updating this paragraph to reflect those constraints so readers don't assume all mismatched fragments are rewritten in all contexts (including local runs).

Suggested change
The number in the filename must be the **pull request number** — not an issue number, Linear/Jira ticket, or any other identifier. When you don't yet know the PR number, leave it empty and commit a placeholder such as `.fix.md` or `fix.md`; the `assign-pr-number` CI workflow will fill in the PR number on push. The same workflow also rewrites any numbered fragment whose prefix does not match the current PR number, so mistakenly using an issue number is recoverable.
The number in the filename must be the **pull request number** — not an issue number, Linear/Jira ticket, or any other identifier. When you don't yet know the PR number, leave it empty and commit a placeholder such as `.fix.md` or `fix.md`; the `assign-pr-number` CI workflow can fill in the PR number for fragments in PR-context runs. In the same context, it can also rewrite a numbered fragment whose prefix does not match the current PR number, but only for fragments added or modified by the current PR, so mistakenly using an issue number there is recoverable.

Copilot uses AI. Check for mistakes.
rapsealk and others added 2 commits April 22, 2026 13:34
… renumbering

Two follow-ups to the news-fragment auto-correction workflow based on
review:

1. ``Path.rename`` silently overwrites an existing destination on POSIX,
   so a PR carrying two fragments of the same type (e.g., ``.fix.md``
   plus ``9999.fix.md``, or two mismatched ``9999.fix.md`` and
   ``8888.fix.md``) would lose one fragment's contents without any
   error. Introduce ``rename_fragment_or_exit`` which aborts with a
   clear message when the target already exists, and route both the
   unnumbered-fragment loop and the mismatched-fragment loop through
   it.

2. ``git diff --diff-filter=AM`` accepts ``M``-status files, which
   means a PR that merely edits a historical fragment in place (for
   example a typo fix on a merged fragment prior to release) would
   have its original PR-number prefix rewritten to the editing PR's
   number. Narrow the filter to ``A`` so only newly added fragments
   become rename candidates; rename the helper to
   ``get_pr_added_fragments`` to match.
…ront

- Skip mismatch rewrites on release-branch backports so fragments keep
  linking to the original main PR in release notes (``GITHUB_BASE_REF == "main"``).
- Collect all planned renames before touching disk; reject collisions
  (with existing files or within the plan) before any partial rename.
- Align ``changes/README.md`` with the actual towncrier config
  (``[[tool.towncrier.type]]`` + ``directory``) and scope the auto-rewrite
  description to main PRs / PR-added fragments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size:L 100~500 LoC and removed size:M 30~100 LoC labels Apr 22, 2026
@rapsealk
Copy link
Copy Markdown
Member Author

LGTM ✅

Post-polish (beebc4cf2) the PR addresses all four open review concerns and the backport question raised in the PR body:

  • Backport gaterewrite_mismatched = base_ref == "main" (scripts/assign-pr-number.py:66) preserves the original-PR numbering on release-branch backports so release notes keep linking back to the original change.
  • Atomic collision detection — planned renames are collected up-front and checked against both on-disk files and intra-plan duplicates before any Path.rename() fires; no partial-rename state on failure.
  • Unified classification loop — numbered / unnumbered / invalid resolution happens in one pass, eliminating the prior dual-loop shape.
  • README accuracy — corrected the towncrier config reference to [[tool.towncrier.type]] + directory, and scoped the auto-rewrite description to PRs targeting main and fragments added by the PR.

Verified locally against 10 scenarios (main-PR mixed renames, backport preservation, local fallback, same-type collisions with disk and within plan, mismatch vs. correct collision, no-op paths, invalid filenames). CI is green, including the assign-pr-number workflow dogfooding itself on this very push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate news-fragment PR number matches current PR in assign-pr-number workflow

2 participants