fix(globalStandards): Use median instead of mean for global standards normalization by tonywu1999 · Pull Request #198 · Vitek-Lab/MSstats

tonywu1999 · 2026-04-22T23:38:41Z

PR Type

Bug fix

Description

Switch standard normalization from means to medians
Derive run offsets from median standards
Keep fraction equalization median-based
Update test wording for median logic

Diagram Walkthrough

flowchart LR
  A["Global standards data"] -- "compute per-run per-standard medians" --> B["Standard median abundances"]
  B -- "collapse to run medians" --> C["median_by_run"]
  C -- "summarize by fraction" --> D["median_by_fraction"]
  D -- "adjust input `ABUNDANCE`" --> E["Median-normalized abundances"]

File Walkthrough

Relevant files

Bug fix

utils_normalize.R: `Replace mean-based normalization with medians` (R/utils_normalize.R, +20/-20)
- Replace mean-based standard aggregation with `median()`
- Rename helper field from `mean_by_run` to `median_by_run`
- Normalize `ABUNDANCE` using run and fraction medians
- Drop median helper columns after merging

Tests

test_utils_normalize.R: `Align test wording with median behavior` (inst/tinytest/test_utils_normalize.R, +1/-1)
- Update test comment to reference `median_by_run`
- Align unlabeled-standard explanation with median normalization

Motivation and Context

This PR switches global standards normalization from mean-based to median-based centering. The median is robust to outliers, so a few extreme standard intensities in real-world proteomics data no longer skew the run offsets the way they can skew a mean. The change also aligns global standards normalization with the median-based normalization method used elsewhere in MSstats.
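As a rough illustration of the robustness argument, the sketch below compares how much one outlier moves a mean-based offset versus a median-based offset. The intensities are made-up log2 values, not data from MSstats.

```r
# Hypothetical log2 intensities for the features of one standard in one run.
# The last value plays the role of an outlier.
standard_abundance <- c(20.1, 20.3, 19.9, 20.2, 27.5)
without_outlier <- standard_abundance[-5]

# How far does each summary move when the outlier is included?
mean_shift <- mean(standard_abundance, na.rm = TRUE) -
  mean(without_outlier, na.rm = TRUE)
median_shift <- median(standard_abundance, na.rm = TRUE) -
  median(without_outlier, na.rm = TRUE)

cat("shift in mean:  ", mean_shift, "\n")
cat("shift in median:", median_shift, "\n")
```

With these numbers the mean moves by well over one log2 unit, while the median moves by a few hundredths.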

Changes

Modified .normalizeGlobalStandards function (R/utils_normalize.R, lines 227-249):
- Changed per-standard aggregation from mean(ABUNDANCE, na.rm = TRUE) to median(ABUNDANCE, na.rm = TRUE) when computing standard abundances by RUN
- Updated per-RUN normalization factor from mean_by_run to median_by_run derived via median(ABUNDANCE, na.rm = TRUE) across standards
- Maintained per-fraction normalization factor calculation as median_by_fraction := median(median_by_run, na.rm = TRUE) by FRACTION
- Changed abundance adjustment formula from ABUNDANCE - mean_by_run + median_by_fraction to ABUNDANCE - median_by_run + median_by_fraction
- Updated cleanup step to remove median_by_run and median_by_fraction columns (previously removed mean_by_run and median_by_fraction)
Updated test comment (inst/tinytest/test_utils_normalize.R, line 119):
- Modified test comment in test_unlabeled_standard_detected to reflect that when standards are uniform across runs and fractions, median_by_fraction == median_by_run, rather than the previous comparison involving mean_by_run
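The steps listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code in R/utils_normalize.R: the function name `normalize_by_standards`, the `PEPTIDE` column, and the toy data are hypothetical, and base R `aggregate()`/`merge()` stand in for whatever the package actually uses.

```r
# Hypothetical helper illustrating the median-based steps; not the MSstats code.
normalize_by_standards <- function(input, standards) {
  std <- input[input$PEPTIDE %in% standards, ]

  # Step 1: median abundance per (RUN, standard).
  per_standard <- aggregate(ABUNDANCE ~ RUN + FRACTION + PEPTIDE,
                            data = std, FUN = median, na.rm = TRUE)

  # Step 2: collapse the per-standard medians to one offset per run.
  by_run <- aggregate(ABUNDANCE ~ RUN + FRACTION,
                      data = per_standard, FUN = median, na.rm = TRUE)
  names(by_run)[names(by_run) == "ABUNDANCE"] <- "median_by_run"

  # Step 3: per-fraction reference level = median of the run offsets.
  by_run$median_by_fraction <- ave(
    by_run$median_by_run, by_run$FRACTION,
    FUN = function(x) median(x, na.rm = TRUE)
  )

  # Step 4: left join (runs without standards get NA offsets), then shift.
  # Note that merge() may reorder rows.
  out <- merge(input, by_run, by = c("RUN", "FRACTION"), all.x = TRUE)
  out$ABUNDANCE <- out$ABUNDANCE - out$median_by_run + out$median_by_fraction

  # Step 5: drop the helper columns.
  out$median_by_run <- NULL
  out$median_by_fraction <- NULL
  out
}

# Toy input: two runs in one fraction, two standards and one analyte.
toy <- data.frame(
  RUN = c(1, 1, 1, 2, 2, 2),
  FRACTION = 1,
  PEPTIDE = rep(c("STD_A", "STD_B", "ANALYTE"), times = 2),
  ABUNDANCE = c(10, 12, 15, 12, 14, 18)
)
normalized <- normalize_by_standards(toy, standards = c("STD_A", "STD_B"))
print(normalized)
```

In this toy input the run offsets are 11 and 13 and the fraction reference is 12, so run 1 is shifted up by 1 and run 2 down by 1; after the shift each standard has the same abundance in both runs.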

Unit Tests

The existing test test_unlabeled_standard_detected was preserved with its executable logic unchanged. The test continues to verify that uniform standard intensities produce no shift in ABUNDANCE values, which remains valid under the median-based approach since the mathematical relationship holds: when all standards have equal abundances, both median_by_run and median_by_fraction will be identical, resulting in zero adjustment.
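The zero-adjustment argument can be checked on a small example. The sketch below uses a made-up single-standard, single-fraction input (not the fixture from the test file) and computes the two medians directly.

```r
# Hypothetical input: one standard with the same abundance in every run.
standard <- data.frame(
  RUN = c(1, 2, 3),
  FRACTION = c(1, 1, 1),
  ABUNDANCE = c(20, 20, 20)
)

# One offset per run, then one reference level for the (single) fraction.
median_by_run <- tapply(standard$ABUNDANCE, standard$RUN, median, na.rm = TRUE)
median_by_fraction <- median(median_by_run, na.rm = TRUE)

# The amount added to ABUNDANCE in each run.
shift <- median_by_fraction - median_by_run
print(shift)
```

Because every run offset equals the fraction reference, the shift is zero for all runs, which is what the test relies on.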


coderabbitai · 2026-04-22T23:38:51Z

📝 Walkthrough

Walkthrough

The global-standards normalization method in .normalizeGlobalStandards switches from mean-based centering to median-based centering. The function now computes medians per (RUN, standard) and uses median values for run-level and fraction-level adjustments instead of means, with corresponding updates to the abundance adjustment formula.

Changes

Cohort / File(s)	Summary
Global Standards Normalization `R/utils_normalize.R`	Switched statistical centering method from mean to median: `median(ABUNDANCE, na.rm=TRUE)` replaces mean calculations, `median_by_run` and `median_by_fraction` replace corresponding mean columns, and the adjustment formula changes from `ABUNDANCE - mean_by_run + median_by_fraction` to `ABUNDANCE - median_by_run + median_by_fraction`.
Test Documentation `inst/tinytest/test_utils_normalize.R`	Updated test comment in `test_unlabeled_standard_detected` to reflect the corrected expected relationship: changed from `median_by_fraction` vs `mean_by_run` to `median_by_fraction` vs `median_by_run`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

feat: Introduce normalization with unlabeled peptides #192: Also modifies .normalizeGlobalStandards to adjust global-standards normalization logic with changes to run-level centering and ABUNDANCE adjustment.

Suggested labels

Review effort 2/5

Poem

🐰 From means to medians, we hop with care,
Statistics shift with robust flair,
Centering adjusted, outliers beware,
Median magic fills the air! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The PR description lacks key details required by the template, including motivation/context, detailed changes list, testing description, and incomplete checklist.	Provide comprehensive motivation for switching to median-based normalization, expand the changes bullet list with specific technical details, describe unit tests added/modified, and explicitly address or check all checklist items.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title accurately describes the primary change: switching from mean to median for global standards normalization, matching the code modifications.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-04-22T23:39:51Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ No major issues detected

github-actions · 2026-04-22T23:40:14Z

PR Code Suggestions ✨

No code suggestions found for the PR.

coderabbitai

🧹 Nitpick comments (1)

R/utils_normalize.R (1)
227-246: LGTM — clean mean→median substitution.

The aggregation change is internally consistent: per-(RUN, standard) medians → per-run median across standards → per-fraction median across runs, with the adjustment formula correctly using median_by_run on both sides of the merge. Using median is more robust to standards with outlier abundances, which is a sensible rationale for the switch.

A couple of small observations (no action required, all pre-existing behavior preserved by this PR):

- The dcast → melt round-trip on lines 231-235 only materially matters if it introduces NA rows for missing (RUN, standard) combinations; since median(..., na.rm = TRUE) is used downstream, it ends up being effectively a no-op for the aggregation. Worth revisiting at some point, but not in scope here.

- A run where no standard is measured at all will not appear in medians_by_standard, so after the all.x = TRUE merge its median_by_run / median_by_fraction are NA, and ABUNDANCE for that run becomes NA. This matches the previous mean-based behavior.

Given the PR description flags uncertainty about per-feature vs. per-peptide aggregation and confusing QC plots, it would be worth adding a test that exercises a multi-standard scenario (more than one name in standards) so the per-(RUN, standard) median step is actually covered — the existing tests only use a single standard, which makes the inner median() degenerate to identity.
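One possible shape for such a fixture is sketched below. It is an assumption-laden illustration: the `PEPTIDE` column, the standard names, and the values are hypothetical, and the expected values are derived by hand in base R rather than by calling `.normalizeGlobalStandards`, whose signature is not shown in this PR. STD_A has three features in run 1 so that its median (10) differs from its mean (12), and run 3 has no standards at all.

```r
# Hypothetical fixture: three standards, one analyte, run 3 without standards.
fixture <- data.frame(
  RUN = c(rep(1, 6), rep(2, 4), 3),
  FRACTION = 1,
  PEPTIDE = c("STD_A", "STD_A", "STD_A", "STD_B", "STD_C", "ANALYTE",
              "STD_A", "STD_B", "STD_C", "ANALYTE",
              "ANALYTE"),
  ABUNDANCE = c(10, 10, 16, 12, 20, 15,
                11, 14, 30, 15,
                15)
)
standards <- c("STD_A", "STD_B", "STD_C")

# Expected values, following the median-based steps described in the PR.
std <- fixture[fixture$PEPTIDE %in% standards, ]
per_standard <- aggregate(ABUNDANCE ~ RUN + PEPTIDE, data = std, FUN = median)
by_run <- aggregate(ABUNDANCE ~ RUN, data = per_standard, FUN = median)
median_by_fraction <- median(by_run$ABUNDANCE)

# Runs missing from by_run (here run 3) get an NA offset via match().
offset <- by_run$ABUNDANCE[match(fixture$RUN, by_run$RUN)]
expected <- fixture$ABUNDANCE - offset + median_by_fraction

# A real test would compare the ABUNDANCE column returned by the function
# under test against `expected`, for example with tinytest::expect_equal().
print(data.frame(fixture, expected))
```

Here the run offsets are 12 and 14 and the fraction reference is 13, so the analyte is expected to become 16 in run 1, 14 in run 2, and NA in run 3.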
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils_normalize.R` around lines 227 - 246, Add a unit/integration test that
exercises the multi-standard path in R/utils_normalize.R so the per-(RUN,
standard) median logic (the medians_by_standard computation: median_abundance,
dcast→melt, median_by_run and median_by_fraction, and final ABUNDANCE
adjustment) is actually used; create input with multiple distinct values in the
standards column across runs, merge with input to trigger the all.x = TRUE
branch, and assert that median_by_run/median_by_fraction are computed (not just
identity) and that ABUNDANCE is adjusted as expected (including handling of runs
with no standards producing NA).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 858e5637-d532-4426-a1ea-7aa1580da1ed

📥 Commits

Reviewing files that changed from the base of the PR and between 1c96555 and 618ac2e.

📒 Files selected for processing (2)

R/utils_normalize.R
inst/tinytest/test_utils_normalize.R

fix(globalStandards): Use median instead of mean for global standards…

618ac2e

… normalization

github-actions Bot added the Review effort 2/5 label Apr 22, 2026

coderabbitai Bot reviewed Apr 22, 2026


fix normalization for unlabeled

367ad70

tonywu1999 merged commit 5ab0b47 into devel Apr 23, 2026
2 checks passed

tonywu1999 deleted the feat-equalize-medians branch April 23, 2026 01:29

tonywu1999 merged 2 commits into devel from feat-equalize-medians
