fix(globalStandards): Use median instead of mean for global standards normalization #198

Merged
tonywu1999 merged 2 commits into devel from feat-equalize-medians
Apr 23, 2026
@tonywu1999 tonywu1999 commented Apr 22, 2026

PR Type

Bug fix


Description

  • Switch standard normalization from means to medians

  • Derive run offsets from median standards

  • Keep fraction equalization median-based

  • Update test wording for median logic


Diagram Walkthrough

flowchart LR
  A["Global standards data"] -- "compute per-run per-standard medians" --> B["Standard median abundances"]
  B -- "collapse to run medians" --> C["median_by_run"]
  C -- "summarize by fraction" --> D["median_by_fraction"]
  D -- "adjust input `ABUNDANCE`" --> E["Median-normalized abundances"]

File Walkthrough

Relevant files
Bug fix
utils_normalize.R
Replace mean-based normalization with medians                       

R/utils_normalize.R

  • Replace mean-based standard aggregation with median()
  • Rename helper field from mean_by_run to median_by_run
  • Normalize ABUNDANCE using run and fraction medians
  • Drop median helper columns after merging
+20/-20 
Tests
test_utils_normalize.R
Align test wording with median behavior                                   

inst/tinytest/test_utils_normalize.R

  • Update test comment to reference median_by_run
  • Align unlabeled-standard explanation with median normalization
+1/-1     

Motivation and Context

This PR switches the global standards normalization from mean-based to median-based centering. The median is a robust statistic that is less sensitive to outliers, which makes it better suited to real-world proteomics data, where extreme values can skew mean-based calculations. The change also aligns global standards normalization with the median-based normalization already used elsewhere in MSstats.
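
To make the robustness argument concrete, here is a small illustration in plain Python (not the package's R code). The abundance values are invented for the example; the last one plays the role of an outlier.

```python
from statistics import mean, median

# Hypothetical log-scale abundances for one standard in one run.
# The final value is an artificial outlier.
abundances = [20.0, 20.5, 21.0, 20.2, 30.0]

mean_value = mean(abundances)      # pulled upward by the outlier
median_value = median(abundances)  # stays at the middle observation

print(mean_value, median_value)
```

With these numbers the median remains at 20.5, while the mean is dragged above 22 by the single extreme value.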

Changes

  • Modified .normalizeGlobalStandards function (R/utils_normalize.R, lines 227-249):

    • Changed per-standard aggregation from mean(ABUNDANCE, na.rm = TRUE) to median(ABUNDANCE, na.rm = TRUE) when computing standard abundances by RUN
    • Updated per-RUN normalization factor from mean_by_run to median_by_run derived via median(ABUNDANCE, na.rm = TRUE) across standards
    • Maintained per-fraction normalization factor calculation as median_by_fraction := median(median_by_run, na.rm = TRUE) by FRACTION
    • Changed abundance adjustment formula from ABUNDANCE - mean_by_run + median_by_fraction to ABUNDANCE - median_by_run + median_by_fraction
    • Updated cleanup step to remove median_by_run and median_by_fraction columns (previously removed mean_by_run and median_by_fraction)
  • Updated test comment (inst/tinytest/test_utils_normalize.R, line 119):

    • Modified test comment in test_unlabeled_standard_detected to reflect that when standards are uniform across runs and fractions, median_by_fraction == median_by_run, rather than the previous comparison involving mean_by_run
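
The steps listed above can be sketched end to end as follows. This is a minimal Python illustration of the arithmetic only, not the R/data.table implementation in R/utils_normalize.R. The row layout (keys `run`, `fraction`, `name`, `abundance`), the function name, and the assumption that each run belongs to exactly one fraction are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import median


def normalize_global_standards(rows, standards):
    """Median-center each run on its standards (illustrative sketch).

    rows: list of dicts with keys "run", "fraction", "name", "abundance";
          abundance may be None to represent a missing value.
    standards: set of names treated as global standards.
    """
    # 1. Median abundance per (run, standard), ignoring missing values.
    per_run_standard = defaultdict(list)
    for r in rows:
        if r["name"] in standards and r["abundance"] is not None:
            per_run_standard[(r["run"], r["name"])].append(r["abundance"])
    standard_medians = {k: median(v) for k, v in per_run_standard.items()}

    # 2. Collapse to one median per run across its standards.
    by_run = defaultdict(list)
    for (run, _name), value in standard_medians.items():
        by_run[run].append(value)
    median_by_run = {run: median(v) for run, v in by_run.items()}

    # 3. Median of the run medians within each fraction
    #    (assumes one fraction per run).
    fraction_of = {r["run"]: r["fraction"] for r in rows}
    by_fraction = defaultdict(list)
    for run, value in median_by_run.items():
        by_fraction[fraction_of[run]].append(value)
    median_by_fraction = {f: median(v) for f, v in by_fraction.items()}

    # 4. Shift every abundance; a run without any standard yields None.
    out = []
    for r in rows:
        run_median = median_by_run.get(r["run"])
        if r["abundance"] is None or run_median is None:
            adjusted = None
        else:
            adjusted = (r["abundance"] - run_median
                        + median_by_fraction[r["fraction"]])
        out.append({**r, "abundance": adjusted})
    return out


# Two runs in one fraction, one standard ("STD") and one ordinary peptide ("P").
rows = [
    {"run": "A", "fraction": 1, "name": "STD", "abundance": 10.0},
    {"run": "B", "fraction": 1, "name": "STD", "abundance": 12.0},
    {"run": "A", "fraction": 1, "name": "P", "abundance": 15.0},
    {"run": "B", "fraction": 1, "name": "P", "abundance": 17.0},
]
result = normalize_global_standards(rows, {"STD"})
```

In this toy input the run medians are 10 and 12 and the fraction median is 11, so run A is shifted up by 1 and run B down by 1. Both standards end at 11 and both peptide values end at 16.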

Unit Tests

The existing test test_unlabeled_standard_detected is preserved with its executable logic unchanged. It still verifies that uniform standard intensities produce no shift in ABUNDANCE values. This remains valid under the median-based approach: when every standard has the same abundance in every run, median_by_run and median_by_fraction are identical, so the adjustment is zero.
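
The zero-shift property can be checked with a few lines of arithmetic. The snippet below is a Python sketch with invented values, not the tinytest case itself; it assumes all three runs belong to the same fraction.

```python
from statistics import median

# Hypothetical standard abundance, identical in every run of one fraction.
median_by_run = {"run1": 25.0, "run2": 25.0, "run3": 25.0}

# With uniform standards the fraction-level median equals each run median.
median_by_fraction = median(median_by_run.values())

# Shift applied to ABUNDANCE in each run: -median_by_run + median_by_fraction.
shifts = {run: median_by_fraction - m for run, m in median_by_run.items()}
print(shifts)
```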

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

The global-standards normalization method in .normalizeGlobalStandards switches from mean-based centering to median-based centering. The function now computes medians per (RUN, standard) and uses median values for run-level and fraction-level adjustments instead of means, with corresponding updates to the abundance adjustment formula.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Global Standards Normalization: R/utils_normalize.R | Switched statistical centering method from mean to median: median(ABUNDANCE, na.rm=TRUE) replaces mean calculations, median_by_run and median_by_fraction replace corresponding mean columns, and the adjustment formula changes from ABUNDANCE - mean_by_run + median_by_fraction to ABUNDANCE - median_by_run + median_by_fraction. |
| Test Documentation: inst/tinytest/test_utils_normalize.R | Updated test comment in test_unlabeled_standard_detected to reflect the corrected expected relationship: changed from median_by_fraction vs mean_by_run to median_by_fraction vs median_by_run. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

Review effort 2/5

Poem

🐰 From means to medians, we hop with care,
Statistics shift with robust flair,
Centering adjusted, outliers beware,
Median magic fills the air! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The PR description lacks key details required by the template, including motivation/context, detailed changes list, testing description, and incomplete checklist. | Provide comprehensive motivation for switching to median-based normalization, expand the changes bullet list with specific technical details, describe unit tests added/modified, and explicitly address or check all checklist items. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title accurately describes the primary change: switching from mean to median for global standards normalization, matching the code modifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ No major issues detected

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
R/utils_normalize.R (1)

227-246: LGTM — clean mean→median substitution.

The aggregation change is internally consistent: per-(RUN, standard) medians → per-run median across standards → per-fraction median across runs, with the adjustment formula correctly using median_by_run on both sides of the merge. Using median is more robust to standards with outlier abundances, which is a sensible rationale for the switch.

A couple of small observations (no action required, all pre-existing behavior preserved by this PR):

  • The dcast → melt round-trip on lines 231-235 only materially matters if it introduces NA rows for missing (RUN, standard) combinations; since median(..., na.rm = TRUE) is used downstream, it ends up being effectively a no-op for the aggregation. Worth revisiting at some point, but not in scope here.
  • A run where no standard is measured at all will not appear in medians_by_standard, so after the all.x = TRUE merge its median_by_run / median_by_fraction are NA, and ABUNDANCE for that run becomes NA. This matches the previous mean-based behavior.
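
Both observations can be illustrated with a short Python sketch. It uses invented values, with `None` standing in for R's NA and a dictionary lookup standing in for the `all.x = TRUE` merge, so it mirrors the described behavior rather than the actual data.table code.

```python
from statistics import median

# First observation: an NA introduced for a missing (RUN, standard)
# combination is dropped before the median, so it changes nothing.
with_na = [10.0, None, 12.0]
without_na = [10.0, 12.0]
same_median = (median(v for v in with_na if v is not None)
               == median(without_na))

# Second observation: "run3" measured no standard, so it has no run median.
median_by_run = {"run1": 10.0, "run2": 12.0}
median_by_fraction = median(median_by_run.values())  # all runs in one fraction

abundance = {"run1": 15.0, "run2": 17.0, "run3": 16.0}
adjusted = {}
for run, value in abundance.items():
    run_median = median_by_run.get(run)  # None plays the role of NA after the merge
    if run_median is None:
        adjusted[run] = None  # the whole run's ABUNDANCE becomes NA
    else:
        adjusted[run] = value - run_median + median_by_fraction

print(same_median, adjusted)
```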

Given the PR description flags uncertainty about per-feature vs. per-peptide aggregation and confusing QC plots, it would be worth adding a test that exercises a multi-standard scenario (more than one name in standards) so the per-(RUN, standard) median step is actually covered — the existing tests only use a single standard, which makes the inner median() degenerate to identity.
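
As a rough idea of what such a multi-standard scenario could assert, here is a Python sketch of the expected arithmetic. The standard names, the values, and the nested-dictionary layout are hypothetical; a real test would be written in R with tinytest against .normalizeGlobalStandards.

```python
from statistics import median

# Hypothetical standards: run -> standard name -> feature-level abundances.
# Both runs are assumed to belong to the same fraction.
standards = {
    "run1": {"STD_A": [10.0, 11.0, 30.0], "STD_B": [13.0, 14.0, 15.0]},
    "run2": {"STD_A": [12.0, 13.0, 14.0], "STD_B": [15.0, 16.0, 17.0]},
}

# Inner median per (run, standard), then median across standards per run.
median_by_run = {
    run: median(median(values) for values in by_standard.values())
    for run, by_standard in standards.items()
}

# Median of the run medians within the shared fraction.
median_by_fraction = median(median_by_run.values())

# Shift that would be added to every ABUNDANCE value in each run.
shift = {run: median_by_fraction - m for run, m in median_by_run.items()}
print(median_by_run, median_by_fraction, shift)
```

Because each standard has several features and each run has two standards, neither median step reduces to the identity here. The outlying value 30.0 in run1 also leaves that standard's median at 11.0, which is the robustness the PR is after.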

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils_normalize.R` around lines 227 - 246, Add a unit/integration test that
exercises the multi-standard path in R/utils_normalize.R so the per-(RUN,
standard) median logic (the medians_by_standard computation: median_abundance,
dcast→melt, median_by_run and median_by_fraction, and final ABUNDANCE
adjustment) is actually used; create input with multiple distinct values in the
standards column across runs, merge with input to trigger the all.x = TRUE
branch, and assert that median_by_run/median_by_fraction are computed (not just
identity) and that ABUNDANCE is adjusted as expected (including handling of runs
with no standards producing NA).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 858e5637-d532-4426-a1ea-7aa1580da1ed

📥 Commits

Reviewing files that changed from the base of the PR and between 1c96555 and 618ac2e.

📒 Files selected for processing (2)
  • R/utils_normalize.R
  • inst/tinytest/test_utils_normalize.R

@tonywu1999 tonywu1999 merged commit 5ab0b47 into devel Apr 23, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the feat-equalize-medians branch April 23, 2026 01:29