fix(globalStandards): Use median instead of mean for global standards normalization#198
fix(globalStandards): Use median instead of mean for global standards normalization#198tonywu1999 merged 2 commits intodevelfrom
Conversation
📝 WalkthroughWalkthroughThe global-standards normalization method in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨No code suggestions found for the PR. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
R/utils_normalize.R (1)
227-246: LGTM — clean mean→median substitution.The aggregation change is internally consistent: per-(RUN, standard) medians → per-run median across standards → per-fraction median across runs, with the adjustment formula correctly using
median_by_runon both sides of the merge. Usingmedianis more robust to standards with outlier abundances, which is a sensible rationale for the switch.A couple of small observations (no action required, all pre-existing behavior preserved by this PR):
- The
dcast→meltround-trip on lines 231-235 only materially matters if it introducesNArows for missing (RUN, standard) combinations; sincemedian(..., na.rm = TRUE)is used downstream, it ends up being effectively a no-op for the aggregation. Worth revisiting at some point, but not in scope here.- A run where no standard is measured at all will not appear in
medians_by_standard, so after theall.x = TRUEmerge itsmedian_by_run/median_by_fractionareNA, andABUNDANCEfor that run becomesNA. This matches the previous mean-based behavior.Given the PR description flags uncertainty about per-feature vs. per-peptide aggregation and confusing QC plots, it would be worth adding a test that exercises a multi-standard scenario (more than one name in
standards) so the per-(RUN, standard) median step is actually covered — the existing tests only use a single standard, which makes the innermedian()degenerate to identity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/utils_normalize.R` around lines 227 - 246, Add a unit/integration test that exercises the multi-standard path in R/utils_normalize.R so the per-(RUN, standard) median logic (the medians_by_standard computation: median_abundance, dcast→melt, median_by_run and median_by_fraction, and final ABUNDANCE adjustment) is actually used; create input with multiple distinct values in the standards column across runs, merge with input to trigger the all.x = TRUE branch, and assert that median_by_run/median_by_fraction are computed (not just identity) and that ABUNDANCE is adjusted as expected (including handling of runs with no standards producing NA).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@R/utils_normalize.R`:
- Around line 227-246: Add a unit/integration test that exercises the
multi-standard path in R/utils_normalize.R so the per-(RUN, standard) median
logic (the medians_by_standard computation: median_abundance, dcast→melt,
median_by_run and median_by_fraction, and final ABUNDANCE adjustment) is
actually used; create input with multiple distinct values in the standards
column across runs, merge with input to trigger the all.x = TRUE branch, and
assert that median_by_run/median_by_fraction are computed (not just identity)
and that ABUNDANCE is adjusted as expected (including handling of runs with no
standards producing NA).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 858e5637-d532-4426-a1ea-7aa1580da1ed
📒 Files selected for processing (2)
R/utils_normalize.Rinst/tinytest/test_utils_normalize.R
PR Type
Bug fix
Description
Switch standard normalization from means to medians
Derive run offsets from median standards
Keep fraction equalization median-based
Update test wording for median logic
Diagram Walkthrough
File Walkthrough
utils_normalize.R
Replace mean-based normalization with mediansR/utils_normalize.R
median()mean_by_runtomedian_by_runABUNDANCEusing run and fraction medianstest_utils_normalize.R
Align test wording with median behaviorinst/tinytest/test_utils_normalize.R
median_by_runMotivation and Context
This PR switches the global standards normalization method from mean-based to median-based centering and scaling. The median is a more robust statistical measure that is less sensitive to outliers, making it better suited for handling real-world proteomics data where extreme values can skew mean-based calculations. This change aligns the global standards normalization approach with the existing median-based normalization method used elsewhere in MSstats.
Changes
Modified
.normalizeGlobalStandardsfunction (R/utils_normalize.R, lines 227-249):mean(ABUNDANCE, na.rm = TRUE)tomedian(ABUNDANCE, na.rm = TRUE)when computing standard abundances by RUNmean_by_runtomedian_by_runderived viamedian(ABUNDANCE, na.rm = TRUE)across standardsmedian_by_fraction := median(median_by_run, na.rm = TRUE)by FRACTIONABUNDANCE - mean_by_run + median_by_fractiontoABUNDANCE - median_by_run + median_by_fractionmedian_by_runandmedian_by_fractioncolumns (previously removedmean_by_runandmedian_by_fraction)Updated test comment (
inst/tinytest/test_utils_normalize.R, line 119):test_unlabeled_standard_detectedto reflect that when standards are uniform across runs and fractions,median_by_fraction == median_by_run, rather than the previous comparison involvingmean_by_runUnit Tests
The existing test
test_unlabeled_standard_detectedwas preserved with its executable logic unchanged. The test continues to verify that uniform standard intensities produce no shift in ABUNDANCE values, which remains valid under the median-based approach since the mathematical relationship holds: when all standards have equal abundances, bothmedian_by_runandmedian_by_fractionwill be identical, resulting in zero adjustment.