Qc anomaly by tonywu1999 · Pull Request #203 · Vitek-Lab/MSstats

tonywu1999 · 2026-04-24T12:52:18Z

Description

Add MSstatsQualityMetricsPlot visualization helper
Plot protein metrics across run order
Aggregate fragment ions to precursor means
Support PDF and plotly HTML export

Diagram Walkthrough

flowchart LR
  input["Converter output with quality metrics"]
  validate["Validate metric, Run, and protein"]
  aggregate["Aggregate fragment ions per precursor and run"]
  plot["Build ggplot quality trend lines"]
  export["Return plot or save PDF/HTML"]

  input -- "filter selected protein" --> validate
  validate -- "prepare precursor series" --> aggregate
  aggregate -- "generate visualization" --> plot
  plot -- "static or interactive output" --> export
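
The aggregation step in the diagram can be sketched in base R roughly as follows. This is a minimal illustration on a toy data frame, not the package source: the metric column name `AnomalyScore`, the `FragmentIon` column, and all values are hypothetical, while the `aggregate()` call follows the one quoted in the review comments on this PR.

```r
# Toy converter-style output: two fragment ions per precursor per run.
# All names and values here are made up for illustration.
toy <- data.frame(
    ProteinName     = "P1",
    PeptideSequence = rep(c("PEPTIDEA", "PEPTIDEB"), each = 4),
    PrecursorCharge = rep(c(2, 3), each = 4),
    FragmentIon     = rep(c("y3", "y4"), times = 4),
    Run             = rep(rep(c("run1", "run2"), each = 2), times = 2),
    AnomalyScore    = c(0.1, 0.3, 0.2, 0.4, 0.5, 0.7, 0.6, 0.8)
)
metric <- "AnomalyScore"  # hypothetical metric column

# Keep one protein, treat Run as a factor, and derive the precursor label.
input_df <- toy[toy$ProteinName == "P1", , drop = FALSE]
input_df$Run <- factor(input_df$Run)
input_df$Precursor <- paste(input_df$PeptideSequence,
                            input_df$PrecursorCharge, sep = "_")

# Average across fragment ions so each precursor has one value per run.
plot_df <- aggregate(
    input_df[[metric]],
    by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
    FUN = mean, na.rm = TRUE
)
# aggregate() names the value column "x"; rename it to the metric.
colnames(plot_df)[colnames(plot_df) == "x"] <- metric
```

After this step `plot_df` holds one row per Run × Precursor combination, which is the shape the plotting step needs to draw one line per precursor.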

File Walkthrough

Relevant files

Enhancement

File	Summary	Changes	Lines
NAMESPACE	Export new quality metrics plotting function	Export `MSstatsQualityMetricsPlot`; make the new plotting API public	+1/-0
R/plot_quality_metrics.R	Add run-ordered quality metrics plotting utility	Add `MSstatsQualityMetricsPlot` for QC metrics; validate `metric`, `Run`, and `which.Protein`; aggregate values by precursor and run; support `ggplot2`, plotly, PDF, and HTML	+126/-0

Documentation

File	Summary	Changes	Lines
man/MSstatsQualityMetricsPlot.Rd	Document quality metrics plotting interface	Add generated documentation for new function; document inputs, behavior, return values; describe run ordering and aggregation details; include usage examples for static and interactive plots	+67/-0

coderabbitai · 2026-04-24T12:52:32Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d9f90767-e0ef-49a4-a71e-ebfa4bd09b5c

📥 Commits

Reviewing files that changed from the base of the PR and between 064b917 and 4173c34.

📒 Files selected for processing (2)

DESCRIPTION
R/plot_quality_metrics.R

📝 Walkthrough

Walkthrough

A new exported function MSstatsQualityMetricsPlot is added to the package. This function visualizes quality metrics from MSstats converter output for a single protein, aggregating values by precursor across runs and rendering them as a ggplot2 object or an interactive plotly visualization, with optional file export.

Changes

Cohort / File(s)	Summary
New Quality Metrics Plotting Feature `NAMESPACE`, `R/plot_quality_metrics.R`, `man/MSstatsQualityMetricsPlot.Rd`	Introduces `MSstatsQualityMetricsPlot()` function with validation of required input columns, data filtering by protein, precursor derivation from peptide sequence and charge, metric aggregation by run and precursor, and conditional visualization output (ggplot2 or plotly) with optional PDF/HTML export.

Sequence Diagram

sequenceDiagram
    actor User
    participant Function as MSstatsQualityMetricsPlot
    participant Validation as Input Validation
    participant Processing as Data Processing
    participant ggplot2
    participant Plotly
    participant FileSystem as File System

    User->>Function: input, metric, which.Protein, address, isPlotly
    Function->>Validation: Validate columns (metric, Run, ProteinName)
    Validation-->>Function: Validation passed
    Function->>Processing: Filter by protein & derive Precursor
    Processing->>Processing: Aggregate metric by Run & Precursor
    Processing-->>Function: Aggregated data
    Function->>ggplot2: Create plot with colored precursor lines
    ggplot2-->>Function: ggplot object
    alt isPlotly = TRUE
        Function->>Plotly: Convert to interactive plotly
        Plotly-->>Function: plotly object
        alt address provided
            Function->>FileSystem: Save as HTML
            FileSystem-->>Function: Saved
        end
        Function-->>User: Return plotly object
    else isPlotly = FALSE
        alt address provided
            Function->>FileSystem: Save as PDF
            FileSystem-->>Function: Saved
        end
        Function-->>User: Return ggplot object
    end
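
The return and export branching in the sequence diagram could look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: the helper name `render_quality_plot` and the HTML file name are hypothetical, and the use of `plotly::ggplotly()` and `htmltools::save_html()` is inferred from the review comments rather than confirmed from the source. Only the PDF file name is taken from the quoted code.

```r
# Hypothetical helper mirroring the diagram: return the plot object and,
# when an address prefix is given, also write it to disk.
render_quality_plot <- function(p, address = FALSE, isPlotly = FALSE) {
    if (isPlotly) {
        # Interactive branch: convert the ggplot object to a plotly widget.
        widget <- plotly::ggplotly(p)
        if (!identical(address, FALSE)) {
            # File name is an assumption made for this sketch.
            htmltools::save_html(
                widget, file = paste0(address, "QualityMetricsPlot.html")
            )
        }
        return(widget)
    }
    if (!identical(address, FALSE)) {
        grDevices::pdf(paste0(address, "QualityMetricsPlot.pdf"))
        # Close the device even if print(p) fails.
        on.exit(grDevices::dev.off(), add = TRUE)
        print(p)
    }
    p
}

# Usage, guarded so the sketch still loads when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
    toy <- data.frame(
        Run       = factor(c("run1", "run2", "run1", "run2")),
        Precursor = c("PEPTIDEA_2", "PEPTIDEA_2", "PEPTIDEB_3", "PEPTIDEB_3"),
        Score     = c(0.2, 0.3, 0.6, 0.7)
    )
    p <- ggplot2::ggplot(
        toy,
        ggplot2::aes(x = Run, y = Score, group = Precursor, colour = Precursor)
    ) + ggplot2::geom_line()
    static_plot <- render_quality_plot(p)  # returns the ggplot object unchanged
}
```

With `address = FALSE` nothing is written and the caller simply receives the plot object, which matches the two "Return ... object" arrows in the diagram.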

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Review effort 2/5

Poem

🐰 Whiskers twitching with delight,
A new plot function shines so bright,
Quality metrics leap and bound,
Precursors dancing all around,
ggplot and plotly, side by side,
This feature hops with rabbit pride! 🎨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is vague and generic, using non-descriptive terms that don't clearly convey the main change.	Use a more descriptive title that clearly indicates the feature, such as 'Add MSstatsQualityMetricsPlot visualization function' or 'Implement quality metrics plotting for anomaly scores'.
Description check	❓ Inconclusive	The PR description provides a mermaid diagram and file walkthrough but omits critical template sections: motivation/context, detailed change list, testing information, and checklist verification.	Add detailed motivation explaining the quality metrics visualization need, provide bullet-point changes summary, document testing approach, and complete the pre-review checklist items.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-04-24T12:54:05Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing validation

The helper only validates `metric` and `Run`, but it also requires `ProteinName`, `PeptideSequence`, and `PrecursorCharge`. If a caller passes a converter output that has been subsetted or renamed and one of those columns is missing, this code fails with a low-level error instead of a clear message, which makes the exported API brittle for real user inputs.

    if (!which.Protein %in% input_df$ProteinName) {
        stop(paste0("Protein '", which.Protein, "' not found in input."))
    }
    input_df <- input_df[input_df$ProteinName == which.Protein, ]
    if (!is.factor(input_df$Run)) {
        input_df$Run <- factor(input_df$Run)
    }
    input_df$Precursor <- paste(input_df$PeptideSequence, input_df$PrecursorCharge, sep = "_")

Type mismatch

`metric` is checked for existence but not for numeric type before it is averaged with `mean()`. If a user selects an existing non-numeric column such as `PeptideSequence`, the aggregation produces warnings and `NA` values, resulting in an empty or misleading plot instead of a clear validation error.

    if (!metric %in% colnames(input_df)) {
        stop(paste0(
            "Column '", metric, "' not found in input. ",
            "Available columns: ", paste(colnames(input_df), collapse = ", ")
        ))
    }
    if (!"Run" %in% colnames(input_df)) {
        stop("'Run' column not found in input.")
    }
    if (!which.Protein %in% input_df$ProteinName) {
        stop(paste0("Protein '", which.Protein, "' not found in input."))
    }
    input_df <- input_df[input_df$ProteinName == which.Protein, ]
    if (!is.factor(input_df$Run)) {
        input_df$Run <- factor(input_df$Run)
    }
    input_df$Precursor <- paste(input_df$PeptideSequence, input_df$PrecursorCharge, sep = "_")
    # Average across fragment ions so each precursor has one value per run
    plot_df <- aggregate(
        input_df[[metric]],
        by = list(Run = input_df$Run, Precursor = input_df$Precursor),
        FUN = mean, na.rm = TRUE
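
Both focus areas could be addressed by a single up-front check. The following is a minimal sketch under stated assumptions: the helper name `validate_quality_input` and the toy column `Score` are hypothetical and do not come from the PR, and the error wording is only illustrative.

```r
# Hypothetical validation helper: fail early with clear messages.
validate_quality_input <- function(input_df, metric, which.Protein) {
    required_cols <- c("ProteinName", "Run", "PeptideSequence",
                       "PrecursorCharge", metric)
    missing_cols <- setdiff(required_cols, colnames(input_df))
    if (length(missing_cols) > 0) {
        stop("Required column(s) missing from input: ",
             paste(missing_cols, collapse = ", "))
    }
    # mean() on a character column only warns and returns NA,
    # so reject non-numeric metrics explicitly.
    if (!is.numeric(input_df[[metric]])) {
        stop("Column '", metric, "' must be numeric.")
    }
    if (!which.Protein %in% input_df$ProteinName) {
        stop("Protein '", which.Protein, "' not found in input.")
    }
    invisible(TRUE)
}

# Toy input with a hypothetical numeric metric column named Score.
toy <- data.frame(
    ProteinName = "P1", Run = "run1", PeptideSequence = "PEPTIDEA",
    PrecursorCharge = 2, Score = 0.4
)
validate_quality_input(toy, "Score", "P1")  # passes silently

# Selecting a non-numeric column is reported instead of plotted.
msg <- tryCatch(
    validate_quality_input(toy, "PeptideSequence", "P1"),
    error = function(e) conditionMessage(e)
)
```

Checking column presence before the protein lookup means a missing `ProteinName` column is reported as a missing column rather than as a missing protein.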

github-actions · 2026-04-24T12:55:18Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue · Impact: Medium

Validate required input columns

Validate every required plotting column before accessing it. Right now missing `ProteinName`, `PeptideSequence`, or `PrecursorCharge` columns will fail later with opaque errors instead of a clear message, which makes this helper brittle on converter outputs that do not contain the expected schema.

R/plot_quality_metrics.R [59-79]

    -if (!metric %in% colnames(input_df)) {
    +required_cols <- c("ProteinName", "Run", "PeptideSequence", "PrecursorCharge", metric)
    +missing_cols <- setdiff(required_cols, colnames(input_df))
    +if (length(missing_cols) > 0) {
         stop(paste0(
    -        "Column '", metric, "' not found in input. ",
    -        "Available columns: ", paste(colnames(input_df), collapse = ", ")
    +        "Required column(s) missing from input: ",
    +        paste(missing_cols, collapse = ", ")
         ))
    -}
    -if (!"Run" %in% colnames(input_df)) {
    -    stop("'Run' column not found in input.")
     }
     if (!which.Protein %in% input_df$ProteinName) {
         stop(paste0("Protein '", which.Protein, "' not found in input."))
     }
    -input_df <- input_df[input_df$ProteinName == which.Protein, ]
    +input_df <- input_df[input_df$ProteinName == which.Protein, , drop = FALSE]
     if (!is.factor(input_df$Run)) {
         input_df$Run <- factor(input_df$Run)
     }
     input_df$Precursor <- paste(input_df$PeptideSequence, input_df$PrecursorCharge, sep = "_")

Suggestion importance [1-10]: 7

Why: This is accurate: the function currently checks `metric` and `Run` but later assumes `ProteinName`, `PeptideSequence`, and `PrecursorCharge` exist. Adding a single required-column validation improves robustness and gives clearer failures for malformed converter outputs.

Category: Possible issue · Impact: Low

Check aggregation inputs first

Guard the aggregation against non-numeric or entirely missing `metric` values. As written, `mean()` returns `NA` with a warning for non-numeric columns and yields `NaN` for groups whose values are all missing, which produces a broken or empty plot instead of a clear failure.

R/plot_quality_metrics.R [81-87]

    +if (!is.numeric(input_df[[metric]])) {
    +    stop(paste0("Column '", metric, "' must be numeric to plot aggregated values."))
    +}
    +
     # Average across fragment ions so each precursor has one value per run
     plot_df <- aggregate(
         input_df[[metric]],
    -    by = list(Run = input_df$Run, Precursor = input_df$Precursor),
    -    FUN = mean, na.rm = TRUE
    +    by = list(Run = input_df$Run, Precursor = input_df$Precursor),
    +    FUN = function(x) {
    +        if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
    +    }
     )
     colnames(plot_df)[colnames(plot_df) == "x"] <- metric
    +plot_df <- plot_df[!is.na(plot_df[[metric]]), , drop = FALSE]
    +if (nrow(plot_df) == 0) {
    +    stop(paste0(
    +        "No non-missing values available for '", metric,
    +        "' in protein '", which.Protein, "'."
    +    ))
    +}
    +

Suggestion importance [1-10]: 6

Why: This correctly identifies that `mean()` on `input_df[[metric]]` does not give a usable result for non-numeric data (it returns `NA` with a warning) and can yield unusable `NaN` groups when all values are missing. The added checks make plotting failures around `metric` much clearer, though this is mainly defensive validation.

Category: General · Impact: Low

Always close opened device

Ensure the graphics device is always closed after opening the PDF. If `print(p)` or any later code errors, the current implementation leaves the device open and can corrupt subsequent plotting in the same R session.

R/plot_quality_metrics.R [119-123]

     if (!identical(address, FALSE)) {
    -    pdf(paste0(address, "QualityMetricsPlot.pdf"))
    +    pdf_file <- paste0(address, "QualityMetricsPlot.pdf")
    +    pdf(pdf_file)
    +    on.exit(dev.off(), add = TRUE)
         print(p)
    -    dev.off()
     }

Suggestion importance [1-10]: 5

Why: This is a valid robustness improvement because an error after `pdf()` could leave the graphics device open. Using `on.exit()` around `dev.off()` reduces session-side effects, but it is not a high-impact functional issue in the normal path.
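
The effect of the `on.exit()` suggestion can be checked with a small base R experiment. This is a sketch only: `save_plot_pdf` is a hypothetical helper that takes a drawing function, and the failure is simulated with `stop()` rather than a real ggplot build error.

```r
# Hypothetical helper: open a PDF device and guarantee it is closed again.
save_plot_pdf <- function(draw, file) {
    grDevices::pdf(file)
    # Registered right after opening, so it runs on normal exit and on error.
    on.exit(grDevices::dev.off(), add = TRUE)
    draw()
    invisible(file)
}

devices_before <- length(grDevices::dev.list())

# Simulate a failure while drawing; the error is caught for inspection.
result <- tryCatch(
    save_plot_pdf(function() stop("plot build failed"),
                  tempfile(fileext = ".pdf")),
    error = function(e) conditionMessage(e)
)

devices_after <- length(grDevices::dev.list())
# The device count is unchanged, so no PDF device was left open.
leaked <- devices_after > devices_before
```

Without the `on.exit()` line, the same experiment would leave one extra device open after the error, which is the leak the suggestion describes.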

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

R/plot_quality_metrics.R (2)
82-87: aggregate on all-NA groups emits NaN with a warning.

When every fragment-ion value for a given Run × Precursor is NA, mean(..., na.rm = TRUE) returns NaN for that group. (The mean.default warning "argument is not numeric or logical" arises only for non-numeric input, where the result is NA.) The plot still renders with missing points, but the NaN values are easy to misread downstream. Consider mapping all-NA groups to NA explicitly, or filtering all-NA groups out before aggregation.
♻️ Suggested tweak
-    plot_df <- aggregate(
-        input_df[[metric]],
-        by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
-        FUN = mean, na.rm = TRUE
-    )
+    plot_df <- suppressWarnings(aggregate(
+        input_df[[metric]],
+        by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
+        FUN = function(v) if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
+    ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/plot_quality_metrics.R` around lines 82 - 87, The aggregate call on
input_df[[metric]] with FUN = mean, na.rm = TRUE produces NaN and warnings for
groups where all values are NA; to fix, pre-filter those all-NA Run×Precursor
groups before calling aggregate (e.g., remove rows or group keys where all
fragment values for the given metric are NA) so aggregate only sees groups with
at least one non-NA, then run the existing aggregate into plot_df and keep the
colnames fix for "x" -> metric; reference input_df, metric, Run, Precursor, and
plot_df when locating where to add the filter.
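
The pre-filtering approach described in the prompt above might look like the following. It is a minimal sketch on toy data, assuming a hypothetical metric column `Score`; `ave()` is one of several ways to flag groups and is not necessarily what the merged fix uses.

```r
# Toy data: run1 has one usable value, run2 has only missing values.
toy <- data.frame(
    Run       = factor(c("run1", "run1", "run2", "run2")),
    Precursor = rep("PEPTIDEA_2", 4),
    Score     = c(0.2, NA, NA, NA)  # hypothetical metric column
)

# For a numeric vector with no usable values, mean(na.rm = TRUE) gives NaN.
all_na_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Flag rows whose Run x Precursor group has at least one non-NA value.
has_value <- ave(!is.na(toy$Score), toy$Run, toy$Precursor, FUN = any)
kept <- toy[has_value, , drop = FALSE]

# Aggregate only the groups that can produce a real mean.
plot_df <- aggregate(
    kept$Score,
    by  = list(Run = kept$Run, Precursor = kept$Precursor),
    FUN = mean, na.rm = TRUE
)
colnames(plot_df)[colnames(plot_df) == "x"] <- "Score"
```

Dropping the all-NA groups up front means the aggregated table never contains NaN, so the plot shows a gap for that run instead of a placeholder value.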
119-123: Use on.exit(dev.off()) to protect against print(p) errors.

If print(p) throws (e.g. a ggplot build error on an unexpected metric column type), dev.off() is never called and the PDF device leaks, which can cause subsequent plotting calls in the same session to write into the orphaned file or fail. The same concern doesn't apply to the plotly HTML branch, since save_html manages its own file handle.
♻️ Suggested fix
     if (!identical(address, FALSE)) {
         pdf(paste0(address, "QualityMetricsPlot.pdf"))
+        on.exit(dev.off(), add = TRUE)
         print(p)
-        dev.off()
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/plot_quality_metrics.R` around lines 119 - 123, The PDF device opened in
the block guarded by if (!identical(address, FALSE)) should be protected with
on.exit(dev.off()) so that the device is closed even if print(p) throws; update
the block around pdf(paste0(address, "QualityMetricsPlot.pdf")) / print(p) /
dev.off() to call on.exit(dev.off(), add = TRUE) immediately after opening the
device (before print(p)) so the device will always be closed, keeping the final
dev.off() as a normal cleanup or removing the redundant one if desired.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/plot_quality_metrics.R`:
- Line 97: The DESCRIPTION currently imports ggplot2 without a version
constraint but the code uses geom_line(linewidth = 0.6) which requires ggplot2
>= 3.4.0; update the DESCRIPTION Imports entry to declare ggplot2 (>= 3.4.0) so
installations pull a ggplot2 version that supports the linewidth argument and
avoid "unused argument (linewidth = 0.6)" errors.
- Around line 59-70: The function currently fails to validate required columns
and the which.Protein argument: first, add an explicit missing() check for
which.Protein and provide a clear error if it's not supplied; second, before any
use of input_df$ProteinName/PeptideSequence/PrecursorCharge (and Run/metric),
verify those column names exist in input_df (e.g., check "ProteinName",
"PeptideSequence", "PrecursorCharge", "Run" and the requested metric via
colnames(input_df)) and stop with descriptive messages if any are missing;
finally, only after these column-existence checks perform the membership test
for which.Protein in input_df$ProteinName so the error reflects a missing
protein rather than a missing column.
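
The version requirement in the first inline comment can be expressed as a simple comparison. The sketch below assumes a hypothetical helper name, `supports_linewidth`; it takes the version as an argument so it can be tried without ggplot2 installed.

```r
# Hypothetical helper: TRUE when a ggplot2 version is new enough for the
# `linewidth` argument, which was introduced in ggplot2 3.4.0.
supports_linewidth <- function(installed_version) {
    package_version(installed_version) >= package_version("3.4.0")
}

old_ok <- supports_linewidth("3.3.6")  # FALSE: predates linewidth
new_ok <- supports_linewidth("3.4.0")  # TRUE: first release with linewidth

# In an installed package the input would typically come from
# utils::packageVersion("ggplot2").
```

Declaring `ggplot2 (>= 3.4.0)` in the DESCRIPTION Imports field, as the comment suggests, moves this check to install time so no runtime guard is needed.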

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c05809a-9219-4f93-b7b5-f7409f82d64b

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb7066 and 064b917.

📒 Files selected for processing (3)

NAMESPACE
R/plot_quality_metrics.R
man/MSstatsQualityMetricsPlot.Rd

tonywu1999 added 3 commits April 23, 2026 23:21

add plotting code for anomaly scores and other metrics

f4812d3

per protein, precursor level anomaly metric tracking

9841a0b

add exports

064b917

github-actions Bot added the Review effort 2/5 label Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

address feedback

4173c34

tonywu1999 merged commit 688ff6d into devel Apr 24, 2026
2 checks passed

tonywu1999 deleted the qc-anomaly branch April 24, 2026 13:45

tonywu1999 merged 4 commits into devel from qc-anomaly
