
Qc anomaly #203

Merged
tonywu1999 merged 4 commits into devel from qc-anomaly
Apr 24, 2026

Conversation

@tonywu1999
Contributor

tonywu1999 commented Apr 24, 2026

Description

  • Add MSstatsQualityMetricsPlot visualization helper

  • Plot protein metrics across run order

  • Aggregate fragment ions to precursor means

  • Support PDF and plotly HTML export
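Run order on the x-axis follows the levels of the Run factor. In the PR's code, a Run column that is not already a factor is converted with factor(input_df$Run), and factor() sorts character labels, so a run labelled Run10 lands before Run2. The sketch below keeps first-appearance order instead, assuming run labels appear in acquisition order in the input (an assumption, not something the PR states); order_runs() is a hypothetical helper, not part of MSstats.

```r
# Hypothetical helper: keep runs in the order they first appear instead of
# the sorted order that factor() applies to character vectors by default.
order_runs <- function(run) {
  if (is.factor(run)) {
    return(run)  # respect levels the caller already set
  }
  factor(run, levels = unique(run))
}

runs <- c("Run1", "Run2", "Run10", "Run1", "Run2", "Run10")
levels(factor(runs))      # "Run1"  "Run10" "Run2"  (sorted)
levels(order_runs(runs))  # "Run1"  "Run2"  "Run10" (first appearance)
```

Callers who already know the acquisition order can also pass Run as a factor with explicit levels, which the PR's is.factor() check leaves untouched.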


Diagram Walkthrough

flowchart LR
  input["Converter output with quality metrics"]
  validate["Validate metric, Run, and protein"]
  aggregate["Aggregate fragment ions per precursor and run"]
  plot["Build ggplot quality trend lines"]
  export["Return plot or save PDF/HTML"]

  input -- "filter selected protein" --> validate
  validate -- "prepare precursor series" --> aggregate
  aggregate -- "generate visualization" --> plot
  plot -- "static or interactive output" --> export
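The filter, precursor, and aggregation steps in the flowchart can be sketched in base R. This is an illustrative reimplementation, not the PR's code: summarise_precursors() and the AnomalyScore metric column are hypothetical, while the column names ProteinName, PeptideSequence, PrecursorCharge, and Run are taken from the review excerpts.

```r
# Minimal base-R sketch of the walkthrough: filter one protein, derive a
# Precursor key, then average fragment-ion values per Run x Precursor.
# summarise_precursors() is a hypothetical name, not the PR's function.
summarise_precursors <- function(input_df, metric, which.Protein) {
  df <- input_df[input_df$ProteinName == which.Protein, , drop = FALSE]
  df$Precursor <- paste(df$PeptideSequence, df$PrecursorCharge, sep = "_")
  out <- aggregate(
    df[[metric]],
    by = list(Run = df$Run, Precursor = df$Precursor),
    FUN = mean, na.rm = TRUE
  )
  # aggregate() on a bare vector names the value column "x"
  names(out)[names(out) == "x"] <- metric
  out
}

# Toy converter-like input: two fragment ions per precursor and run
toy <- data.frame(
  ProteinName     = c("P1", "P1", "P1", "P1", "P2"),
  PeptideSequence = c("PEPA", "PEPA", "PEPA", "PEPA", "PEPB"),
  PrecursorCharge = c(2, 2, 2, 2, 3),
  FragmentIon     = c("y3", "y4", "y3", "y4", "y3"),
  Run             = c("R1", "R1", "R2", "R2", "R1"),
  AnomalyScore    = c(0.2, 0.4, 0.9, 1.1, 5)
)
res <- summarise_precursors(toy, "AnomalyScore", "P1")
res  # one row per Run for precursor PEPA_2
```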

File Walkthrough

Relevant files
Enhancement
  • NAMESPACE (+1/-0): Export new quality metrics plotting function
      • Export MSstatsQualityMetricsPlot
      • Make the new plotting API public
  • R/plot_quality_metrics.R (+126/-0): Add run-ordered quality metrics plotting utility
      • Add MSstatsQualityMetricsPlot for QC metrics
      • Validate metric, Run, and which.Protein
      • Aggregate values by precursor and run
      • Support ggplot2, plotly, PDF, and HTML
Documentation
  • man/MSstatsQualityMetricsPlot.Rd (+67/-0): Document quality metrics plotting interface
      • Add generated documentation for new function
      • Document inputs, behavior, return values
      • Describe run ordering and aggregation details
      • Include usage examples for static and interactive plots

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d9f90767-e0ef-49a4-a71e-ebfa4bd09b5c

📥 Commits

Reviewing files that changed from the base of the PR and between 064b917 and 4173c34.

📒 Files selected for processing (2)
  • DESCRIPTION
  • R/plot_quality_metrics.R
📝 Walkthrough


A new exported function MSstatsQualityMetricsPlot is added to the package. This function visualizes quality metrics from MSstats converter output for a single protein, aggregating values by precursor across runs and rendering as a ggplot2 object or interactive plotly visualization with optional file export.

Changes

Cohort | File(s) | Summary
New Quality Metrics Plotting Feature | NAMESPACE, R/plot_quality_metrics.R, man/MSstatsQualityMetricsPlot.Rd | Introduces MSstatsQualityMetricsPlot() function with validation of required input columns, data filtering by protein, precursor derivation from peptide sequence and charge, metric aggregation by run and precursor, and conditional visualization output (ggplot2 or plotly) with optional PDF/HTML export.

Sequence Diagram

sequenceDiagram
    actor User
    participant Function as MSstatsQualityMetricsPlot
    participant Validation as Input Validation
    participant Processing as Data Processing
    participant ggplot2
    participant Plotly
    participant FileSystem as File System

    User->>Function: input, metric, which.Protein, address, isPlotly
    Function->>Validation: Validate columns (metric, Run, ProteinName)
    Validation-->>Function: Validation passed
    Function->>Processing: Filter by protein & derive Precursor
    Processing->>Processing: Aggregate metric by Run & Precursor
    Processing-->>Function: Aggregated data
    Function->>ggplot2: Create plot with colored precursor lines
    ggplot2-->>Function: ggplot object
    alt isPlotly = TRUE
        Function->>Plotly: Convert to interactive plotly
        Plotly-->>Function: plotly object
        alt address provided
            Function->>FileSystem: Save as HTML
            FileSystem-->>Function: Saved
        end
        Function-->>User: Return plotly object
    else isPlotly = FALSE
        alt address provided
            Function->>FileSystem: Save as PDF
            FileSystem-->>Function: Saved
        end
        Function-->>User: Return ggplot object
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Review effort 2/5

Poem

🐰 Whiskers twitching with delight,
A new plot function shines so bright,
Quality metrics leap and bound,
Precursors dancing all around,
ggplot and plotly, side by side,
This feature hops with rabbit pride! 🎨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 inconclusive)

Check name | Status | Explanation | Resolution
Title check | ❓ Inconclusive | The title is vague and generic, using non-descriptive terms that don't clearly convey the main change. | Use a more descriptive title that clearly indicates the feature, such as 'Add MSstatsQualityMetricsPlot visualization function' or 'Implement quality metrics plotting for anomaly scores'.
Description check | ❓ Inconclusive | The PR description provides a mermaid diagram and file walkthrough but omits critical template sections: motivation/context, detailed change list, testing information, and checklist verification. | Add detailed motivation explaining the quality metrics visualization need, provide bullet-point changes summary, document testing approach, and complete the pre-review checklist items.

✅ Passed checks (3 passed)
Check name | Status | Explanation
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing validation

The helper only validates metric and Run, but it also requires ProteinName, PeptideSequence, and PrecursorCharge. If a caller passes a converter output that has been subsetted or renamed and one of those columns is missing, the code never reports the missing column: a missing ProteinName is reported as a missing protein, and a missing PeptideSequence or PrecursorCharge silently produces malformed Precursor labels. This makes the exported API brittle for real user inputs.

if (!which.Protein %in% input_df$ProteinName) {
    stop(paste0("Protein '", which.Protein, "' not found in input."))
}

input_df <- input_df[input_df$ProteinName == which.Protein, ]

if (!is.factor(input_df$Run)) {
    input_df$Run <- factor(input_df$Run)
}

input_df$Precursor <- paste(input_df$PeptideSequence,
                            input_df$PrecursorCharge, sep = "_")
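A small toy input shows the failure modes described above. This is a sketch on hypothetical data that mirrors the excerpt's first lines rather than calling MSstatsQualityMetricsPlot itself.

```r
# Hypothetical input lacking ProteinName and PeptideSequence columns
df <- data.frame(Run = c("R1", "R2"), PrecursorCharge = c(2, 3))

# A missing column read with $ is NULL, and `%in%` against NULL is FALSE,
# so the excerpt's first stop() blames the protein, not the column.
"P1" %in% df$ProteinName  # FALSE

# paste() does not error on the missing PeptideSequence column, so the
# Precursor labels end up built from the charge alone.
labels <- paste(df$PeptideSequence, df$PrecursorCharge, sep = "_")
length(labels)  # 2: no error, but no sequence information either
```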
Type mismatch

metric is checked for existence but not for numeric type before it is averaged with mean(). If a user selects an existing non-numeric column such as PeptideSequence, the aggregation produces warnings and NA values, resulting in an empty or misleading plot instead of a clear validation error.

if (!metric %in% colnames(input_df)) {
    stop(paste0(
        "Column '", metric, "' not found in input. ",
        "Available columns: ", paste(colnames(input_df), collapse = ", ")
    ))
}
if (!"Run" %in% colnames(input_df)) {
    stop("'Run' column not found in input.")
}
if (!which.Protein %in% input_df$ProteinName) {
    stop(paste0("Protein '", which.Protein, "' not found in input."))
}

input_df <- input_df[input_df$ProteinName == which.Protein, ]

if (!is.factor(input_df$Run)) {
    input_df$Run <- factor(input_df$Run)
}

input_df$Precursor <- paste(input_df$PeptideSequence,
                            input_df$PrecursorCharge, sep = "_")

# Average across fragment ions so each precursor has one value per run
plot_df <- aggregate(
    input_df[[metric]],
    by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
    FUN = mean, na.rm = TRUE

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Validate required input columns

Validate every required plotting column before accessing it. Right now, missing
ProteinName, PeptideSequence, or PrecursorCharge columns are not reported as
such: a missing ProteinName surfaces as a misleading "protein not found" error,
and a missing PeptideSequence or PrecursorCharge silently yields malformed
Precursor labels. This makes the helper brittle on converter outputs that do
not contain the expected schema.

R/plot_quality_metrics.R [59-79]

-if (!metric %in% colnames(input_df)) {
+required_cols <- c("ProteinName", "Run", "PeptideSequence", "PrecursorCharge", metric)
+missing_cols <- setdiff(required_cols, colnames(input_df))
+if (length(missing_cols) > 0) {
     stop(paste0(
-        "Column '", metric, "' not found in input. ",
-        "Available columns: ", paste(colnames(input_df), collapse = ", ")
+        "Required column(s) missing from input: ",
+        paste(missing_cols, collapse = ", ")
     ))
-}
-if (!"Run" %in% colnames(input_df)) {
-    stop("'Run' column not found in input.")
 }
 if (!which.Protein %in% input_df$ProteinName) {
     stop(paste0("Protein '", which.Protein, "' not found in input."))
 }
 
-input_df <- input_df[input_df$ProteinName == which.Protein, ]
+input_df <- input_df[input_df$ProteinName == which.Protein, , drop = FALSE]
 
 if (!is.factor(input_df$Run)) {
     input_df$Run <- factor(input_df$Run)
 }
 
 input_df$Precursor <- paste(input_df$PeptideSequence,
                             input_df$PrecursorCharge, sep = "_")
Suggestion importance[1-10]: 7


Why: This is accurate: the function currently checks metric and Run but later assumes ProteinName, PeptideSequence, and PrecursorCharge exist. Adding a single required-column validation improves robustness and gives clearer failures for malformed converter outputs.

Medium
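A combined validator can be sketched under the assumption that the function keeps its input, metric, and which.Protein arguments. It applies the checks in the order the later inline comment recommends: argument supplied, then required columns, then protein membership. validate_qc_input() is hypothetical.

```r
# Hypothetical validator: (1) argument supplied, (2) required columns
# present, (3) protein present, so each error names the real problem.
validate_qc_input <- function(input_df, metric, which.Protein) {
  if (missing(which.Protein)) {
    stop("'which.Protein' must be supplied.")
  }
  required <- c("ProteinName", "PeptideSequence", "PrecursorCharge", "Run", metric)
  absent <- setdiff(required, colnames(input_df))
  if (length(absent) > 0) {
    stop("Required column(s) missing from input: ", paste(absent, collapse = ", "))
  }
  if (!which.Protein %in% input_df$ProteinName) {
    stop("Protein '", which.Protein, "' not found in input.")
  }
  invisible(TRUE)
}

df <- data.frame(ProteinName = "P1", PeptideSequence = "PEPA",
                 PrecursorCharge = 2, Run = "R1", Score = 0.5)
validate_qc_input(df, "Score", "P1")             # passes
try(validate_qc_input(df, "Score"))              # which.Protein not supplied
try(validate_qc_input(df[, -1], "Score", "P1"))  # reports ProteinName as missing
try(validate_qc_input(df, "Score", "P9"))        # reports the missing protein
```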
Check aggregation inputs first

Guard the aggregation against non-numeric or entirely missing metric values. As
written, mean() returns NA with a warning for non-numeric columns and returns NaN
for groups whose values are all NA, which produces a broken or empty plot instead
of a clear failure.

R/plot_quality_metrics.R [81-87]

+if (!is.numeric(input_df[[metric]])) {
+    stop(paste0("Column '", metric, "' must be numeric to plot aggregated values."))
+}
+
 # Average across fragment ions so each precursor has one value per run
 plot_df <- aggregate(
     input_df[[metric]],
-    by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
-    FUN = mean, na.rm = TRUE
+    by = list(Run = input_df$Run, Precursor = input_df$Precursor),
+    FUN = function(x) {
+        if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
+    }
 )
 colnames(plot_df)[colnames(plot_df) == "x"] <- metric
+plot_df <- plot_df[!is.na(plot_df[[metric]]), , drop = FALSE]
 
+if (nrow(plot_df) == 0) {
+    stop(paste0(
+        "No non-missing values available for '", metric,
+        "' in protein '", which.Protein, "'."
+    ))
+}
+
Suggestion importance[1-10]: 6


Why: This correctly identifies that mean() on input_df[[metric]] returns NA with a warning for non-numeric data and yields unusable NaN groups when all values are missing. The added checks make plotting failures around metric much clearer, though this is mainly defensive validation.

Low
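The all-NA handling in this suggestion can be checked in isolation. A minimal sketch with hypothetical values; mean_or_na() plays the role of the anonymous FUN in the diff.

```r
# All-NA groups become NA_real_ (instead of NaN) and are then dropped.
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

vals <- c(1, 3, NA, NA)
run  <- c("R1", "R1", "R2", "R2")
prec <- rep("PEPA_2", 4)

plot_df <- aggregate(vals, by = list(Run = run, Precursor = prec), FUN = mean_or_na)
plot_df  # R1 -> 2, R2 -> NA
plot_df <- plot_df[!is.na(plot_df$x), , drop = FALSE]
```

Because is.na() is also TRUE for NaN, filtering with !is.na() after a plain mean would drop the same rows; the explicit NA_real_ mainly makes the intent visible.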
General
Always close opened device

Ensure the graphics device is always closed after opening the PDF. If print(p) or
any later code errors, the current implementation leaves the PDF device open, and
subsequent plots in the same R session are drawn into the orphaned file.

R/plot_quality_metrics.R [119-123]

 if (!identical(address, FALSE)) {
-    pdf(paste0(address, "QualityMetricsPlot.pdf"))
+    pdf_file <- paste0(address, "QualityMetricsPlot.pdf")
+    pdf(pdf_file)
+    on.exit(dev.off(), add = TRUE)
     print(p)
-    dev.off()
 }
Suggestion importance[1-10]: 5


Why: This is a valid robustness improvement because an error after pdf() could leave the graphics device open. Using on.exit() around dev.off() reduces session-side effects, but it is not a high-impact functional issue in the normal path.

Low
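The on.exit() pattern can be exercised on its own. In this sketch, save_plot_pdf() is a hypothetical helper that takes a drawing function in place of print(p); the check compares the number of open devices before and after a failing draw.

```r
# Hypothetical helper: write a plot to PDF and guarantee the device is
# closed even when drawing fails.
save_plot_pdf <- function(draw, file) {
  grDevices::pdf(file)
  on.exit(grDevices::dev.off(), add = TRUE)  # runs on normal exit and on error
  draw()
  invisible(file)
}

n_before <- length(grDevices::dev.list())
out <- tempfile(fileext = ".pdf")
try(save_plot_pdf(function() stop("plot failed"), out), silent = TRUE)
length(grDevices::dev.list()) == n_before  # TRUE: no orphaned PDF device
```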


coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
R/plot_quality_metrics.R (2)

82-87: aggregate on all-NA groups emits NaN.

When every fragment-ion value for a given Run × Precursor is NA, mean(..., na.rm = TRUE) returns NaN for that group. mean.default raises no warning here for a numeric column (its "argument is not numeric or logical" warning only fires for non-numeric input), so the NaN values pass through aggregate() silently and show up as gaps or missing points in the plot. Consider mapping all-NA groups to NA_real_ and filtering them out before plotting; avoid wrapping the call in suppressWarnings(), which would also hide the non-numeric warning.

♻️ Suggested tweak
     plot_df <- aggregate(
         input_df[[metric]],
         by  = list(Run = input_df$Run, Precursor = input_df$Precursor),
-        FUN = mean, na.rm = TRUE
+        FUN = function(v) if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
     )
+    plot_df <- plot_df[!is.na(plot_df$x), , drop = FALSE]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/plot_quality_metrics.R` around lines 82 - 87, The aggregate call on
input_df[[metric]] with FUN = mean, na.rm = TRUE silently produces NaN for
groups where all values are NA; to fix, pre-filter those all-NA Run×Precursor
groups before calling aggregate (e.g., remove rows or group keys where all
fragment values for the given metric are NA) so aggregate only sees groups with
at least one non-NA, then run the existing aggregate into plot_df and keep the
colnames fix for "x" -> metric; reference input_df, metric, Run, Precursor, and
plot_df when locating where to add the filter.
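The pre-filter described above can be sketched with base R's ave(), which computes one value per group and broadcasts it back to every row of that group. The data and the Score column are hypothetical.

```r
# Drop rows belonging to Run x Precursor groups whose metric values are
# all NA, then aggregate only the groups that have data.
input_df <- data.frame(
  Run       = c("R1", "R1", "R2", "R2"),
  Precursor = "PEPA_2",
  Score     = c(1, 3, NA, NA)  # hypothetical metric column
)
metric <- "Score"

# TRUE for every row whose group has at least one non-NA value
has_value <- ave(!is.na(input_df[[metric]]), input_df$Run, input_df$Precursor,
                 FUN = any)
kept <- input_df[as.logical(has_value), , drop = FALSE]

plot_df <- aggregate(
  kept[[metric]],
  by = list(Run = kept$Run, Precursor = kept$Precursor),
  FUN = mean, na.rm = TRUE
)
colnames(plot_df)[colnames(plot_df) == "x"] <- metric
plot_df  # only R1 remains, with Score 2
```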

119-123: Use on.exit(dev.off()) to protect against print(p) errors.

If print(p) throws (e.g. ggplot build error on an unexpected metric column type), dev.off() is never called and the PDF device leaks, which can cause subsequent plotting calls in the same session to write into the orphaned file or fail. Same concern doesn't apply to the plotly HTML branch since save_html manages its own file handle.

♻️ Suggested fix
     if (!identical(address, FALSE)) {
         pdf(paste0(address, "QualityMetricsPlot.pdf"))
+        on.exit(dev.off(), add = TRUE)
         print(p)
-        dev.off()
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/plot_quality_metrics.R` around lines 119 - 123, The PDF device opened in
the block guarded by if (!identical(address, FALSE)) should be protected with
on.exit(dev.off()) so that the device is closed even if print(p) throws; update
the block around pdf(paste0(address, "QualityMetricsPlot.pdf")) / print(p) /
dev.off() to call on.exit(dev.off(), add = TRUE) immediately after opening the
device (before print(p)) so the device will always be closed, and remove the
explicit dev.off(): keeping both would make the deferred call close a second
device, or error on the null device, when the function exits.
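Keeping the explicit dev.off() next to on.exit(dev.off()) is not harmless. In this sketch, close_twice() is a hypothetical function that does both; because the PDF device is already closed when the function returns, the deferred call targets the null device and errors. In a session with other devices open, it would instead silently close one of them.

```r
# Anti-pattern: explicit dev.off() plus an on.exit(dev.off()) registration.
close_twice <- function(file) {
  grDevices::pdf(file)
  on.exit(grDevices::dev.off(), add = TRUE)
  graphics::plot(1:3)
  grDevices::dev.off()  # closes the PDF here, so the deferred call at exit
}                       # closes whatever device is current next

grDevices::graphics.off()  # closes all devices so only the null device remains
msg <- tryCatch(
  { close_twice(tempfile(fileext = ".pdf")); "no error" },
  error = function(e) conditionMessage(e)
)
msg  # "cannot shut down device 1 (the null device)"
```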
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/plot_quality_metrics.R`:
- Line 97: The DESCRIPTION currently imports ggplot2 without a version
constraint, but the code uses geom_line(linewidth = 0.6), and the linewidth
argument was introduced in ggplot2 3.4.0; update the DESCRIPTION Imports entry
to declare ggplot2 (>= 3.4.0) so installations pull a version that supports it.
On older versions the argument is ignored with an "Ignoring unknown parameters:
linewidth" warning and the lines silently fall back to the default width.
- Around line 59-70: The function currently fails to validate required columns
and the which.Protein argument: first, add an explicit missing() check for
which.Protein and provide a clear error if it's not supplied; second, before any
use of input_df$ProteinName/PeptideSequence/PrecursorCharge (and Run/metric),
verify those column names exist in input_df (e.g., check "ProteinName",
"PeptideSequence", "PrecursorCharge", "Run" and the requested metric via
colnames(input_df)) and stop with descriptive messages if any are missing;
finally, only after these column-existence checks perform the membership test
for which.Protein in input_df$ProteinName so the error reflects a missing
protein rather than a missing column.
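The DESCRIPTION fix in the first inline comment amounts to a single Imports constraint, and the comparison it encodes can be sketched with numeric_version(). supports_linewidth() is a hypothetical helper for illustration; the point is that versions compare component by component, not as strings.

```r
# The DESCRIPTION fix is a single Imports entry:
#   Imports: ggplot2 (>= 3.4.0)
# numeric_version compares component by component, so "3.10.0" correctly
# counts as newer than "3.4.0" (a plain string comparison says it is older).
supports_linewidth <- function(version) {
  numeric_version(version) >= "3.4.0"
}

supports_linewidth("3.3.6")   # FALSE: geom_line() would ignore linewidth
supports_linewidth("3.4.0")   # TRUE
supports_linewidth("3.10.0")  # TRUE
```

At runtime the same check could be applied as supports_linewidth(as.character(utils::packageVersion("ggplot2"))), though declaring the constraint in DESCRIPTION lets the package manager enforce it at install time.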


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c05809a-9219-4f93-b7b5-f7409f82d64b

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb7066 and 064b917.

📒 Files selected for processing (3)
  • NAMESPACE
  • R/plot_quality_metrics.R
  • man/MSstatsQualityMetricsPlot.Rd

tonywu1999 merged commit 688ff6d into devel on Apr 24, 2026
2 checks passed
tonywu1999 deleted the qc-anomaly branch on April 24, 2026 at 13:45