
refineChromPeaks,MergeNeighboringPeaksParam fails to merge nested peaks #825

Description

@jorainer

With MergeNeighboringPeaksParam, refineChromPeaks() should merge partially or completely overlapping peaks. In the current implementation, however, it can fail to merge completely overlapping (nested) peaks, depending on the order of the peaks in chromPeaks():

## Define overlapping/nested chrom peaks 
pks <- rbind(
    c(666.0693, 666.0693, 666.0693, 31.779, 27.59, 35.968, 1, 1, 1, 1, 1),
    c(666.0713, 666.0683, 666.0747, 31.181, 27.59, 39.559, 2, 2, 2, 2, 1),
    c(666.0693, 666.0693, 666.0693, 31.779, 27.59, 36.968, 3, 3, 3, 3, 1))
colnames(pks) <- c("mz", "mzmin", "mzmax", "rt", "rtmin", "rtmax", "into",
                   "intb", "maxo", "sn", "sample")
rownames(pks) <- c("A", "B", "C")

In this example, the first ("A") and third ("C") peaks are completely within the m/z and rt range of the second ("B"). Peak merging should therefore report only the second peak, not the first or third. The internal function to merge peaks is xcms:::.merge_neighboring_peak_candidates().

## define the remaining data required for the function
pkd <- data.frame(ms_level = rep(1L, 3), is_filled = rep(FALSE, 3))

x <- list(cbind(mz = c(), intensity = c()),
          cbind(mz = c(), intensity = c()),
          cbind(mz = c(), intensity = c()))
rt <- c(30.5, 31.5, 32.5)

Running this function with these data results in

> xcms:::.merge_neighboring_peak_candidates(x, rt, pks, pkd)
$chromPeaks
        mz    mzmin    mzmax     rt rtmin  rtmax into intb maxo sn sample
A 666.0693 666.0693 666.0693 31.779 27.59 35.968    1    1    1  1      1
B 666.0713 666.0683 666.0747 31.181 27.59 39.559    2    2    2  2      1

$chromPeakData
  ms_level is_filled
1        1     FALSE
2        1     FALSE

so, both "A" and "B" are reported, although "A" is completely within "B".

We get the expected results if we change the order of the peaks at input into B, A, C:

> xcms:::.merge_neighboring_peak_candidates(x, rt, pks[c(2, 1, 3), ], pkd)
$chromPeaks
        mz    mzmin    mzmax     rt rtmin  rtmax into intb maxo sn sample
B 666.0713 666.0683 666.0747 31.181 27.59 39.559    2    2    2  2      1

$chromPeakData
  ms_level is_filled
1        1     FALSE

The function iterates over the peaks, and if a peak lies completely within the rt range of another, only the larger one is retained. For this to work, overlapping peaks need to be ordered so that larger peaks come first. The function does order the peaks, but only by "rtmin": idx <- order(pks[, "rtmin"]). In cases like our example, where "rtmin" is the same for all peaks, they are processed in their original order (A, B, C). Peak B is not completely within peak A, and therefore peak A is retained. Peak C is completely within peak B and is therefore dropped, so both A and B are reported.
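The order dependence can be reproduced with a small toy filter. This is only a sketch of the behaviour described above, not the xcms code: the helper name drop_nested() is hypothetical, and it looks at the rt range only, ignoring m/z and the actual merging of intensities.

```r
## Toy stand-in for the nested-peak filter; NOT the xcms implementation.
## Keeps a peak unless an already kept peak fully contains its rt range.
drop_nested <- function(rtmin, rtmax, idx = order(rtmin)) {
    keep <- integer()
    for (i in idx) {
        nested <- any(rtmin[keep] <= rtmin[i] & rtmax[keep] >= rtmax[i])
        if (!nested)
            keep <- c(keep, i)
    }
    keep
}

## rt ranges of peaks A, B and C from the example
rtmin <- c(A = 27.59, B = 27.59, C = 27.59)
rtmax <- c(A = 35.968, B = 39.559, C = 36.968)

## Ties in rtmin keep the input order, so A is visited before B
kept_default <- names(rtmin)[drop_nested(rtmin, rtmax)]
## Visiting B first (input order B, A, C) leaves only B
kept_b_first <- names(rtmin)[drop_nested(rtmin, rtmax, idx = c(2, 1, 3))]
```

With the default ordering, kept_default contains both "A" and "B", while kept_b_first contains only "B", mirroring the two results shown above.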

To address this, the peaks should be ordered not only by "rtmin" but also by "rtmax", such that among peaks with the same "rtmin" the bigger peak (with the larger "rtmax") comes first.
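One possible way to express such an ordering with base R's order() is sketched below. It assumes the fix is applied where idx is currently computed; the matrix rt_rng is a hypothetical stand-in that holds only the "rtmin" and "rtmax" columns of the example peaks.

```r
## rt ranges of the example peaks (stand-in for the pks matrix)
rt_rng <- rbind(A = c(rtmin = 27.59, rtmax = 35.968),
                B = c(rtmin = 27.59, rtmax = 39.559),
                C = c(rtmin = 27.59, rtmax = 36.968))

## Sort by increasing rtmin; break ties by decreasing rtmax so that the
## widest peak among those sharing an rtmin is processed first
idx <- order(rt_rng[, "rtmin"], -rt_rng[, "rtmax"])
peak_order <- rownames(rt_rng)[idx]
peak_order
## [1] "B" "C" "A"
```

Negating "rtmax" keeps the call independent of the sorting method used by order(); peak B is then visited before the two peaks nested within it.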
