Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chromPeakSummary function and a fix #772

Open
wants to merge 24 commits into
base: devel
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
2c57df3
feat: report m/z values in chromPeaks matrix for XChromatograms
jorainer Sep 16, 2024
dacd665
fix: peak shape quality calculation for gap filling
jorainer Sep 18, 2024
188541c
feat: add chromPeakSummary generic (issue #705)
jorainer Sep 18, 2024
e5aa3ff
docs: add documentation for chromPeakSummary generic
jorainer Sep 18, 2024
b59b15e
Add internal function to calculate beta metrics
pablovgd Sep 18, 2024
c017cba
Fixed requested changes for PR #767
pablovgd Sep 19, 2024
fe7509c
Fix for PR #767
pablovgd Sep 19, 2024
670de04
Merge pull request #767 from pablovgd/issue705
jorainer Sep 19, 2024
c581cde
added chromPeakSummary method
pablovgd Sep 20, 2024
f9d4b0d
requested changes for PR #768
pablovgd Sep 20, 2024
e7ee120
Merge pull request #768 from pablovgd/issue705
jorainer Sep 23, 2024
dd68e2f
Added section on peak quality to vignette.
pablovgd Sep 23, 2024
a49539e
Fixed typos in peak quality vignette.
pablovgd Sep 23, 2024
13cfaf3
Merge pull request #770 from pablovgd/issue705
jorainer Sep 24, 2024
156788a
refactor: little fixes
jorainer Sep 24, 2024
fd9cf65
Address William's comments
jorainer Oct 1, 2024
39cffa7
Merge branch 'devel' into jomain
jorainer Oct 8, 2024
a3bc33b
bump version
jorainer Oct 8, 2024
9f1b7b6
Merge pull request #776 from sneumann/phili
jorainer Oct 29, 2024
93d8b2d
Merge branch 'devel' into jomain
jorainer Oct 30, 2024
97b781e
Merge remote-tracking branch 'origin/jomain' into jomain
jorainer Oct 30, 2024
46ca5ae
Update NEWS.md
jorainer Oct 30, 2024
2d658b6
Merge branch 'devel' into jomain
jorainer Dec 16, 2024
6a67604
feat: add c and coerce methods for XcmsExperiment
jorainer Dec 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Package: xcms
Version: 4.3.3
Version: 4.3.4
Title: LC-MS and GC-MS Data Analysis
Description: Framework for processing and visualization of chromatographically
separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF,
Expand Down
6 changes: 4 additions & 2 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -461,7 +461,8 @@ export("CentWaveParam",
"CleanPeaksParam",
"MergeNeighboringPeaksParam",
"FilterIntensityParam",
"ChromPeakAreaParam")
"ChromPeakAreaParam",
"BetaDistributionParam")
## Param class methods.

## New Classes
Expand Down Expand Up @@ -530,7 +531,8 @@ exportMethods("hasChromPeaks",
"featureSpectra",
"chromPeakSpectra",
"chromPeakChromatograms",
"featureChromatograms"
"featureChromatograms",
"chromPeakSummary"
)

## feature grouping functions and methods.
Expand Down
12 changes: 12 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# xcms 4.3

## Changes in version 4.3.4

- Address issue #765: peak detection on chromatographic data: report a
chromatogram's `"mz"`, `"mzmin"` and `"mzmax"` as the mean m/z and lower and
upper m/z in the `chromPeaks()` matrix.
- Fix calculation of the correlation coefficient for peak shape similarity with
an idealized bell shape (*beta*) during gap filling for centWave-based
chromatographic peak detection with parameter `verboseBetaColumns = TRUE`.
- Add `chromPeakSummary` generic (issue #705).
- Add `chromPeakSummary()` method to calculate the *beta* quality metrics.


## Changes in version 4.3.3

- Fix issue #755: `chromatogram()` with `msLevel = 2` fails to extract
Expand Down
59 changes: 59 additions & 0 deletions R/AllGenerics.R
Original file line number Diff line number Diff line change
Expand Up @@ -535,6 +535,65 @@ setGeneric("chromPeakData<-", function(object, value)
setGeneric("chromPeakSpectra", function(object, ...)
standardGeneric("chromPeakSpectra"))

#' @title Chromatographic peak summaries
#'
#' @name chromPeakSummary
#'
#' @description
#'
#' The `chromPeakSummary()` method calculates summary statistics or other
#' metrics for each of the identified chromatographic peaks in an *xcms* result
#' object, such as the [XcmsExperiment()]. Different metrics can be calculated,
#' depending upon (and configured by) using dedicated *parameter* classes. As a
#' result, the method returns a `matrix` or `data.frame` with one row per
#' chromatographic peak. Each column contains calculated values, depending on
#' the used method/parameter class.
#'
#' Currently implemented methods/parameter classes are:
#'
#' - `BetaDistributionParam`: calculates the *beta_cor* and *beta_snr* quality
#' metrics as described in Kumler 2023 representing the result from a
#' (correlation) test of similarity (using Pearson's correlation coefficient)
#' to a bell curve and the signal-to-noise ratio calculated on the residuals
#' of this test.
#'
#' @param BPPARAM Parallel processing setup. See [bpparam()] for details.
#'
#' @param chunkSize `integer(1)` defining the number of samples from which data
#' should be loaded and processed at a time.
#'
#' @param msLevel `integer(1)` with the MS level of the chromatographic peaks
#' on which the metric should be calculated.
#'
#' @param object an *xcms* result object containing information on
#' identified chromatographic peaks.
#'
#' @param param a parameter object defining the method/summaries that should
#' be calculated (see description above for supported parameter classes).
#'
#' @param ... additional arguments passed to the method implementation.
#'
#' @return
#'
#' A `matrix` or `data.frame` with the same number of rows as there are
#' chromatographic peaks. Columns contain the calculated values. The number of
#' columns, their names and content depend on the used parameter object. See
#' the respective documentation above for more details.
#'
#' @author Pablo Vangeenderhuysen, Johannes Rainer, William Kumler
#'
#' @md
#'
#' @references
#'
#' Kumler W, Hazelton B J and Ingalls A E (2023) "Picky with peakpicking:
#' assessing chromatographic peak quality with simple metrics in metabolomics"
#' *BMC Bioinformatics* 24(1):404. doi: 10.1186/s12859-023-05533-4
#'
#' @export
setGeneric("chromPeakSummary", function(object, param, ...)
standardGeneric("chromPeakSummary"))

setGeneric("collect", function(object, ...) standardGeneric("collect"))
setGeneric("consecMissedLimit", function(object, ...)
standardGeneric("consecMissedLimit"))
Expand Down
5 changes: 5 additions & 0 deletions R/DataClasses.R
Original file line number Diff line number Diff line change
Expand Up @@ -2182,3 +2182,8 @@ setClass("FilterIntensityParam",
msg
else TRUE
})

setClass("BetaDistributionParam",
contains = "Param"
)

46 changes: 43 additions & 3 deletions R/XcmsExperiment-functions.R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.

Original file line number Diff line number Diff line change
Expand Up @@ -520,8 +520,9 @@
## consider adding 0 or NA intensity for those.
mat <- do.call(rbind, xsub)
if (nrow(mat)) {
nr <- vapply(xsub, nrow, NA_integer_)
## can have 0, 1 or x values per rt; repeat rt accordingly
rts <- rep(rt[keep], vapply(xsub, nrow, integer(1L)))
rts <- rep(rt[keep], nr)
maxi <- which.max(mat[, 2L])[1L]
mmz <- do.call(mzCenterFun, list(mat[, 1L], mat[, 2L]))
if (is.na(mmz)) mmz <- mat[maxi, 1L]
Expand All @@ -530,15 +531,54 @@
sum(mat[, 2L], na.rm = TRUE) *
((rtr[2L] - rtr[1L]) / max(1L, (length(keep) - 1L)))
)
if ("beta_cor" %in% cn)
if ("beta_cor" %in% cn) {
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
mat[, 2L], rts)
vapply(xsub[nr > 0], function(z) sum(z[, "intensity"]),
NA_real_),
rt[keep][nr > 0])
}
}
}
}
res[!is.na(res[, "maxo"]), , drop = FALSE]
}


#' Calculates quality metrics for a chromatographic peak.
#'
#' @param x `list` of peak matrices (from a single MS level and from a single
#' file/sample).
#'
#' @param rt retention time for each peak matrix.
#'
#' @param peakArea `matrix` defining the chrom peak area.
#'
#' @author Pablo Vangeenderhuysen
#'
#' @noRd
.chrom_peak_beta_metrics <- function(x, rt, peakArea, ...) {
res <- matrix(NA_real_, ncol = 2L, nrow = nrow(peakArea))
rownames(res) <- rownames(peakArea)
colnames(res) <- c("beta_cor","beta_snr")
for (i in seq_len(nrow(res))) {
rtr <- peakArea[i, c("rtmin", "rtmax")]
keep <- which(between(rt, rtr))
if (length(keep)) {
xsub <- lapply(x[keep], .pmat_filter_mz,
mzr = peakArea[i, c("mzmin", "mzmax")])
nr <- vapply(xsub, nrow, NA_integer_)
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
vapply(xsub[nr > 0], function(z) sum(z[, "intensity"]), NA_real_),
rt[keep][nr > 0])
}
}
res
}





#' Difference to the original code is that the weighted mean is also calculated
#' if some of the peak intensities in the profile matrix are 0
#'
Expand Down
36 changes: 36 additions & 0 deletions R/XcmsExperiment.R
Original file line number Diff line number Diff line change
Expand Up @@ -2013,3 +2013,39 @@ setMethod(
object[i = sort(unique(file)), keepAdjustedRtime = keepAdjustedRtime,
keepFeatures = keepFeatures, ...]
})

#' @rdname chromPeakSummary
setMethod(
"chromPeakSummary",
signature(object = "XcmsExperiment", param = "BetaDistributionParam"),
function(object, param, msLevel = 1L, chunkSize = 2L, BPPARAM = bpparam()) {
if (length(msLevel) != 1)
stop("Can only perform peak metrics for one MS level at a time.")
if (!hasChromPeaks(object, msLevel = msLevel))
stop("No ChromPeaks definitions for MS level ", msLevel, " present.")
## Define region to calculate metrics from for each file
cp <- chromPeaks(object, msLevel = msLevel)
f <- factor(cp[,"sample"], seq_along(object))
pal <- split.data.frame(cp[, c("mzmin", "mzmax", "rtmin", "rtmax")], f)
names(pal) <- seq_along(pal)
## Manual chunk processi ng because we have to split `object` and `pal`
idx <- seq_along(object)
chunks <- split(idx, ceiling(idx / chunkSize))
pb <- progress_bar$new(format = paste0("[:bar] :current/:",
"total (:percent) in ",
":elapsed"),
total = length(chunks) + 1L, clear = FALSE)
pb$tick(0)
# mzf <- "wMean"
res <- lapply(chunks, function(z, ...) {
pb$tick()
.xmse_integrate_chrom_peaks(
.subset_xcms_experiment(object, i = z, keepAdjustedRtime = TRUE,
ignoreHistory = TRUE),
pal = pal[z], intFun = .chrom_peak_beta_metrics,
msLevel = msLevel, BPPARAM = BPPARAM)
})
res <- do.call(rbind, res)
pb$tick()
res
})
29 changes: 18 additions & 11 deletions R/do_findChromPeaks-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -3720,14 +3720,20 @@ peaksWithCentWave <- function(int, rt,
#' requires at least 5 scans or it will return NA for both parameters.
#'
#' @param intensity A numeric vector corresponding to the peak intensities
#'
#' @param rtime A numeric vector corresponding to the retention times of each
#' intensity. If not provided, intensities will be assumed to be equally spaced.
#' intensity. Retention times are expected to be in increasing order
#' without duplicates. If not provided, intensities will be assumed to be
#' equally spaced.
#'
#' @param skews A numeric vector of the skews to try, corresponding to the
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
#' right-skewed, while values greater than 5 will be left-skewed.
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs() tells you the skewedness (regardless in what direction) and you can <0 or >0 if you only want the direction ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thank you for the review. I'm using the language inherent to the dbeta() function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.

#' increasingly right-skewed, while values greater than 5 will be
#' left-skewed.
#'
#' @param zero.rm Boolean value controlling whether "missing" scans are dropped
#' prior to curve fitting. The default, TRUE, will remove intensities of zero
#' or NA
#' prior to curve fitting. The default, TRUE, will remove intensities of
#' zero or NA
#'
#' @author William Kumler
#'
Expand All @@ -3740,21 +3746,22 @@ peaksWithCentWave <- function(int, rt,
rtime <- rtime[keep]
intensity <- intensity[keep]
}
if(length(intensity)<5){
if (length(intensity) < 5) {
best_cor <- NA
beta_snr <- NA
} else {
beta_sequence <- rep(.scale_zero_one(rtime), each=length(skews))
beta_sequence <- rep(.scale_zero_one(rtime), each = length(skews))
beta_vals <- t(matrix(dbeta(beta_sequence, shape1 = skews, shape2 = 5),
nrow = length(skews)))
# matplot(beta_vals)
beta_cors <- cor(intensity, beta_vals)
best_cor <- max(beta_cors)
best_curve <- beta_vals[, which.max(beta_cors)]
noise_level <- sd(diff(.scale_zero_one(best_curve)-.scale_zero_one(intensity)))
beta_snr <- log10(max(intensity)/noise_level)
noise_level <- sd(diff(.scale_zero_one(best_curve) -
.scale_zero_one(intensity)))
beta_snr <- log10(max(intensity) / noise_level)
}
c(best_cor=best_cor, beta_snr=beta_snr)
c(best_cor = best_cor, beta_snr = beta_snr)
}


Expand All @@ -3769,5 +3776,5 @@ peaksWithCentWave <- function(int, rt,
#'
#' @noRd
.scale_zero_one <- function(num_vec){
(num_vec-min(num_vec))/(max(num_vec)-min(num_vec))
(num_vec-min(num_vec)) / (max(num_vec) - min(num_vec))
}
5 changes: 5 additions & 0 deletions R/functions-Params.R
Original file line number Diff line number Diff line change
Expand Up @@ -397,3 +397,8 @@ FilterIntensityParam <- function(threshold = 0, nValues = 1L, value = "maxo") {
new("FilterIntensityParam", threshold = as.numeric(threshold),
nValues = as.integer(nValues), value = value)
}

#' @rdname chromPeakSummary
BetaDistributionParam <- function() {
new("BetaDistributionParam")
}
12 changes: 6 additions & 6 deletions R/functions-XCMSnExp.R
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
valsPerSpect = valsPerSpect, rtrange = rtr,
mzrange = mzr)
if (length(mtx)) {
## mtx: time, mz, intensity
if (any(!is.na(mtx[, 3]))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity). Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - generally. Here, we load the mtx matrix with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.

I did some timings on that:

mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L

library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
               expr min    lq   mean median    uq  max neval cld
          mtx[, 3L] 354 363.5 437.30  380.5 477.5  948   100   a
 mtx[, "intensity"] 389 404.0 543.07  412.0 474.0 5036   100   a
     mtx[, int_col] 381 399.5 469.45  407.0 484.0 1135   100   a

## How to calculate the area: (1)sum of all intensities / (2)by
## the number of data points (REAL ones, considering also NAs)
Expand All @@ -290,21 +291,20 @@ dropGenericProcessHistory <- function(x, fun) {
## as e.g. centWave. Using max(1, ... to avoid getting Inf in
## case the signal is based on a single data point.
## (3) rtr[2] - rtr[1]
res[i, "into"] <- sum(mtx[, 3], na.rm = TRUE) *
res[i, "into"] <- sum(mtx[, 3L], na.rm = TRUE) *
((rtr[2] - rtr[1]) /
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))
maxi <- which.max(mtx[, 3])
maxi <- which.max(mtx[, 3L])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, above was not the only hardcoded 3 :-) and more below ...

res[i, c("rt", "maxo")] <- mtx[maxi[1], c(1, 3)]
res[i, c("rtmin", "rtmax")] <- rtr
## Calculate the intensity weighted mean mz
meanMz <- do.call(mzCenterFun, list(mtx[, 2], mtx[, 3]))
if (is.na(meanMz)) meanMz <- mtx[maxi[1], 2]
res[i, "mz"] <- meanMz

if ("beta_cor" %in% cn) {
if ("beta_cor" %in% cn)
res[i, c("beta_cor", "beta_snr")] <- .get_beta_values(
mtx[, 3L], mtx[, 1L])
}
vapply(split(mtx[, 3L], mtx[, 1L]), sum, NA_real_),
unique(mtx[, 1L]))
} else {
res[i, ] <- rep(NA_real_, ncols)
}
Expand Down
24 changes: 18 additions & 6 deletions R/methods-Chromatogram.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,10 @@
#' @return
#'
#' If called on a `Chromatogram` object, the method returns an [XChromatogram]
#' object with the identified peaks. See [peaksWithCentWave()] for details on
#' the peak matrix content.
#' object with the identified peaks. Columns `"mz"`, `"mzmin"` and `"mzmax"` in
#' the `chromPeaks()` peak matrix provide the mean m/z and the maximum and
#' minimum m/z value of the `Chromatogram` object. See [peaksWithCentWave()]
#' for details on the remaining columns.
#'
#' @seealso [peaksWithCentWave()] for the downstream function and [centWave]
#' for details on the method.
Expand Down Expand Up @@ -70,10 +72,18 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
rt = rtime(object)),
as(param, "list")))
object <- as(object, "XChromatogram")
chromPeaks(object) <- res
chromPeaks(object) <- .add_mz(res, object@mz)
object
})

.add_mz <- function(x, mz = c(NA_real_, NA_real_)) {
nx <- nrow(x)
tmp <- cbind(mz = rep(mean(mz), nx),
mzmin = rep(mz[1L], nx),
mzmax = rep(mz[2L], nx))
cbind(tmp, x)
}

#' @title matchedFilter-based peak detection in purely chromatographic data
#'
#' @description
Expand All @@ -97,8 +107,10 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
#' @return
#'
#' If called on a `Chromatogram` object, the method returns a `matrix` with
#' the identified peaks. See [peaksWithMatchedFilter()] for details on the
#' matrix content.
#' the identified peaks. Columns `"mz"`, `"mzmin"` and `"mzmax"` in
#' the `chromPeaks()` peak matrix provide the mean m/z and the maximum and
#' minimum m/z value of the `Chromatogram` object. See
#' [peaksWithMatchedFilter()] for details on the remaining columns.
#'
#' @seealso [peaksWithMatchedFilter()] for the downstream function and
#' [matchedFilter] for details on the method.
Expand Down Expand Up @@ -134,7 +146,7 @@ setMethod("findChromPeaks", signature(object = "Chromatogram",
rt = rtime(object)),
as(param, "list")))
object <- as(object, "XChromatogram")
chromPeaks(object) <- res
chromPeaks(object) <- .add_mz(res, object@mz)
object
})

Expand Down
Loading
Loading