diff --git a/vignettes/xcms.Rmd b/vignettes/xcms.Rmd
index ba16810b..cf405eca 100644
--- a/vignettes/xcms.Rmd
+++ b/vignettes/xcms.Rmd
@@ -62,7 +62,10 @@ be applied to the older *MSnbase*-based workflows (xcms version 3). Additional
 documents and tutorials covering also other topics of untargeted metabolomics
 analysis are listed at the end of this document. There is also a [xcms
 tutorial](https://jorainer.github.io/xcmsTutorials) available with more examples
-and details.
+and details.
+For a complete overview of LC-MS/MS analysis, see the end-to-end workflow on the
+[Metabonaut website](https://rformassspectrometry.github.io/metabonaut/), which
+integrates the *xcms* preprocessing steps with the downstream analysis.
 
 # Preprocessing of LC-MS data
 
@@ -1230,6 +1233,52 @@ defined above.
 The `filter` argument can accommodate various types of input, each determining
 the specific type of quality assessment and filtering to be performed.
 
+The `PercentMissingFilter` allows filtering features based on the percentage of
+missing values for each feature. This function takes as input the parameter
+`f`, which should be a vector of length equal to the length of the object
+(i.e. number of samples) with the sample type of each sample. The function then
+computes the percentage of missing values per sample group and filters
+features based on this. Features with a percentage of missing values larger than
+the threshold in all sample groups will be removed. Another option is to base
+this quality assessment and filtering only on QC samples.
+
+Both examples are shown below:
+
+```{r}
+# Option 1: set up parameter `f` to filter only based on QC samples
+f <- sampleData(faahko)$sample_type
+f[f != "QC"] <- NA
+
+# Option 2 (overwrites `f` from option 1): filter per sample type excluding QC samples
+f <- sampleData(faahko)$sample_type
+f[f == "QC"] <- NA
+
+missing_filter <- PercentMissingFilter(threshold = 30,
+                                       f = f)
+# Apply the filter to faahko object
+filtered_faahko <- filterFeatures(object = faahko,
+                                  filter = missing_filter)
+
+# Apply the filter to res object
+missing_filter <- PercentMissingFilter(threshold = 30,
+                                       f = f)
+filtered_res <- filterFeatures(object = res,
+                               filter = missing_filter)
+```
+
+Here, no feature was removed, meaning that all the features had less than 30%
+of `NA` values in at least one of the sample types.
+
+Although not directly relevant to this experiment, the `BlankFlag` filter can be
+used to flag features based on the intensity relationship between blank and QC
+samples. More information can be found in the documentation of the filter:
+
+```{r}
+# Retrieve documentation for the main function and the specific filter.
+?filterFeatures
+?BlankFlag
+```
+
 The `RsdFilter` enable users to filter features based on their relative standard
 deviation (coefficient of variation) for a specified `threshold`.
 It is recommended to base the computation on quality control (QC) samples,
@@ -1238,14 +1287,14 @@ as demonstrated below:
 ```{r}
 # Set up parameters for RsdFilter
 rsd_filter <- RsdFilter(threshold = 0.3,
-                        qcIndex = sampleData(faahko)$sample_type == "QC")
+                        qcIndex = sampleData(filtered_faahko)$sample_type == "QC")
 
 # Apply the filter to faakho object
-filtered_faahko <- filterFeatures(object = faahko, filter = rsd_filter)
+filtered_faahko <- filterFeatures(object = filtered_faahko, filter = rsd_filter)
 
 # Now apply the same strategy to the res object
-rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = res$sample_type == "QC")
-filtered_res <- filterFeatures(object = res, filter = rsd_filter, assay = "raw")
+rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = filtered_res$sample_type == "QC")
+filtered_res <- filterFeatures(object = filtered_res, filter = rsd_filter, assay = "raw")
 ```
 
 All features with an RSD (CV) strictly larger than 0.3 in QC samples were thus
@@ -1279,64 +1328,6 @@ filtered_res <- filterFeatures(object = filtered_res,
 All features with an D-ratio strictly larger than 0.5 were thus removed from the
 data set.
 
-The `PercentMissingFilter` allows to filter features based on the percentage of
-missing values for each feature. This function takes as an input the parameter
-`f` which is supposed to be a vector of length equal to the length of the object
-(i.e. number of samples) with the sample type for each. The function then
-computes the percentage of missing values per sample groups and filters
-features based on this. Features with a percent of missing values larger than
-the threshold in all sample groups will be removed. Another option is to base
-this quality assessment and filtering only on QC samples.
-
-Both examples are shown below:
-
-```{r}
-# To set up parameter `f` to filter only based on QC samples
-f <- sampleData(filtered_faakho)$sample_type
-f[f != "QC"] <- NA
-
-# To set up parameter `f` to filter per sample type excluding QC samples
-f <- sampleData(filtered_faakho)$sample_type
-f[f == "QC"] <- NA
-
-missing_filter <- PercentMissingFilter(threshold = 30,
-                                       f = f)
-
-# Apply the filter to faakho object
-filtered_faakho <- filterFeatures(object = filtered_faakho,
-                                  filter = missing_filter)
-
-# Apply the filter to res object
-missing_filter <- PercentMissingFilter(threshold = 30,
-                                       f = f)
-filtered_res <- filterFeatures(object = filtered_res,
-                               filter = missing_filter)
-```
-
-Here, no feature was removed, meaning that all the features had less than 30%
-of `NA` values in at least one of the sample type.
-
-Although not directly relevant to this experiment, the `BlankFlag` filter can be
-used to flag features based on the intensity relationship between blank and QC
-samples. More information can be found in the documentation of the filter:
-
-```{r}
-# Retrieve documentation for the main function and the specific filter.
-?filterFeatures
-?BlankFlag
-```
-
-## Normalization
-
-Normalizing features' signal intensities is required, but at present not (yet)
-supported in `xcms` (some methods might be added in near future). It is advised
-to use the `SummarizedExperiment` returned by the `quantify()` method for any
-further data processing, as this type of object stores feature definitions,
-sample annotations as well as feature abundances in the same object. For the
-identification of e.g. features with significant different
-intensities/abundances it is suggested to use functionality provided in other R
-packages, such as Bioconductor's excellent *limma* package.
-
 ## Alignment to an external reference dataset