Merge pull request #776 from sneumann/phili
Update vignette
jorainer authored Oct 29, 2024
2 parents a3bc33b + 23dd061 commit 9f1b7b6
Showing 1 changed file with 54 additions and 63 deletions.
117 changes: 54 additions & 63 deletions vignettes/xcms.Rmd
@@ -62,7 +62,10 @@ be applied to the older *MSnbase*-based workflows (xcms version 3). Additional
documents and tutorials covering also other topics of untargeted metabolomics
analysis are listed at the end of this document. There is also a [xcms
tutorial](https://jorainer.github.io/xcmsTutorials) available with more examples
and details.
For a complete overview of LC-MS/MS analysis, the
[Metabonaut website](https://rformassspectrometry.github.io/metabonaut/)
provides an end-to-end workflow that integrates the *xcms* preprocessing steps
with the downstream analysis.


# Preprocessing of LC-MS data
@@ -1230,6 +1233,52 @@ defined above. The `filter` argument can accommodate various types of input,
each determining the specific type of quality assessment and filtering to be
performed.

The `PercentMissingFilter` filters features based on the percentage of missing
values for each feature. It takes as input the parameter `f`, a vector with one
element per sample (i.e. of length equal to the number of samples in the
object) giving the sample type of each sample. The function then computes the
percentage of missing values per sample group and filters features based on
this: features with a percentage of missing values larger than the threshold
in all sample groups are removed. Another option is to base this quality
assessment and filtering only on QC samples.

Both examples are shown below:

```{r}
# To set up parameter `f` to filter only based on QC samples
f <- sampleData(faahko)$sample_type
f[f != "QC"] <- NA
# To set up parameter `f` to filter per sample type excluding QC samples
# (this overwrites the QC-only `f` above and is the one used below)
f <- sampleData(faahko)$sample_type
f[f == "QC"] <- NA
missing_filter <- PercentMissingFilter(threshold = 30,
                                       f = f)
# Apply the filter to faahko object
filtered_faahko <- filterFeatures(object = faahko,
                                  filter = missing_filter)
# Apply the filter to res object
missing_filter <- PercentMissingFilter(threshold = 30,
                                       f = f)
filtered_res <- filterFeatures(object = res,
                               filter = missing_filter)
```

Here, no feature was removed, meaning that all features had less than 30%
of `NA` values in at least one of the sample types.
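
The filtering rule described above can be illustrated with a minimal base R
sketch. It does not call *xcms*; the toy matrix `vals`, the grouping vector `f`
and the helper names are hypothetical and only mimic the described behaviour (a
feature is removed only if its percentage of missing values exceeds the
threshold in every sample group).

```r
# Toy abundance matrix (hypothetical values): rows are features, columns samples
vals <- rbind(
    ft1 = c(10, 11, 12, 11, NA, NA),
    ft2 = c(NA, NA, NA,  5, NA, NA),
    ft3 = c(NA,  4,  6,  3,  8,  7)
)
# Sample type per column; samples set to NA would be ignored by split()
f <- c("KO", "KO", "KO", "WT", "WT", "WT")
threshold <- 30

# Percentage of missing values per feature, computed within each sample group
pct_missing <- sapply(split(seq_along(f), f), function(idx)
    rowMeans(is.na(vals[, idx, drop = FALSE])) * 100)

# Keep a feature if at least one group is at or below the threshold
keep <- rowSums(pct_missing <= threshold) > 0
vals_filtered <- vals[keep, , drop = FALSE]
```

In this toy example `ft2` exceeds the threshold in both groups and is dropped,
while `ft1` and `ft3` each have one group below the threshold and are kept.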

Although not directly relevant to this experiment, the `BlankFlag` filter can be
used to flag features based on the intensity relationship between blank and QC
samples. More information can be found in the documentation of the filter:

```{r}
# Retrieve documentation for the main function and the specific filter.
?filterFeatures
?BlankFlag
```

The `RsdFilter` enables users to filter features based on their relative
standard deviation (coefficient of variation) for a specified `threshold`. It
is recommended to base the computation on quality control (QC) samples,
@@ -1238,14 +1287,14 @@ as demonstrated below:
```{r}
# Set up parameters for RsdFilter
rsd_filter <- RsdFilter(threshold = 0.3,
- qcIndex = sampleData(faahko)$sample_type == "QC")
+ qcIndex = sampleData(filtered_faahko)$sample_type == "QC")
# Apply the filter to faahko object
- filtered_faahko <- filterFeatures(object = faahko, filter = rsd_filter)
+ filtered_faahko <- filterFeatures(object = filtered_faahko, filter = rsd_filter)
# Now apply the same strategy to the res object
- rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = res$sample_type == "QC")
- filtered_res <- filterFeatures(object = res, filter = rsd_filter, assay = "raw")
+ rsd_filter <- RsdFilter(threshold = 0.3, qcIndex = filtered_res$sample_type == "QC")
+ filtered_res <- filterFeatures(object = filtered_res, filter = rsd_filter, assay = "raw")
```
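
As a rough illustration of what an RSD-based filter computes, the following
base R sketch calculates the relative standard deviation (standard deviation
divided by the mean) across QC samples for a toy matrix. The objects `vals` and
`qc_index` are hypothetical, and the code is not the *xcms* implementation.

```r
# Toy abundance matrix (hypothetical values): rows are features, columns samples
vals <- rbind(
    ft1 = c(100, 102,  98, 150),
    ft2 = c(100, 200,  50, 120)
)
# Logical index marking which columns are QC samples
qc_index <- c(TRUE, TRUE, TRUE, FALSE)
threshold <- 0.3

# RSD (coefficient of variation) per feature, based on the QC samples only
qc <- vals[, qc_index, drop = FALSE]
rsd <- apply(qc, 1, sd, na.rm = TRUE) / rowMeans(qc, na.rm = TRUE)

# Features with an RSD strictly larger than the threshold are removed
keep <- rsd <= threshold
vals_filtered <- vals[keep, , drop = FALSE]
```

Here `ft1` varies little between the QC injections and is kept, whereas `ft2`
has an RSD well above 0.3 and is removed.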

All features with an RSD (CV) strictly larger than 0.3 in QC samples were thus
@@ -1279,64 +1328,6 @@ filtered_res <- filterFeatures(object = filtered_res,
All features with a D-ratio strictly larger than 0.5 were thus removed from
the data set.
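
The following base R sketch illustrates the idea, assuming the commonly used
definition of the D-ratio as the standard deviation in QC samples divided by the
standard deviation in study samples. The toy data and variable names are
hypothetical, and the code is not the *xcms* implementation.

```r
# Toy abundance matrix (hypothetical values): rows are features, columns samples
vals <- rbind(
    ft1 = c(100, 101,  99,  60, 140, 100),
    ft2 = c(100, 130,  70,  90, 110, 100)
)
# First three columns are QC samples, the remaining ones study samples
qc_index <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
threshold <- 0.5

# Assumed definition: technical variation (QC) relative to study variation
sd_qc <- apply(vals[, qc_index, drop = FALSE], 1, sd, na.rm = TRUE)
sd_study <- apply(vals[, !qc_index, drop = FALSE], 1, sd, na.rm = TRUE)
dratio <- sd_qc / sd_study

# Features with a D-ratio strictly larger than the threshold are removed
keep <- dratio <= threshold
```

For `ft1` the variation in QC samples is small compared to the study samples, so
it is kept; for `ft2` the QC variation exceeds the study variation and the
feature is removed.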

The `PercentMissingFilter` filters features based on the percentage of missing
values for each feature. It takes as input the parameter `f`, a vector with one
element per sample (i.e. of length equal to the number of samples in the
object) giving the sample type of each sample. The function then computes the
percentage of missing values per sample group and filters features based on
this: features with a percentage of missing values larger than the threshold
in all sample groups are removed. Another option is to base this quality
assessment and filtering only on QC samples.

Both examples are shown below:

```{r}
# To set up parameter `f` to filter only based on QC samples
f <- sampleData(filtered_faahko)$sample_type
f[f != "QC"] <- NA
# To set up parameter `f` to filter per sample type excluding QC samples
# (this overwrites the QC-only `f` above and is the one used below)
f <- sampleData(filtered_faahko)$sample_type
f[f == "QC"] <- NA
missing_filter <- PercentMissingFilter(threshold = 30,
                                       f = f)
# Apply the filter to faahko object
filtered_faahko <- filterFeatures(object = filtered_faahko,
                                  filter = missing_filter)
# Apply the filter to res object
missing_filter <- PercentMissingFilter(threshold = 30,
                                       f = f)
filtered_res <- filterFeatures(object = filtered_res,
                               filter = missing_filter)
```

Here, no feature was removed, meaning that all features had less than 30%
of `NA` values in at least one of the sample types.

Although not directly relevant to this experiment, the `BlankFlag` filter can be
used to flag features based on the intensity relationship between blank and QC
samples. More information can be found in the documentation of the filter:

```{r}
# Retrieve documentation for the main function and the specific filter.
?filterFeatures
?BlankFlag
```

## Normalization

Normalizing features' signal intensities is required, but at present not (yet)
supported in `xcms` (some methods might be added in the near future). It is
advised to use the `SummarizedExperiment` returned by the `quantify()` method
for any further data processing, as this type of object stores feature
definitions, sample annotations as well as feature abundances in the same
object. For the identification of e.g. features with significantly different
intensities/abundances it is suggested to use functionality provided in other R
packages, such as Bioconductor's excellent *limma* package.


## Alignment to an external reference dataset
