-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmatch-standards-introduction.Rmd
111 lines (93 loc) · 5.67 KB
/
match-standards-introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# Introduction
Mixes of standards have been solved in water or added to human serum sample
pools in two different concentration and these samples were measured with the
LC-MS setup from Eurac (used also to generate the CHRIS untargeted metabolomics
data). Assignment of retention time and observed ion can be performed for each
standard based on MS1 information such as expected m/z but also based on the
expected difference in signal between samples with low and high concentration of
the standards. Finally, experimental fragment spectra provide the third level of
evidence. Thus, the present data set allows to annotate features to standards
based on 3 levels of evidence: expected m/z, difference in measured intensity
and MS2-based annotation.
The goal of this analysis is to create an in-house reference library for
our untargeted LC-MS setup. For that we:
- determine the retention time of each standard in water and serum.
- define which ions/adducts are measured for each standard (with relative
abundances, i.e. which is most highly abundant ion etc).
- extract all MS2 spectra for each standard, match them against reference
spectra and store them in a database.
The approach consists of the following steps:
- **Identification of potential features for a certain standard**. Features
matching pre-defined adducts are identified. These are further sub-setted
keeping only features which match the expected difference in intensities
between *high* and *low* concentration samples. Resulting features are in
addition grouped based in similar retention time (5 seconds) and similar
abundance across samples (correlation > 0.6).
- **MS2 spectra of all selected features are matched against reference spectra
for the standard from HMDB**. If no match is found and the ion is different
than `[M+H]+` or `[M-H]-` a comparison between neutral loss spectra is
performed in addition.
- **MS2 spectra of all selected features are matched against all HMDB and
Massbank**. This is to evaluate specificity of the fragment spectra.
Based on the available information retention times/features are annotated to
standards using different confidence levels:
- **D** (lowest confidence): based on evidence 1: signal needs to be
higher in samples with higher concentration of the standard. Measured m/z has
to match the m/z of an ion of the compound. A visual inspection of the EIC has
to be done in addition to ensure high quality of the signal.
- **C**: based on evidence 1 and 2. In addition to **D**, also the MS2
spectrum for that feature needs to match with at least a low similarity to the
reference spectra for that compound from HMDB.
- **B**: same as **C** but similarity to a reference spectrum for the standard
is higher than 0.7.
- **A** (highest confidence): same as **B**, but the experimental MS2 spectrum
does not match a MS2 spectrum from any other compound in HMDB or MassBank with
a similarity > 0.7. Matches against spectra with a difference in their
precursor m/z are not considered.
These confidence levels are for the annotation of the feature(s) to the
standard. Features with similar retention times and abundances across samples
(as well as eventually similar peak shapes) *inherit* the highest confidence
level from other features but get assigned a **-** to the inherited
confidence. **D** confidence levels are only given if there is a single feature
or for features with retention times similar to higher confidence features from
the other polarity or other matrix.
## Generation of the reference database
The lab-internal reference database will then be created based on the manually
defined and selected features and MS2 spectra. Feature data will be added as
*ion* information to the database. Features detected in water samples will be
preferred, but if a certain ion was only detected in serum or if the retention
times between serum and water features differs by more than 10 seconds they will
be in addition also added. All (manually selected) MS2 spectra (whether in water
or in serum) will be added.
Information from features stored in the database are:
- `original_sample`: the name of the sample mix.
- `sample_matrix`: either water or serum pool.
- `ion_adduct`: the adduct definition.
- `relative_ion_intensity`: the relative intensity of the ion (relative to all
other ions of the same compound in the same sample matrix).
- `polarity`: the polarity; negative or positive.
- `compound_id`: the (HMDB) ID of the compound.
- `ion_mz`: the (measured) m/z of the feature (`mzmed`).
- `ion_rt`: the rounded, measured retention time of the feature (`rtmed`).
- `confidence_level`: the manually defined confidence level.
- `feature_id`: the feature ID.
Information from MS2 spectra stored on the database are:
- `original_file`: the name of the mzML file. This information should allow to
link the spectrum to the sample mix etc.
- `adduct`: the adduct definition.
- `compound_id`: the (HMDB) ID of the compound.
- `acquisitionNum`: the spectrum ID/index within the mzML file.
- `polarity`: the polarity.
- `rtime`: the actual retention time of the spectrum.
- `precursorMz`: the precursor m/z.
- `collision_energy`: the collision energy.
- `instrument`: the MS instrument (`"Sciex TripleTOF 5600+"`).
- `instrument_type`: the setup (`"LC-ESI-QTOF"`).
- `confidence`: either `"low"` (no match against reference spectrum) or `"high"`
(matches reference spectrum).
Checklist for adding features and MS2 spectra:
- Check water and serum data: which ions where found? What is the retention
time?
- Prefer adding features from water, unless in serum more ions were present.
- If water and serum features have large difference in retention time, add both.
- Add all MS2 spectra, from water and serum, from all ions.