# Prerequisites {#prerequisites}
The analysis presented in this book requires a basic understanding of the
`R` programming language. An introduction to `R` can be found [here](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf) and
in the book [R for Data Science](https://r4ds.hadley.nz/).
Furthermore, it is beneficial to be familiar with single-cell data analysis
using the [Bioconductor](https://www.bioconductor.org/) framework. The
[Orchestrating Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/release/OSCA/) book
gives an excellent overview of the data containers and basic analyses that are
used here.
An overview of IMC as a technology and of the necessary image processing steps can be
found on the [IMC workflow website](https://bodenmillergroup.github.io/IMCWorkflow/).
Before we get started on IMC data analysis, we will need to make sure that
software dependencies are installed and the example data is downloaded.
## Obtain the code
This book provides R code to perform single-cell and spatial data analysis.
You can copy the individual code chunks into your R scripts or you can obtain
the full code of the book via:
```
git clone https://github.com/BodenmillerGroup/IMCDataAnalysis.git
```
## Software requirements
The R packages needed to execute the presented workflow can either be manually
installed (see section \@ref(manual-install)) or are available within a provided
Docker container (see section \@ref(docker)). The Docker option is useful if you
want to exactly reproduce the presented analysis across operating systems;
however, the manual install gives you more flexibility for exploratory data
analysis.
### Using Docker {#docker}
For reproducibility purposes, we provide a Docker container [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/pkgs/container/imcdataanalysis).
1. After installing [Docker](https://docs.docker.com/get-docker/) you can first pull the container via:
```
docker pull ghcr.io/bodenmillergroup/imcdataanalysis:latest
```
and then run the container:
```
docker run -v /path/to/IMCDataAnalysis:/home/rstudio/IMCDataAnalysis \
-e PASSWORD=bioc -p 8787:8787 \
ghcr.io/bodenmillergroup/imcdataanalysis:latest
```
Here, the `/path/to/` needs to be adjusted to where you keep the code and data
of the book.
**Of note: it is recommended to use a date-tagged version of the container to ensure reproducibility**.
This can be done via:
```
docker pull ghcr.io/bodenmillergroup/imcdataanalysis:<year-month-date>
```
2. An RStudio server session can be accessed via a browser at `localhost:8787` using `Username: rstudio` and `Password: bioc`.
3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.
4. Code in the individual files can now be executed, or the whole workflow can be built by entering `bookdown::render_book()`.
### Manual installation {#manual-install}
The following section describes how to manually install all needed R packages
when not using the provided Docker container.
To install all R packages needed for the analysis, please run:
```{r install-packages, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("rmarkdown", "bookdown", "pheatmap", "viridis", "zoo",
                       "devtools", "testthat", "tiff", "distill", "ggrepel",
                       "patchwork", "mclust", "RColorBrewer", "uwot", "Rtsne",
                       "harmony", "Seurat", "SeuratObject", "cowplot", "kohonen",
                       "caret", "randomForest", "ggridges",
                       "gridGraphics", "scales", "Matrix",
                       "CATALYST", "scuttle", "scater", "dittoSeq",
                       "tidyverse", "BiocStyle", "batchelor", "bluster", "scran",
                       "lisaClust", "spicyR", "iSEE", "imcRtools", "cytomapper",
                       "imcdatasets", "cytoviewer"))
# GitHub dependencies
devtools::install_github("i-cyto/Rphenograph")
```
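After installation, it can be useful to confirm that the packages can actually be found by R. The following is a minimal sketch and not part of the original workflow: the helper name `find_missing_packages` is hypothetical, and the package vector is only an illustrative subset of the list above.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
# requireNamespace() returns FALSE (instead of failing) for missing packages.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative subset of the packages installed above
core_pkgs <- c("imcRtools", "cytomapper", "CATALYST", "scater")
missing_pkgs <- find_missing_packages(core_pkgs)

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All checked packages are installed.")
}
```

Note that `requireNamespace()` loads the namespace of each package it finds, so the check can take a moment for large packages.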
```{r load-libraries, echo = FALSE, message = FALSE}
options(timeout=10000)
library(CATALYST)
library(SpatialExperiment)
library(SingleCellExperiment)
library(scuttle)
library(scater)
library(imcRtools)
library(cytomapper)
library(dittoSeq)
library(tidyverse)
library(bluster)
library(scran)
library(lisaClust)
library(caret)
library(cytoviewer)
```
### Major package versions
Throughout the analysis, we rely on different R software packages.
This section lists the most commonly used packages in this workflow.
Data containers:
* [SpatialExperiment](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) version `r packageVersion("SpatialExperiment")`
* [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) version `r packageVersion("SingleCellExperiment")`
Data analysis:
* [CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html) version `r packageVersion("CATALYST")`
* [imcRtools](https://bioconductor.org/packages/release/bioc/html/imcRtools.html) version `r packageVersion("imcRtools")`
* [scuttle](https://bioconductor.org/packages/release/bioc/html/scuttle.html) version `r packageVersion("scuttle")`
* [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) version `r packageVersion("scater")`
* [batchelor](https://www.bioconductor.org/packages/release/bioc/html/batchelor.html) version `r packageVersion("batchelor")`
* [bluster](https://www.bioconductor.org/packages/release/bioc/html/bluster.html) version `r packageVersion("bluster")`
* [scran](https://www.bioconductor.org/packages/release/bioc/html/scran.html) version `r packageVersion("scran")`
* [harmony](https://github.com/immunogenomics/harmony) version `r packageVersion("harmony")`
* [Seurat](https://satijalab.org/seurat/index.html) version `r packageVersion("Seurat")`
* [lisaClust](https://www.bioconductor.org/packages/release/bioc/html/lisaClust.html) version `r packageVersion("lisaClust")`
* [caret](https://topepo.github.io/caret/) version `r packageVersion("caret")`
Data visualization:
* [cytomapper](https://bioconductor.org/packages/release/bioc/html/cytomapper.html) version `r packageVersion("cytomapper")`
* [cytoviewer](https://bioconductor.org/packages/release/bioc/html/cytoviewer.html) version `r packageVersion("cytoviewer")`
* [dittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html) version `r packageVersion("dittoSeq")`
Tidy R:
* [tidyverse](https://www.tidyverse.org/) version `r packageVersion("tidyverse")`
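The versions above are reported one by one through `packageVersion()`. As a rough sketch, the same information could be collected into a single table; the helper name `package_versions` is hypothetical, and the example call uses two base R packages only so that it runs on any installation.

```r
# Hypothetical helper: collect the installed versions of the given packages.
# packageVersion() raises an error for packages that are not installed.
package_versions <- function(pkgs) {
  data.frame(
    package = pkgs,
    version = vapply(pkgs, function(p) as.character(packageVersion(p)),
                     character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Example with base packages; replace with e.g. c("imcRtools", "cytomapper")
package_versions(c("stats", "utils"))
```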
## Image processing {#image-processing}
The analysis presented here fully relies on packages written in the programming
language `R` and primarily focuses on analysis approaches downstream of image
processing. The example data available at
[https://zenodo.org/record/7575859](https://zenodo.org/record/7575859) were
processed (file type conversion, image segmentation, feature extraction as
explained in Section \@ref(processing)) using the
[steinbock](https://bodenmillergroup.github.io/steinbock/latest/) toolkit. The
exact command line interface calls to process the raw data are shown below:
```{r, echo = FALSE, message = FALSE}
# Create the output folders (and the parent "data" folder) if they are missing
if (!dir.exists("data/steinbock"))
    dir.create("data/steinbock", recursive = TRUE)
if (!dir.exists("data/ImcSegmentationPipeline"))
    dir.create("data/ImcSegmentationPipeline", recursive = TRUE)
# Pre-download steinbock file
download.file("https://zenodo.org/record/7624451/files/steinbock.sh",
              "data/steinbock/steinbock.sh")
```
```{bash, file="data/steinbock/steinbock.sh", eval=FALSE}
```
## Download example data {#download-data}
Throughout this tutorial, we will access a number of different data types.
To declutter the analysis scripts, we download all required data in this section.
To highlight the basic steps of IMC data analysis, we provide example data that
were acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive
**CAN**cer patient cohorts project ([immucan.eu](https://immucan.eu/)). The
raw data of 4 patients can be accessed online at
[zenodo.org/record/7575859](https://zenodo.org/record/7575859). We will only
download the sample/patient metadata information here:
```{r download-sample-data}
download.file("https://zenodo.org/record/7575859/files/sample_metadata.csv",
destfile = "data/sample_metadata.csv")
```
### Processed multiplexed imaging data
The IMC raw data were processed using either the
[steinbock](https://github.com/BodenmillerGroup/steinbock) toolkit or the
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline).
Image processing included file type conversion, cell segmentation and feature
extraction.
**steinbock output**
This book uses the output of the `steinbock` framework when applied to process
the example data. The processed data include the single-cell mean intensity
files, the single-cell morphological features and spatial locations, spatial
object graphs in the form of edge lists indicating cells in close proximity,
hot-pixel-filtered multi-channel images, segmentation masks, image metadata and
channel metadata. All these files will be downloaded here for later use. The
commands that were used to generate these data can be found in the shell script
above.
```{r steinbock-results}
# download intensities
url <- "https://zenodo.org/record/7624451/files/intensities.zip"
destfile <- "data/steinbock/intensities.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download regionprops
url <- "https://zenodo.org/record/7624451/files/regionprops.zip"
destfile <- "data/steinbock/regionprops.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download neighbors
url <- "https://zenodo.org/record/7624451/files/neighbors.zip"
destfile <- "data/steinbock/neighbors.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download images
url <- "https://zenodo.org/record/7624451/files/img.zip"
destfile <- "data/steinbock/img.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download masks
url <- "https://zenodo.org/record/7624451/files/masks_deepcell.zip"
destfile <- "data/steinbock/masks_deepcell.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download individual files
download.file("https://zenodo.org/record/7624451/files/panel.csv",
"data/steinbock/panel.csv")
download.file("https://zenodo.org/record/7624451/files/images.csv",
"data/steinbock/images.csv")
download.file("https://zenodo.org/record/7624451/files/steinbock.sh",
"data/steinbock/steinbock.sh")
```
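The chunk above repeats the same download, unzip and delete pattern for each archive. As a minimal sketch, this pattern could be wrapped in two small helpers. Both function names (`zenodo_url`, `fetch_and_unzip`) are hypothetical and not part of the book's code; the URL layout is simply the one used in the chunk above.

```r
# Hypothetical helper: build a Zenodo file URL from a record ID and file name,
# following the URL pattern used in the chunk above.
zenodo_url <- function(record, file) {
  paste0("https://zenodo.org/record/", record, "/files/", file)
}

# Hypothetical helper: download a zip archive, extract it into `exdir`
# and delete the archive afterwards.
fetch_and_unzip <- function(url, exdir) {
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  destfile <- file.path(exdir, basename(url))
  # download.file() returns 0 on success; mode = "wb" keeps binary files intact
  status <- download.file(url, destfile, mode = "wb")
  if (status != 0) stop("Download failed: ", url)
  unzip(destfile, exdir = exdir, overwrite = TRUE)
  unlink(destfile)
  invisible(exdir)
}

# Usage (not run here): fetch all steinbock archives in one loop
# archives <- c("intensities.zip", "regionprops.zip", "neighbors.zip",
#               "img.zip", "masks_deepcell.zip")
# for (a in archives) {
#   fetch_and_unzip(zenodo_url("7624451", a), "data/steinbock")
# }
```

The explicit chunk above has the advantage that each download is visible on its own; the helper mainly reduces repetition when more archives are added.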
**IMC Segmentation Pipeline output**
The example data were also processed using the
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline) (version 3).
To highlight the use of the reader function for this type of output, we will need
to download the `cpout` folder which is part of the `analysis` folder. The `cpout`
folder stores all relevant output files of the pipeline. For a full description
of the pipeline, please refer to the [docs](https://bodenmillergroup.github.io/ImcSegmentationPipeline/).
```{r imcsegpipe-results}
# download analysis folder
url <- "https://zenodo.org/record/7997296/files/analysis.zip"
destfile <- "data/ImcSegmentationPipeline/analysis.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/ImcSegmentationPipeline", overwrite=TRUE)
unlink(destfile)
unlink("data/ImcSegmentationPipeline/analysis/cpinp/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/crops/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/histocat/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ilastik/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ometiff/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/images/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/probabilities/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/masks/", recursive=TRUE)
```
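Because `unlink()` accepts a character vector of paths, the folder clean-up in the chunk above could also be written as a single call. The following is a sketch under that assumption; the helper name `remove_subdirs` is hypothetical.

```r
# Hypothetical helper: delete several sub-folders of `base` in a single call.
# unlink() is vectorised over paths; recursive = TRUE is needed for folders.
remove_subdirs <- function(base, subdirs) {
  unlink(file.path(base, subdirs), recursive = TRUE)
}

# Usage (not run here), with the folder names from the chunk above:
# remove_subdirs("data/ImcSegmentationPipeline/analysis",
#                c("cpinp", "crops", "histocat", "ilastik", "ometiff",
#                  "cpout/images", "cpout/probabilities", "cpout/masks"))
```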
### Files for spillover matrix estimation
To demonstrate the estimation and correction of channel spillover as described by
[@Chevrier2017], we download an example spillover acquisition:
```{r download-spillover-data}
download.file("https://zenodo.org/record/7575859/files/compensation.zip",
"data/compensation.zip")
unzip("data/compensation.zip", exdir="data", overwrite=TRUE)
unlink("data/compensation.zip")
```
### Gated cells
In Section \@ref(classification), we present a cell type classification approach
that relies on previously gated cells. This ground truth data is available
online at [zenodo.org/record/8095133](https://zenodo.org/record/8095133) and
will be downloaded here for later use:
```{r download-gated-cells}
download.file("https://zenodo.org/record/8095133/files/gated_cells.zip",
"data/gated_cells.zip")
unzip("data/gated_cells.zip", exdir="data", overwrite=TRUE)
unlink("data/gated_cells.zip")
```
## Software versions {#sessionInfo}
<details>
<summary>SessionInfo</summary>
```{r, echo = FALSE, message = FALSE}
sessionInfo()
```
</details>