-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathProjecTILs.demo.Rmd
232 lines (169 loc) · 9.74 KB
/
ProjecTILs.demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
---
title: Projecting scRNA-seq data onto a reference map
date: "`r Sys.Date()`"
author: "M. Andreatta and S. Carmona"
output:
rmdformats::readthedown:
self-contained: true
highlight: haddock
thumbnails: false
css: styles.css
knit: (function(input_file, encoding) {
out_dir <- 'docs';
rmarkdown::render(input_file, encoding=encoding,
output_file=file.path(dirname(input_file), out_dir, 'tutorial.html'))})
---
```{r setup, echo=FALSE}
#install.packages("rmdformats")
#Template markdown setup
library(knitr)
library(rmdformats)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
cache=TRUE,
cache.lazy=FALSE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE,
dev='png')
opts_knit$set(width=75)
```
**ProjecTILs** is a computational method to project scRNA-seq data into reference single-cell atlases, enabling their direct comparison in a stable, annotated system of coordinates. This tutorial outlines the main functions implemented in ProjecTILs on a small, simple dataset. For more advanced (and more biologically interesting) applications of ProjecTILs see this list of [ProjecTILs case studies](https://carmonalab.github.io/ProjecTILs_CaseStudies/)
# R environment
First, check package dependencies and install ProjecTILs
```{r message=F, warning=F}
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
options(timeout = max(300, getOption("timeout")))
if (!requireNamespace("renv"))
install.packages("renv")
library(renv)
renv::restore()
#remotes::install_github("carmonalab/STACAS")
#remotes::install_github("carmonalab/ProjecTILs")
library(ProjecTILs)
library(Seurat)
```
# Load reference atlas and query data
First, load a mouse reference TIL atlas. Several reference maps are available [from GitHub](https://github.com/carmonalab/ProjecTILs) pr from the [SPICA website](https://spica.unil.ch/refs).
If no reference map file is provided, the function `load.reference.map()` will automatically download it from [https://doi.org/10.6084/m9.figshare.12478571](https://doi.org/10.6084/m9.figshare.12478571)
```{r}
ref <- load.reference.map()
```
Let's explore the reference atlas
```{r fig.height=4, fig.width=6}
refCols <- c("#edbe2a", "#A58AFF", "#53B400", "#F8766D", "#00B6EB", "#d1cfcc", "#FF0000", "#87f6a5", "#e812dd")
DimPlot(ref,label = T, cols = refCols)
```
See expression of important marker genes across reference subtypes
```{r fig.height=6, fig.width=4}
markers <- c("Cd4","Cd8a","Ccr7","Tcf7","Pdcd1","Havcr2",
"Tox","Izumo1r","Cxcr6","Xcl1","Gzmb","Gzmk","Ifng","Foxp3")
VlnPlot(ref,features=markers,stack = T, flip = T, fill.by = "ident",
cols = refCols, assay = "RNA") + NoLegend()
```
Now let's load a query dataset - [Miller et al., Nature Immunol (2019)](https://pubmed.ncbi.nlm.nih.gov/30778252/)
```{r warning=FALSE}
#A sample data set is provided with the ProjecTILs package
querydata <- ProjecTILs::query_example_seurat
```
More generally, it is possible to load a query matrix with gene names and barcodes (e.g. 10X format or raw counts)
```{r warning=FALSE,message=FALSE,results=FALSE}
##Raw count matrix from GEO
library(GEOquery)
geo_acc <- "GSE86028"
getGEOSuppFiles(geo_acc)
fname2 <- sprintf("%s/GSE86028_TILs_sc_wt_mtko.tpm.log2.txt.gz", geo_acc)
querydata2 <- read.sc.query(fname2, type = "raw.log2")
```
# Run Projection algorithm
The main function in ProjecTILs is `Run.ProjecTILs`, which takes as input a reference map and query dataset. The query will be batch-corrected and projected into the reference map, with low-dimensional embeddings (PCA and UMAP) compatible with those of the reference.
```{r warning=FALSE,message=FALSE}
query.projected <- Run.ProjecTILs(querydata, ref=ref)
```
**NB:** by default, `Run.ProjecTILs()` will pre-filter T cells using [scGate](https://github.com/carmonalab/scgate). In case the input dataset is already pre-filtered, you can disable this step using `Run.ProjecTILs(querydata, ref=ref, filter.cells = FALSE)`. If you are using a custom reference map that is not composed of T cells, you may specify a different scGate filter using the `scGate_model` parameter.
# Visualize projection
Plot projection of new data over the reference in UMAP space. The contour lines display the density of projected query cells onto the reference map.
```{r fig.height=4, fig.width=6}
plot.projection(ref, query.projected, linesize = 0.5, pointsize = 0.5)
```
Plot the predicted composition of the query in terms of reference T cell subtypes
```{r fig.height=3, fig.width=5}
plot.statepred.composition(ref, query.projected,metric = "Percent")
```
# Compare gene expression
How do the gene expression levels compare between reference and query for the different cell states?
```{r fig.height=8, fig.width=12, warning=FALSE,message=FALSE}
genes4radar = c("Foxp3","Cd4","Cd8a","Tcf7","Ccr7","Sell",
"Gzmb","Gzmk","Pdcd1","Havcr2", "Tox", "Mki67")
plot.states.radar(ref, query=query.projected, genes4radar = genes4radar, min.cells=20)
```
# Compare gene programs
We may want to compare query and reference for gene programs, rather than individual genes. For example, we can use signatures stored in the [SignatuR](https://github.com/carmonalab/SignatuR) database and calculate average signature scores per subtype.
```{r}
remotes::install_github("carmonalab/SignatuR")
library(SignatuR)
programs <- GetSignature(SignatuR$Mm$Programs)
names(programs)
```
We can obtain per-cell scores using [UCell](https://bioconductor.org/packages/release/bioc/html/UCell.html), and then generate radar plots on average signature scores per subtype:
```{r fig.height=10, fig.width=15, warning=FALSE,message=FALSE}
library(UCell)
ref <- AddModuleScore_UCell(ref, features=programs, assay = "RNA", name = NULL)
query.projected <- AddModuleScore_UCell(query.projected, features=programs, assay = "RNA", name=NULL)
plot.states.radar(ref, query=query.projected, meta4radar = names(programs))
```
# Compare cell states across conditions
If we have multiple conditions (e.g. control vs. treatment, or samples from different tissues), we can search for discriminant genes between conditions (otherwise, by default this analysis is performed using the reference as the 'control')
```{r fig.height=8, fig.width=12}
#Simulate a condition which e.g. increases Gzmb expression compared to control
query.control <- subset(query.projected, subset=`Gzmb` < 1.5)
query.perturb <- subset(query.projected, subset=`Gzmb` >= 1.5)
plot.states.radar(ref, query=list("Control" = query.control, "Query" = query.perturb))
```
In this toy example, where we simulated a condition that increases Gzmb expression compared to control, we expect cytotoxicity genes to drive differences.
```{r warning=FALSE,message=FALSE}
discriminantGenes <- find.discriminant.genes(ref=ref, query=query.perturb, query.control=query.control, state="CD8_Tex")
head(discriminantGenes,n=10)
```
We can use a volcano plot to display differentially expressed genes:
```{r fig.height=5, fig.width=6}
library(EnhancedVolcano)
EnhancedVolcano(discriminantGenes, lab = rownames(discriminantGenes), x = "avg_log2FC", y = "p_val", pCutoff = 1e-09,
FCcutoff = 0.5, labSize = 5, legendPosition = "none", drawConnectors = F, title = "Gzmb_high vs. Gzmb_low (Tex)")
```
Using a random subsetting, p-values should not be significant:
```{r fig.height=5, fig.width=6}
rand.list <- ProjecTILs:::randomSplit(query.projected, n=2, seed=1)
discriminantGenes <- find.discriminant.genes(ref=ref,
query=rand.list[[1]],
query.control=rand.list[[2]],
logfc.threshold = 0.01,
state="CD8_Tex")
EnhancedVolcano(discriminantGenes, lab = rownames(discriminantGenes), x = "avg_log2FC", y = "p_val", pCutoff = 1e-09,
FCcutoff = 0.5, labSize = 5, legendPosition = "none", drawConnectors = F, title = "Random split (Tex)")
```
# Using ProjecTILs as a classifier
If you do not wish to embed your query data into the reference space, you may also simply use ProjecTILs as a cell type classifier. This may be useful to annotate cell types in your query without altering existing embeddings.
See the query dataset in unsupervised low-dim embeddings:
```{r fig.height=3.5, fig.width=7}
querydata <- querydata |> FindVariableFeatures(nfeatures=500) |>
ScaleData() |> RunPCA(npcs=10) |> RunUMAP(dims=1:10)
DimPlot(querydata)
```
The `ProjecTILs.classifier` function applies reference-projection but does not alter the current embeddings.
```{r fig.height=3.5, fig.width=7}
querydata <- ProjecTILs.classifier(query=querydata, ref=ref)
palette <- c("#edbe2a", "#A58AFF", "#53B400", "#F8766D",
"#00B6EB", "#d1cfcc", "#FF0000", "#87f6a5", "#e812dd", "#777777" )
names(palette) <- c(levels(ref$functional.cluster), 'NA')
DimPlot(querydata, group.by="functional.cluster", cols = palette)
```
We can confirm that most of the cells were classified as CD8_Tex. Please note that filtered cells (i.e. those that were removed by the scGate filter) are assigned the NA label, as they correspond to cell types that are not present in the reference.
# Further reading
For applications of ProjecTILs to gain biological insight on several public datasets please see the **ProjecTILs case studies** - [INDEX](https://carmonalab.github.io/ProjecTILs_CaseStudies/) - [Repository](https://github.com/carmonalab/ProjecTILs_CaseStudies)
To generate your own reference map for ProjecTILs see [Custom reference map tutorial](https://carmonalab.github.io/ProjecTILs.demo/build_ref_atlas.html)
Publication: [Andreatta et al Nat. Comm. 2021](http://dx.doi.org/10.1038/s41467-021-23324-4)
ProjecTILs [repository](https://github.com/carmonalab/ProjecTILs)