-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathCaseStudy_Sandu2020.Rmd
258 lines (184 loc) · 13.1 KB
/
CaseStudy_Sandu2020.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
---
title: Diversity of CD8 T cells in chronic infection for six different tissues
date: "`r Sys.Date()`"
author: "M. Andreatta and S. Carmona"
output:
rmdformats::readthedown:
self-contained: true
highlight: haddock
thumbnails: false
css: styles.css
knit: (function(input_file, encoding) {
out_dir <- 'docs';
rmarkdown::render(input_file, encoding=encoding,
output_file=file.path(dirname(input_file), out_dir, 'Sandu_LCMV.html'))})
---
```{r setup, echo=FALSE, warning=FALSE}
#install.packages("rmdformats")
#Template markdown setup
library(knitr)
library(rmdformats)
library(formatR)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
cache=TRUE,
cache.lazy=FALSE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE,
dev='png')
opts_knit$set(width=75)
```
# Background
During chronic infection and cancer, prolonged antigen stimulation leads T cells to progressively lose effector function, a process often called **"T cell exhaustion".** The lymphocytic choriomeningitis virus (LCMV) model is one of the best-studied model systems of viral infection in mouse, and has been instrumental in elucidating the biology of T cell exhaustion. In the context of chronic LCMV infection, most studies focus on virus-specific T cells from spleen. The recent study by [Sandu et al.](https://www.sciencedirect.com/science/article/pii/S2211124720310639) applied single-cell sequencing to study CD8 T cell diversity in six different tissues (spleen, blood, bone marrow, lymph node, liver and lung), to determine how the tissue microenvironment affects and shapes T cell phenotypes. They found that T cells adopt tissue-specific transcriptomics profiles, with differential expression of specific genes in different organs, e.g. in genes related to antigen encounter and TCR activation.
In this case study, we applied **ProjecTILs** to re-analyze the single-cell data from Sandu et al. and study tissue-specific T cell heterogeneity in the context of a LCMV reference atlas. Raw data are available from the [European Nucleotide Archive](https://www.ebi.ac.uk/ena/browser/view/PRJEB36998); for convenience, we will start the analysis from the 10x gene-expression matrices available at [Sandu_CellReports2020.tar.gz](https://figshare.com/ndownloader/files/47904319), kindly provided by the authors.
The reference atlas of virus-specific CD8 T cells used in this case study can be explored interactively through the SPICA portal at: [https://spica.unil.ch/refs/viral-CD8-T](https://spica.unil.ch/refs/viral-CD8-T)
# R Environment
Check & load R packages
```{r message=F, warning=F, results=F, cache=F}
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
library(renv)
renv::restore()
library(ggplot2)
library(patchwork)
library(reshape2)
library(Seurat)
library(ProjecTILs)
```
# Download and process single-cell data
First download the gene expression data: [Sandu_CellReports2020.tar.gz](https://figshare.com/ndownloader/files/47904319), and *untar* the file to a convenient location (e.g. in the *input* folder).
The folder P14_tissues contains single-cell expression matrices for 6 tissues (blood, spleen, bone marrow, lymph node, lung and liver). They will be stored in a single Seurat object for further processing.
```{r}
ddir <- "input/P14_tissues"
tissues <- c("Blood","Spleen","BM","LN","Lung","Liver")
folders <- c("1_blood","2_spleen","3_BM","4_LN","5_Lung","6_Liver")
query.list <- list()
for (i in seq_along(folders)) {
mpath <- sprintf("%s/%s",ddir,folders[i])
query.list[[i]] <- read.sc.query(mpath, type="10x", min.cells = 3, min.features = 50)
query.list[[i]] <- RenameCells(query.list[[i]], add.cell.id = paste0("S",i))
query.list[[i]]$Tissue <- tissues[i]
}
query.merged <- Reduce(merge, query.list)
#For Seurat 5 objects
query.merged <- JoinLayers(query.merged)
table(query.merged$Tissue)
```
# Basic quality checks
Run some basic QC for ribosomal and mitochondrial gene content, and minimum number of genes and UMIs.
```{r stats1}
percent.ribo.dv <- PercentageFeatureSet(query.merged, pattern = "^Rp[ls]")
percent.mito.dv <- PercentageFeatureSet(query.merged, pattern = "^mt-")
query.merged <- AddMetaData(query.merged, metadata = percent.ribo.dv, col.name = "percent.ribo")
query.merged <- AddMetaData(query.merged, metadata = percent.mito.dv, col.name = "percent.mito")
```
```{r stats2}
Idents(query.merged) <- "Tissue"
VlnPlot(query.merged, features = c("nFeature_RNA","nCount_RNA","percent.ribo","percent.mito"),
ncol = 2, pt.size=0.001, split.by = "Tissue") & theme(axis.text.x = element_blank())
```
Filter outlier cells
```{r stats.filter}
query.merged <- subset(query.merged, subset = nFeature_RNA>500 & nFeature_RNA<4000 &
nCount_RNA>1000 & nCount_RNA<15000 &
percent.ribo < 50 & percent.mito < 5)
dim(query.merged)
```
# ProjecTILs analysis
Download the CD8 T cell atlas for infection from figshare: [LCMV reference atlas](https://ndownloader.figshare.com/files/23166794)
Save the *ref_LCMV_Atlas_mouse_v1.rds* object in your working directory, and the load it into memory to inspect it:
```{r load.reference}
ref <- load.reference.map("ref_LCMV_Atlas_mouse_v1.rds")
ref@images <- list()
DefaultAssay(ref) <- "integrated"
# reproduce cluster colors
library(scales)
functional.cluster.colors <- hue_pal()(7)
functional.cluster.colors <- functional.cluster.colors[c(4,3,2,5,1,6,7)]
names(functional.cluster.colors) <- levels(ref$functional.cluster)
DimPlot(ref, reduction = "umap", label = TRUE, pt.size = 0.5,
group.by = "functional.cluster", dims = c(2,1),
cols = functional.cluster.colors) + NoLegend() + theme(aspect.ratio = 1) +
scale_x_reverse() + scale_y_reverse() + ggtitle("Reference CD8 LCMV atlas")
```
An interactive version of the reference map can also be explored at [https://spica.unil.ch/refs/viral-CD8-T](https://spica.unil.ch/refs/viral-CD8-T)
**Optional:** to speed up computation (and if our machine has large enough memory), we can run projection in parallel for individual samples, by setting the parameter `ncores` to the number of parallel processes.
```{r message=F, warning=F, results=F}
#For example, to run on 6 computing cores:
ncores <- 6
```
Now we have loaded the query samples in a Seurat object, and an atlas object to be used as reference. The following commands will split the query data by tissue, and project each sample independently onto the reference T cell atlas.
```{r message=F, warning=F, results=F}
querydata <- SplitObject(query.merged, split.by = "Tissue")
#For serial processing, just set ncores=1
query.projected <- Run.ProjecTILs(querydata, ref=ref, filter.cells = F, ncores=ncores)
```
## Tissue-specific T cell state composition
We can inspect how the cells from different tissues distribute across the map.
```{r cache=FALSE}
pll <- list()
for (i in seq_along(query.projected)) {
s <- names(query.projected)[i]
pll[[i]] <- plot.projection(ref, query.projected[[s]],
pointsize = 0.3, linesize = 0.3,
cols=functional.cluster.colors) +
theme_bw() + theme(aspect.ratio = 0.8) + scale_x_reverse() +
scale_y_reverse() + coord_flip() + NoLegend() + ggtitle(s)
}
wrap_plots(pll, ncol=3)
```
We can already surmise that cells from different tissues occupy diverse areas of the reference atlas. More quantitavely, we can calculate the fraction of cells assigned into each T cell subtype for different tissues:
```{r fig.height=5, fig.width=5, cache=FALSE}
#Reorder colors (to resemble order in Sandu Figure 2J)
cols_use <- functional.cluster.colors[c(3,1,4,2,7,6,5)]
states_all <- levels(factor(names(functional.cluster.colors), levels = names(cols_use)))
m <- matrix(nrow=length(names(query.projected)), ncol = length(states_all))
rownames(m) <- names(query.projected)
colnames(m) <- states_all
for (i in seq_along(query.projected)) {
tb <- table(factor(query.projected[[i]]$functional.cluster, levels = states_all))
m[i,] <- tb * 100/sum(tb)
}
melt <- reshape2::melt(m)
colnames(melt) <- c("Tissue", "Cell_state","Percent")
p <- ggplot(melt, aes(x = Tissue, y = Percent, fill = Cell_state)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = functional.cluster.colors) +
theme_light() + theme(legend.position = "right")
p
```
Note that *Tex* cells constitute the majority of T cells for all tissues, as expected in this infection model. However, different tissues were composed of variable fractions of other T cell subtypes. For instance, lung, blood and spleen had the highest percentage of effector cells (SLEC), while lymphnode and spleen had an exceeding percentage of Tpex cells compared to other tissues. Compare this chart with Figure 2J from the [original paper](https://www.sciencedirect.com/science/article/pii/S2211124720310639). While Sandu et al. defined fewer T cell states compared to our reference atlas for infection, the tissue-specific composition for the main T cell subtypes showed a remarkable correspondence between the ProjecTILs prediction and the original, unsupervised analysis.
## Gene expression
To confirm that the projection to the reference states was accurate, we can also inspect the gene expression profiles of T cells from different tissues with respect to the reference profiles.
```{r fig.height=8, fig.width=11, cache=FALSE}
genes4radar = c("Cd8a","Tcf7","Ccr7","Gzmb","Slamf6","Pdcd1","Havcr2","Tox","Entpd1","Cx3cr1","Cxcr6","Xcl1","Mki67")
g <- plot.states.radar(ref, query=query.projected, min.cells = 200, genes4radar = genes4radar, return=T)
plot(g)
```
## Discriminant gene analysis
The projection analysis showed that different tissues are composed of different proportions of broad T cell states. However, we may also ask whether there is variability between cells *within* specific reference T cell states. For instance, do effector (SLEC) cells from blood differentially express genes compared to spleen SLEC cells? do terminally exhausted (Tex) cells from liver differ from Tex cells from spleen?
We can answer these questions by performing subtype-specific differential expression analysis using the `find.discriminat.genes` function of ProjecTILs, specifying a pairs of tissues and a reference cell state. The library `EnhancedVolcano` is recommended to visualize the differential expression results:
```{r cache=FALSE}
#BiocManager::install("EnhancedVolcano")
library(EnhancedVolcano)
deg <- find.discriminant.genes(ref, query.projected[["Blood"]], query.control=query.projected[["Spleen"]],
state="SLEC", min.pct = 0.1, logfc.threshold = 0.1, query.assay = "RNA")
EnhancedVolcano(deg, lab = rownames(deg),x = 'avg_log2FC', y = 'p_val', pCutoff = 10e-10, FCcutoff = 0.25, labSize = 5,
legendPosition = 'none', drawConnectors = F,
title = 'Blood vs. Spleen (SLEC)')
deg2 <- find.discriminant.genes(ref, query.projected[["Liver"]], query.control=query.projected[["Spleen"]], state="Tex", min.pct = 0.1, logfc.threshold = 0.1)
EnhancedVolcano(deg2, lab = rownames(deg2),x = 'avg_log2FC', y = 'p_val', pCutoff = 10e-10, FCcutoff = 0.25, labSize = 5,
legendPosition = 'none', drawConnectors = F,
title = 'Liver vs. Spleen (Tex)')
```
Differential expression between blood and spleen in that SLEC cells in spleen overexpress markers of activation like Nfkbia, Nr4a1 and Cd69, indicating that these cells may have recently encountered antigen, unlike circulating cells. A similar observation can be made for Tex cells from liver and spleen, but in this case also a significant overexpression of Gzma in liver is observed, as also noted in the original study.
# Conclusions
In the study by Sandu et al., the authors described the heterogeneity of CD8 T cells across multiple tissues, using a “traditional” approach based on unsupervised clustering, classification and differential expression. Definition of cell clusters, including considerations about batch-effects vs. tissue-related biological differences, the annotation of meaningful cell types, the differential expression analyses to determine inter-subtype differences as well as inter-tissue differences, all required an enormous amount of curation and expertise about the system under study.
With this case study, we showed how ProjecTILs can lead to very similar results with minimal effort and domain expertise. We found that tissue-specific composition of T cell subtypes predicted by ProjecTILs correlates very well with the subtypes defined by the original study in an unsupervised way, and could detect specific genes and gene modules associated with specific tissues and T cell subtypes.
# Further reading
Original publication - [Sandu et al. (2020) Cell Reports](https://www.sciencedirect.com/science/article/pii/S2211124720310639)
ProjecTILs case studies - [INDEX](https://carmonalab.github.io/ProjecTILs_CaseStudies/) - [Repository](https://github.com/carmonalab/ProjecTILs_CaseStudies)
The ProjecTILs method [Andreatta et. al (2021) Nat. Comm.](https://www.nature.com/articles/s41467-021-23324-4) and [code](https://github.com/carmonalab/ProjecTILs)