Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exercise updates #504

Merged
merged 23 commits into from
Jul 1, 2024
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
312 changes: 295 additions & 17 deletions inst/pages/98_exercises.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -508,7 +508,7 @@ However, such data are not always included.

```{r}
#| label: Other-elements-ex1
#| eval: FALSE
#| eval: TRUE
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
#| code-fold: true
#| code-summary: "Show solution"

Expand Down Expand Up @@ -565,42 +565,168 @@ colData(tse)$Barcode_full_length

### Subsetting

In this exercise, we'll go over the basics of subsetting a TreeSE object.
Keep in mind that it's good practice to always keep the original data intact.
Therefore, we suggest creating a new TreeSE object whenever you subset.

Please have a try at the following exercises.

1. Subset the TreeSE object to specific samples.

```{r}
#| label: subsetting-ex1
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

tse_subset_by_sample <- tse[ , tse$SampleType %in%
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
c("Feces", "Skin", "Tongue")]

unique(tse_subset_by_sample$SampleType)
```

2. Subset the TreeSE object to specific features.

```{r}
#| label: subsetting-ex2
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

tse_subset_by_sample_and_feature <- tse[rowData(tse)$Phylum %in% c("Crenarchaeota",
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
"Euryarchaeota",
"Actinobacteria")
,]

dim(tse_subset_by_sample_and_feature)
```

3. Subset the TreeSE object to specific samples and features.

```{r}
#| label: subsetting-ex3
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

tse_subset_by_feature <- tse[rowData(tse)$Phylum %in% c("Crenarchaeota",
"Euryarchaeota",
"Actinobacteria")
,
tse$SampleType %in%
c("Feces", "Skin", "Tongue")]
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved

unique(rowData(tse_subset_by_feature)$Phylum)
```


### Library sizes

1. Calculate library sizes
2. Subsample / rarify the counts (see: rarefyAssay)

Useful functions: nrow, ncol, dim, summary, table, quantile, unique, scater::addPerCellQC, mia::mergeFeaturesByRank
Useful functions: nrow, ncol, dim, summary, table, quantile, unique, scater::addPerCellQC, mia::agglomerateByRank

### Prevalent and core taxonomic features

1. Estimate prevalence for your chosen feature (row, taxonomic group)
2. Identify all prevalent features and subset the data accordingly
3. Report the thresholds and the dimensionality of the data before and after subsetting
4. Visualize prevalence
1. Estimate prevalence for your chosen feature (specific row and taxonomic group)
```{r}
#| label: prevalence-ex1
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

Useful functions: getPrevalence, getPrevalent, subsetByPrevalent
# First, merge the features by rank and add it as an alternative experiment.
altExp(tse,"Phylum") <- agglomerateByRank(tse, "Phylum")

# Then get the prevalence for a specific sample
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
getPrevalence(altExp(tse,"Phylum"), detection = 1/100,
sort = TRUE, assay.type = "counts",
as_relative = TRUE)["Phylum:Proteobacteria"]
```
2. Identify all prevalent features and subset the data accordingly.

```{r}
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
#| label: prevalence-ex2
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

# Get the prevalent features
prevalentFeatures <- getPrevalentFeatures(tse, detection = 0,
prevalence = 50/100)
# Subset the data to the prevalent features
prevalenceTse <- tse[rowData(tse)$Phylum %in% prevalentFeatures,]

# Alternatively subsetByPrevalentFeatures can be used to subset directly
prevalenceTse <- subsetByPrevalentFeatures(tse, detection = 0,
prevalence = 50/100)
```
TuomasBorman marked this conversation as resolved.
Show resolved Hide resolved
3. How many features are left ?

```{r}
#| label: prevalence-ex3
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

length(assay(prevalenceTse, "counts"))
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
```

4. Recalculate the most prevalent features based on relative abundance.
How many are left?

```{r}
#| label: prevalence-ex4
#| eval: True
#| code-fold: true
#| code-summary: "Show solution"

tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")

prevalenceTse <- subsetByPrevalentFeatures(tse, detection = 0,
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
prevalence = 50/100)

length(assay(prevalenceTse, "relabundance"))
```

5. Visualize prevalence using a violin/bean plot.

```{r}
#| label: prevalence-ex5
#| eval: False
#| code-fold: true
#| code-summary: "Show solution"

# Import the scater library
library(scater)

# Store the prevalence of each taxon
rowData(prevalenceTse)$prevalence <-
getPrevalence(prevalenceTse, detection = 1/100,
sort = FALSE,
assay.type = "counts", as_relative = TRUE)
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved

# Then plot the row data
plotRowData(tse, "prevalence",
colour_by = "Phylum")
```

Useful functions: getPrevalence, getPrevalentFeatures, subsetByPrevalentFeatures

### Data exploration

1. Summarize sample metadata variables. (How many age groups, how they are distributed? 0%, 25%, 50%, 75%, and 100% quantiles of library size?)
2. Create two histograms. Another shows the distribution of absolute counts, another shows how CLR transformed values are distributed.
3. Visualize how relative abundances are distributed between taxa in samples.

Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAssay, ggplot, wilcox.test, mergeFeaturesByRank, miaViz::plotAbundance
Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAssay, ggplot, wilcox.test, agglomerateByRank, miaViz::plotAbundance

### Other functions

1. Merge data objects (merge, mergeSEs)
2. Melt the data for visualization purposes (meltSE)


### Assay transformation

1. Import the mia package, load any of the example data sets mentioned in Chapter 3.3
Expand All @@ -625,7 +751,7 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
with `data` (you need one with taxonomic information at Phylum level)
and store it into a variable named `tse`.
2. List the available taxonomic ranks in the data with `taxonomyRanks`.
3. Agglomerate the data to Phylum level with `mergeFeaturesByRank` and the
3. Agglomerate the data to Phylum level with `agglomerateByRank` and the
appropriate value for `Rank`.
4. Report the dimensions of the TreeSE before and after agglomerating. You can
use `dim` for that.
Expand All @@ -651,6 +777,85 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
As for assays, you can access the desired altExp by name or index.
6. **Extra**: Split the data based on other features with `splitOn`.

### Beta Diversity using Alternative experiments

This Exercise will focus on calcuating dissimilarities or beta diversity using alternative experiments.

1. Import the mia and vegan packages and load a dataset. This can be your own or one of the packages built in to mia.

```{r}
#| label: beta-AltExp-ex1
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

library(mia)
library(vegan)

data("Tengeler2020", package = "mia")
tse <- Tengeler2020
```

2. Calculate relative abundances and create alternative experiments using `splitByRanks` or `splitOn`
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved

```{r}
#| label: beta-AltExp-ex2
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")

altExps(tse) <- splitByRanks(tse)
altExps(tse)
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
```

3. Run PCoA on on one of the created alternative experiments using Bray-Curtis dissimilarity.

```{r}
#| label: beta-AltExp-ex3
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

tse <- runMDS(tse,
FUN = vegan::vegdist,
method = "bray",
assay.type = "relabundance",
name = "MDS_bray")
```

4. Import the scater library, to plot the first two coordinates using the `reducedDim`function and plot the coordinates.

```{r}
#| label: beta-AltExp-ex4
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

p <- plotReducedDim(tse, "MDS_bray")
p
```

5. Add the explained variance ratios for each coordinate on the axes labels and plot the PCoA again.

```{r}
#| label: beta-AltExp-ex5
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

# Calculate explained variance
e <- attr(reducedDim(tse, "MDS_bray"), "eig")
rel_eig <- e / sum(e[e > 0])

# Add explained variance for each axis
p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep = ""),
y = paste("PCoA 2 (", round(100 * rel_eig[[2]], 1), "%", ")", sep = ""))

p
```


## Community (alpha) diversity

Expand All @@ -664,7 +869,7 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
contain the alpha diversity indices?
4. **Extra**: Agglomerate the TreeSE by phylum and compare the mean Shannon
diversity of the original experiment with its agglomerated version. You can
use `mergeFeaturesByRank` to perform agglomeration and `mean` to calculate the
use `agglomerateByRank` to perform agglomeration and `mean` to calculate the
mean values of the respective columns in colData.

### Visualization
Expand Down Expand Up @@ -848,17 +1053,90 @@ the contributions to beta diversity.

### Beta diversity analysis

This exercise prompts you to implement a workflow with distance-based RDA
(dbRDA). You can refer to chapter [@sec-dbrda-workflow] for a step-by-step
In this exercise, we'll ask you to implement a distance-based Redundancy Analysis
(dbRDA) to understand the relationships between microbial community composition and various environmental or experimental factors within your dataset. You can refer to chapter [@sec-dbrda-workflow] for a step-by-step
walkthrough, which may be simplified in the future.

1. Import the mia and vegan packages, load any of the example data sets mentioned in Chapter 3.3
with `data` and store it into a variable named `tse`.
4. Create dbRDA with Bray-Curtis dissimilarities on relative abundances. Use PERMANOVA. Can differences between samples be explained with variables of sample meta data?
5. Analyze diets' association on beta diversity. Calculate dbRDA and then PERMANOVA. Visualize coefficients. Which taxa's abundances differ the most between samples?
6. Interpret your results. Is there association between community composition and location? What are those taxa that differ the most; find information from literature.

```{r}
#| label: dbRDA-ex1
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

library(mia)
library(vegan)

data("GlobalPatterns", package = "mia") # Feel free to use any dataset
tse <- GlobalPatterns
```

2. Create a new assay with the relative abundance for your dataset and merge the
features by a rank of your choice.

```{r}
#| label: dbRDA-ex2
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")

tse <- agglomerateByRank(tse,
rank = "Species")
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
```

3. Select a group of samples and perform a permutational analysis on the calculated relative abundance using Bray-Curtis dissimilarities

```{r}
#| label: dbRDA-ex3
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

tse$Group <- tse$SampleType == "Feces"
```

4. Implement dbRDA on relative abundances and perform another permutational analysis on the redundancy analysis.

```{r}
#| label: dbRDA-ex4
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

dbrda <- dbrda(t(assay(tse,"relabundance")) ~ Group,
data = colData(tse))

permanova2 <- anova.cca(dbrda,
by = "margin",
method = "bray",
permutations = 99)
permanova2
```

The output of the anova holds information on the variation differences between
the chosen groups, in our case w have a p-value of 0.01, indicating that
the groups are statistically different and are unlikely to be due to random
variation

5. Extract the p-values. Can the differences between samples be explained with variables from their metadata?
```{r}
#| label: dbRDA-ex5
#| eval: FALSE
#| code-fold: true
#| code-summary: "Show solution"

p_values <- c(permanova["Group", "Pr(>F)"], permanova2["Group", "Pr(>F)"])

p_values <-as.data.frame(p_values)

rownames(p_values) <- c("adonis2", "dbRDA+anova.cca")
Insaynoah marked this conversation as resolved.
Show resolved Hide resolved
```

Useful functions: scater::runMDS, runRDA, vegan::anova.cca, transformAssay, mergeFeaturesByRank, ggplot, scater::plotReducedDim, vegan::adonis2
Useful functions: scater::runMDS, runRDA, vegan::anova.cca, transformAssay, agglomerateByRank, ggplot, scater::plotReducedDim, vegan::adonis2


## Differential abundance
Expand Down
Loading