microbiome · TuomasBorman · Jul 1, 2024 · May 30, 2024 · May 30, 2024 · May 30, 2024
diff --git a/inst/pages/98_exercises.qmd b/inst/pages/98_exercises.qmd
@@ -508,7 +508,7 @@ However, such data are not always included.
 
 ```{r}
 #| label: Other-elements-ex1
-#| eval: FALSE
+#| eval: TRUE
 #| code-fold: true
 #| code-summary: "Show solution"
 
@@ -565,42 +565,168 @@ colData(tse)$Barcode_full_length
 
 ### Subsetting
 
+In this exercise, we'll go over the basics of subsetting a TreeSE object.
+Keep in mind that it's good practice to always keep the original data intact.
+Therefore, we suggest creating a new TreeSE object whenever you subset.
+
+Please have a try at the following exercises. 
+
 1. Subset the TreeSE object to specific samples.  
+
+```{r}
+#| label: subsetting-ex1
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse_subset_by_sample <- tse[ , tse$SampleType %in% 
+                               c("Feces", "Skin", "Tongue")]
+
+unique(tse_subset_by_sample$SampleType)
+```   
+
 2. Subset the TreeSE object to specific features.  
+
+```{r}
+#| label: subsetting-ex2
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse_subset_by_sample_and_feature <- tse[rowData(tse)$Phylum %in% c("Crenarchaeota", 
+                                                       "Euryarchaeota",
+                                                       "Actinobacteria")
+                             ,]
+
+dim(tse_subset_by_sample_and_feature)
+```   
+
 3. Subset the TreeSE object to specific samples and features.  
 
+```{r}
+#| label: subsetting-ex3
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse_subset_by_feature <- tse[rowData(tse)$Phylum %in% c("Crenarchaeota", 
+                                                       "Euryarchaeota",
+                                                       "Actinobacteria")
+                             ,
+                             tse$SampleType %in% 
+                               c("Feces", "Skin", "Tongue")]
+
+unique(rowData(tse_subset_by_feature)$Phylum)
+```   
 
 
 ### Library sizes
 
 1. Calculate library sizes  
 2. Subsample / rarify the counts (see: rarefyAssay)
 
-Useful functions: nrow, ncol, dim, summary, table, quantile, unique, scater::addPerCellQC, mia::mergeFeaturesByRank
+Useful functions: nrow, ncol, dim, summary, table, quantile, unique, scater::addPerCellQC, mia::agglomerateByRank
 
 ### Prevalent and core taxonomic features
 
-1. Estimate prevalence for your chosen feature (row, taxonomic group)  
-2. Identify all prevalent features and subset the data accordingly   
-3. Report the thresholds and the dimensionality of the data before and after subsetting  
-4. Visualize prevalence  
+1. Estimate prevalence for your chosen feature (specific row and taxonomic group)
+```{r}
+#| label: prevalence-ex1
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
 
-Useful functions: getPrevalence, getPrevalent, subsetByPrevalent
+# First, merge the features by rank and add it as an alternative experiment.
+altExp(tse,"Phylum") <- agglomerateByRank(tse, "Phylum") 
+
+# Then get the prevalence for a specific sample 
+getPrevalence(altExp(tse,"Phylum"), detection = 1/100, 
+                   sort = TRUE, assay.type = "counts",
+                   as_relative = TRUE)["Phylum:Proteobacteria"]
+```   
+2. Identify all prevalent features and subset the data accordingly.
+
+```{r}
+#| label: prevalence-ex2
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+# Get the prevalent features
+prevalentFeatures <- getPrevalentFeatures(tse, detection = 0, 
+                                          prevalence = 50/100)
+# Subset the data to the prevalent features
+prevalenceTse <- tse[rowData(tse)$Phylum %in% prevalentFeatures,]
+
+# Alternatively subsetByPrevalentFeatures can be used to subset directly
+prevalenceTse <- subsetByPrevalentFeatures(tse, detection = 0, 
+                                          prevalence = 50/100)
+```  
+3. How many features are left ?
+
+```{r}
+#| label: prevalence-ex3
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+length(assay(prevalenceTse, "counts"))
+```  
+
+4. Recalculate the most prevalent features based on relative abundance. 
+How many are left?
+
+```{r}
+#| label: prevalence-ex4
+#| eval: True
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")
+
+prevalenceTse <- subsetByPrevalentFeatures(tse, detection = 0, 
+                                           prevalence = 50/100)
+
+length(assay(prevalenceTse, "relabundance"))
+```  
+
+5. Visualize prevalence using a violin/bean plot.
+
+```{r}
+#| label: prevalence-ex5
+#| eval: False
+#| code-fold: true
+#| code-summary: "Show solution"
+
+# Import the scater library
+library(scater)
+
+# Store the prevalence of each taxon
+rowData(prevalenceTse)$prevalence <- 
+    getPrevalence(prevalenceTse, detection = 1/100, 
+                  sort = FALSE,
+                  assay.type = "counts", as_relative = TRUE)
+
+# Then plot the row data
+plotRowData(tse, "prevalence", 
+            colour_by = "Phylum")
+```   
+
+Useful functions: getPrevalence, getPrevalentFeatures, subsetByPrevalentFeatures
 
 ### Data exploration
 
 1. Summarize sample metadata variables. (How many age groups, how they are distributed? 0%, 25%, 50%, 75%, and 100% quantiles of library size?)  
 2. Create two histograms. Another shows the distribution of absolute counts, another shows how CLR transformed values are distributed.  
 3. Visualize how relative abundances are distributed between taxa in samples.  
 
-Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAssay, ggplot, wilcox.test, mergeFeaturesByRank, miaViz::plotAbundance
+Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAssay, ggplot, wilcox.test, agglomerateByRank, miaViz::plotAbundance
 
 ### Other functions
 
 1. Merge data objects (merge, mergeSEs)  
 2. Melt the data for visualization purposes (meltSE)  
 
-
 ### Assay transformation
 
 1. Import the mia package, load any of the example data sets mentioned in Chapter 3.3
@@ -625,7 +751,7 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
    with `data` (you need one with taxonomic information at Phylum level)
    and store it into a variable named `tse`.  
 2. List the available taxonomic ranks in the data with `taxonomyRanks`.  
-3. Agglomerate the data to Phylum level with `mergeFeaturesByRank` and the
+3. Agglomerate the data to Phylum level with `agglomerateByRank` and the
    appropriate value for `Rank`.  
 4. Report the dimensions of the TreeSE before and after agglomerating. You can
    use `dim` for that.  
@@ -651,6 +777,85 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
    As for assays, you can access the desired altExp by name or index.  
 6. **Extra**: Split the data based on other features with `splitOn`.  
 
+### Beta Diversity using Alternative experiments
+
+This Exercise will focus on calcuating dissimilarities or beta diversity using alternative experiments.
+
+1. Import the mia and vegan packages and load a dataset. This can be your own or one of the packages built in to mia.
+
+```{r}
+#| label: beta-AltExp-ex1
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+library(mia)
+library(vegan)
+
+data("Tengeler2020", package = "mia")
+tse <- Tengeler2020
+```
+
+2. Calculate relative abundances and create alternative experiments using `splitByRanks` or `splitOn`
+
+```{r}
+#| label: beta-AltExp-ex2
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")
+
+altExps(tse) <- splitByRanks(tse)
+altExps(tse)
+```
+
+3. Run PCoA on on one of the created alternative experiments using Bray-Curtis dissimilarity.
+
+```{r}
+#| label: beta-AltExp-ex3
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse <- runMDS(tse,
+              FUN = vegan::vegdist,
+              method = "bray",
+              assay.type = "relabundance",
+              name = "MDS_bray")
+```
+
+4. Import the scater library, to plot the first two coordinates using the `reducedDim`function and plot the coordinates.
+
+```{r}
+#| label: beta-AltExp-ex4
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+p <- plotReducedDim(tse, "MDS_bray")
+p
+```
+
+5. Add the explained variance ratios for each coordinate on the axes labels and plot the PCoA again.
+
+```{r}
+#| label: beta-AltExp-ex5
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+# Calculate explained variance
+e <- attr(reducedDim(tse, "MDS_bray"), "eig")
+rel_eig <- e / sum(e[e > 0])
+
+# Add explained variance for each axis
+p <- p + labs(x = paste("PCoA 1 (", round(100 * rel_eig[[1]], 1), "%", ")", sep = ""),
+              y = paste("PCoA 2 (", round(100 * rel_eig[[2]], 1), "%", ")", sep = ""))
+
+p
+```
+
 
 ## Community (alpha) diversity
 
@@ -664,7 +869,7 @@ Useful functions: nrow, ncol, dim, summary, table, quantile, unique, transformAs
    contain the alpha diversity indices?
 4. **Extra**: Agglomerate the TreeSE by phylum and compare the mean Shannon
    diversity of the original experiment with its agglomerated version. You can
-   use `mergeFeaturesByRank` to perform agglomeration and `mean` to calculate the
+   use `agglomerateByRank` to perform agglomeration and `mean` to calculate the
    mean values of the respective columns in colData.
 
 ### Visualization 
@@ -848,17 +1053,90 @@ the contributions to beta diversity.
 
 ### Beta diversity analysis
 
-This exercise prompts you to implement a workflow with distance-based RDA
-(dbRDA). You can refer to chapter [@sec-dbrda-workflow] for a step-by-step
+In this exercise, we'll ask you to implement a distance-based Redundancy Analysis
+(dbRDA) to understand the relationships between microbial community composition and various environmental or experimental factors within your dataset. You can refer to chapter [@sec-dbrda-workflow] for a step-by-step
 walkthrough, which may be simplified in the future.
 
 1. Import the mia and vegan packages, load any of the example data sets mentioned in Chapter 3.3
    with `data` and store it into a variable named `tse`.
-4. Create dbRDA with Bray-Curtis dissimilarities on relative abundances. Use PERMANOVA. Can differences between samples be explained with variables of sample meta data? 
-5. Analyze diets' association on beta diversity. Calculate dbRDA and then PERMANOVA. Visualize coefficients. Which taxa's abundances differ the most between samples? 
-6. Interpret your results. Is there association between community composition and location? What are those taxa that differ the most; find information from literature.
+
+```{r}
+#| label: dbRDA-ex1
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+library(mia)
+library(vegan)
+
+data("GlobalPatterns", package = "mia") # Feel free to use any dataset
+tse <- GlobalPatterns
+```
+
+2. Create a new assay with the relative abundance for your dataset and merge the 
+features by a rank of your choice.
+
+```{r}
+#| label: dbRDA-ex2
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse <- transformAssay(tse, MARGIN = "samples", method="relabundance")
+
+tse <- agglomerateByRank(tse,
+                           rank = "Species")
+```
+
+3. Select a group of samples and perform a permutational analysis on the calculated relative abundance using Bray-Curtis dissimilarities 
+
+```{r}
+#| label: dbRDA-ex3
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+tse$Group <- tse$SampleType == "Feces"
+```
+
+4. Implement dbRDA on relative abundances and perform another permutational analysis on the redundancy analysis. 
+
+```{r}
+#| label: dbRDA-ex4
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+dbrda <- dbrda(t(assay(tse,"relabundance")) ~ Group, 
+               data = colData(tse))
+
+permanova2 <- anova.cca(dbrda,
+                        by = "margin",
+                        method = "bray",
+                        permutations = 99)
+permanova2
+```
+
+The output of the anova holds information on the variation differences between 
+the chosen groups, in our case w have a p-value of 0.01, indicating that 
+the groups are statistically different and are unlikely to be due to random 
+variation
+
+5. Extract the p-values. Can the differences between samples be explained with variables from their metadata? 
+```{r}
+#| label: dbRDA-ex5
+#| eval: FALSE
+#| code-fold: true
+#| code-summary: "Show solution"
+
+p_values <- c(permanova["Group", "Pr(>F)"], permanova2["Group", "Pr(>F)"])
+
+p_values <-as.data.frame(p_values)
+
+rownames(p_values) <- c("adonis2", "dbRDA+anova.cca")
+```
 
-Useful functions: scater::runMDS, runRDA, vegan::anova.cca, transformAssay, mergeFeaturesByRank, ggplot, scater::plotReducedDim, vegan::adonis2
+Useful functions: scater::runMDS, runRDA, vegan::anova.cca, transformAssay, agglomerateByRank, ggplot, scater::plotReducedDim, vegan::adonis2
 
 
 ## Differential abundance