-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathClustering.Rmd
executable file
·86 lines (66 loc) · 2.89 KB
/
Clustering.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "Cluster analysis"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography:
- bibliography.bib
output:
BiocStyle::html_document:
number_sections: no
toc: yes
toc_depth: 4
toc_float: true
self_contained: true
thumbnails: true
lightbox: true
gallery: true
use_bookdown: false
highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - clustering}
%\usepackage[utf8]{inputenc}
%\VignetteEncoding{UTF-8}
-->
## Multivariate (infinite) Gaussian mixture model
Fit and visualize Variational Dirichlet process multivariate infinite Gaussian mixture. This variational version has been partially written in C and it is relatively fast. Kindly cite [this article](http://bioinformatics.oxfordjournals.org/content/26/21/2713.short). Note that the implementation uses diagonal covariances on the Gaussian modes. The C code was partially derived from [Honkela et al. 2008](http://www.sciencedirect.com/science/article/pii/S0925231208000659).
```{r LCA2, fig.width=6, fig.height=5, warning=FALSE, message=FALSE, eval=FALSE}
library(netresponse)
# Generate simulated data
res <- generate.toydata(Dim = 2)
D <- res$data
component.means <- res$means
component.sds <- res$sds
# Fit the mixture
m <- mixture.model(D, mixture.method = "vdp", pca.basis = FALSE)
# Plot the data, and indicate estimated modes with colors.
# If data dimensionality exceeds 2,
# the results are visualized on PCA projection
# (with pca.basis = TRUE the data is projected on PCA coordinates;
# without loss of information. This trick can help to avoid overlearning
# as the variational mixture relies
# on diagonal covariance matrices, so the ellipsoidal axes of the
# Gaussian modes are parallel to the coordinate axes.)
p <- PlotMixtureMultivariate(D, means = m$mu, sds = m$sd, ws = m$w, modes = apply(m$qofz,1,which.max))
plot(p)
```
## Univariate (infinite) Gaussian mixture model
Fit and visualize Variational Dirichlet process univariate infinite Gaussian mixture. Kindly cite [this article](http://bioinformatics.oxfordjournals.org/content/26/21/2713.short) for the code.
```{r LCA1, fig.width=7, fig.height=5, warning=FALSE, message=FALSE, eval=FALSE}
# Generate simulated bimodal univariate data
x <- c(rnorm(200), rnorm(200, mean = 5))
# Variational Dirichlet process univariate Gaussian mixture
m <- mixture.model(x, mixture.method = "vdp", max.responses = 10)
# Plot the data and estimated modes
p <- PlotMixtureUnivariate(x, means = m$mu, sds = m$sd, ws = m$w, binwidth = 0.1, qofz = m$qofz)
plot(p)
```
## Clustering samples with mixed variables
Gower distance is useful for samples with mixed-type variables (binary, factor, numeric)):
```{r clustering-gower, fig.width=10, fig.height=4, warning=FALSE, message=FALSE, eval=FALSE}
# Example data
data("dietswap")
library(FD)
d <- gowdis(as(sample_data(dietswap), "data.frame"))
plot(hclust(d))
```