
Ensemble sampler algorithm


The ensemble sampler algorithm uses a group of Markov chains to explore the parameter space. Each chain $j$ starts at a point $\varphi_{j,t}$ in the parameter space and is evolved using the complementary chains of the ensemble.

Christen (2006) found that it is possible to evolve the chain $\varphi_{j,t}$ to the state $t+1$ via a walk move using a complementary chain of the ensemble. Goodman & Weare (2010) used the idea of the walk move to construct an affine-invariant move called the stretch move. The stretch move for the chain $\varphi_{j,t}$ proposes the new state

$$ \varphi'_{j} = \varphi_{k,t} + z \left( \varphi_{j,t} - \varphi_{k,t} \right), $$

where $\varphi_{k,t}$ is a complementary chain of the ensemble, such that $j \neq k$, and $z$ is a scaling variable that regulates the step. This scaling variable has to be drawn from a density distribution $g$ that satisfies the symmetry condition (Christen 2006)

$$ g\!\left(\frac{1}{z}\right) = z \, g(z). $$

A distribution that satisfies this condition is

$$ g(z) \propto \begin{cases} \dfrac{1}{\sqrt{z}} & \text{if } z \in \left[ \dfrac{1}{a}, a \right], \\[2mm] 0 & \text{otherwise}, \end{cases} $$

where $a > 1$. There is no optimal value of $a$, but we set $a = 2$ to be consistent with ensemble sampler algorithms in the literature (e.g., Goodman & Weare 2010; Hou et al. 2012). To ensure the invariant distribution, we have to compute the ratio

$$ q = z^{N-1} \, \frac{p\!\left(\varphi'_{j}\right)}{p\!\left(\varphi_{j,t}\right)}, $$

where $p$ is the posterior density we are sampling.

The term $z^{N-1}$, where $N$ is the dimension of the parameter space, ensures detailed balance (for more details see Goodman & Weare 2010). To decide whether or not we accept the proposed state, we use

$$ \varphi_{j,t+1} = \begin{cases} \varphi'_{j} & \text{if } U \leq q, \\ \varphi_{j,t} & \text{otherwise}, \end{cases} $$

where $U$ is a random number drawn uniformly from $[0,1]$. After $T$ iterations of $L$ chains, we will have $T \times L$ samples for each parameter, from which we can create posterior distributions. Below you can see a general overview of the evolution of the ensemble.

The following figure shows an example of the evolution of six chains of the ensemble sampler algorithm. Each chain starts at a different point in the parameter space. After a finite number of iterations (in this case, a few hundred), the chains converge to a stable region of the parameter space. Details on how we create marginal posterior distributions from the chains' samples are provided below. Another advantage of this approach is that, since each Markov chain evolves independently, the algorithm can be parallelized (see Foreman-Mackey et al. 2013).

(Figure: ensemble)

Example of the evolution of an ensemble sampler algorithm using six chains. Upper panel: parameter value for each chain from iteration 0 to 5000, on a logarithmic scale. Lower left panel: behavior of the chains for the last 4000 iterations. Lower right panel: histogram created using the information contained in all the chains in the last 4000 iterations.
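To make the algorithm concrete, here is a minimal Python sketch of the stretch move described above; the names `stretch_move_sampler`, `log_prob`, and `phi0` are illustrative and not taken from any particular code. The scaling variable $z$ is drawn from $g(z) \propto 1/\sqrt{z}$ by inverse-transform sampling, which for this $g$ reduces to $z = ((a-1)u + 1)^2 / a$ with $u$ uniform in $[0,1]$.

```python
import numpy as np

def stretch_move_sampler(log_prob, phi0, n_iter, a=2.0, seed=0):
    """Minimal affine-invariant ensemble sampler (Goodman & Weare 2010).

    log_prob : callable that returns log p(phi) for a parameter vector phi
    phi0     : (L, N) array with the starting point of each of the L chains
    Returns an (n_iter, L, N) array with the state of every chain at every step.
    """
    rng = np.random.default_rng(seed)
    phi = np.array(phi0, dtype=float)
    n_chains, n_dim = phi.shape
    logp = np.array([log_prob(p) for p in phi])
    samples = np.empty((n_iter, n_chains, n_dim))
    for t in range(n_iter):
        for j in range(n_chains):
            # Pick a complementary chain k != j.
            k = rng.integers(n_chains - 1)
            if k >= j:
                k += 1
            # Draw z from g(z) ~ 1/sqrt(z) on [1/a, a] by inverse transform.
            z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
            # Stretch move: phi' = phi_k + z * (phi_j - phi_k).
            proposal = phi[k] + z * (phi[j] - phi[k])
            logp_prop = log_prob(proposal)
            # Acceptance ratio q = z^(N-1) p(phi') / p(phi_j), in log space.
            log_q = (n_dim - 1.0) * np.log(z) + logp_prop - logp[j]
            if np.log(rng.uniform()) <= log_q:
                phi[j], logp[j] = proposal, logp_prop
        samples[t] = phi
    return samples
```

Updating one chain at a time with the current state of the ensemble preserves detailed balance; parallel implementations such as emcee instead split the ensemble into two halves and update them alternately (Foreman-Mackey et al. 2013).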

Convergence

In order to infer parameter values from an MCMC sampling, we need to use chains that have converged. A widely used convergence test was developed by Gelman & Rubin (1992). This test compares the between-chain variance $B$ and within-chain variance $W$ via the potential scale reduction factor

$$ \hat{R} = \sqrt{ \frac{ \frac{n-1}{n} W + \frac{1}{n} B }{ W } }, $$

where $n$ is the length of each chain. We consider that the chains have converged when $\hat{R} < 1.02$ for all the parameters.
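As a reference, the following sketch computes $\hat{R}$ for a single parameter from $L$ chains of equal length, following the definition above; the function name `gelman_rubin` is illustrative.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat (Gelman & Rubin 1992).

    chains : (L, n) array with L chains of length n for a single parameter.
    """
    n_chains, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Within-chain variance W: average of the per-chain sample variances.
    w = chains.var(axis=1, ddof=1).mean()
    # Between-chain variance B, computed from the chain means.
    b = n * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1.0) / n * w + b / n
    return np.sqrt(var_hat / w)
```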

Marginal posterior distribution from parameter sampling

Chains that have converged to a stable solution represent a sample of the marginal posterior distribution from which they were drawn. The frequencies of the chain samples can be used to create the posterior distribution of the sampled parameters. A common way to draw the sampling frequency is with a histogram, as shown in the previous figure. From the final marginal posterior distribution of each parameter we can define a credible interval.
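For instance, assuming the `stretch_move_sampler` sketch above, one could sample a one-dimensional toy Gaussian posterior and histogram the post-burn-in samples (the burn-in length of 1000 iterations used here is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy 1-D Gaussian log-posterior with mean 3 and standard deviation 0.5.
log_prob = lambda phi: -0.5 * ((phi[0] - 3.0) / 0.5) ** 2

phi0 = np.random.default_rng(1).normal(size=(6, 1))  # six chains
samples = stretch_move_sampler(log_prob, phi0, n_iter=5000)

# Discard the burn-in and flatten the remaining T x L samples.
flat = samples[1000:].reshape(-1)
plt.hist(flat, bins=50, density=True)
plt.xlabel("parameter value")
plt.show()
```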

The median and the 68% limits of the credible interval are commonly used to define a parameter's best estimate and its uncertainty (see, e.g., Hogg & Foreman-Mackey 2018). When the marginal posterior distribution is Gaussian, the median and the 68% limits of the credible interval correspond to the mean and standard deviation of the normal distribution. When the posterior distribution is skewed, the 68% limits are not symmetric with respect to the median, and they give a "first-order" idea of the shape of the marginal posterior distribution that describes a given parameter.
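A common way to extract these summary values from a flattened set of samples is via percentiles; `summarize` below is an illustrative helper, not a library function.

```python
import numpy as np

def summarize(samples):
    """Median and 68% credible interval from 1-D posterior samples."""
    lo, med, hi = np.percentile(samples, [16.0, 50.0, 84.0])
    return med, med - lo, hi - med  # best estimate, lower and upper errors
```

For a Gaussian posterior the lower and upper errors coincide; for a skewed posterior they differ, reflecting the asymmetry of the credible interval.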