Metropolis-adjusted Langevin algorithm

In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure:

  - new states are proposed using (overdamped) Langevin dynamics, which use evaluations of the gradient of the target probability density function;
  - these proposals are accepted or rejected using the Metropolis–Hastings algorithm, which uses evaluations of the target density itself.

Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994[1] (although the closely related method of smart Monte Carlo had already been introduced in 1978[2]), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie[3] and Jeff Rosenthal.[4] Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).[5] The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.[5]

Further details

Let $\pi$ denote a probability density function on $\mathbb{R}^d$, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

$$\dot{X} = \nabla \log \pi(X) + \sqrt{2} \, \dot{W}$$

driven by the time derivative of a standard Brownian motion $W$. (Note that another commonly-used normalization for this diffusion is

$$\dot{X} = \tfrac{1}{2} \nabla \log \pi(X) + \dot{W} ,$$

which generates the same dynamics.) In the limit as $t \to \infty$, the probability distribution $\rho(t)$ of $X(t)$ approaches a stationary distribution, which is also invariant under the diffusion and which we denote $\rho_\infty$. It turns out that, in fact, $\rho_\infty = \pi$.
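
One way to see this is via the Fokker–Planck equation associated with this diffusion (a brief sketch; the derivation is not spelled out above): the density $\rho(t)$ evolves according to

$$\partial_t \rho = \nabla \cdot \bigl( - \rho \, \nabla \log \pi + \nabla \rho \bigr) ,$$

and substituting $\rho = \pi$ gives $- \pi \, \nabla \log \pi + \nabla \pi = - \nabla \pi + \nabla \pi = 0$, so that $\pi$ is indeed a stationary density.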

Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step $\tau > 0$. We set $X_0 := x_0$ and then recursively define an approximation $X_k$ to the true solution $X(k\tau)$ by

$$X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k ,$$

where each $\xi_k$ is an independent draw from a multivariate normal distribution on $\mathbb{R}^d$ with mean 0 and covariance matrix equal to the $d \times d$ identity matrix. Note that $X_{k+1}$ is normally distributed with mean $X_k + \tau \nabla \log \pi(X_k)$ and covariance equal to $2 \tau$ times the $d \times d$ identity matrix.
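
As an illustration, a minimal Python sketch of this Euler–Maruyama recursion (often called the unadjusted Langevin algorithm) might look as follows; the names grad_log_pi, x0, tau and n_steps are illustrative assumptions, not part of the description above.

    import numpy as np

    def langevin_euler_maruyama(grad_log_pi, x0, tau, n_steps, rng=None):
        """Euler–Maruyama discretization of the overdamped Langevin diffusion (no accept/reject)."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float)
        chain = [x.copy()]
        for _ in range(n_steps):
            xi = rng.standard_normal(x.shape)                       # xi_k ~ N(0, I_d)
            x = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi  # X_{k+1}
            chain.append(x.copy())
        return np.array(chain)

For a standard Gaussian target, for example, one would pass grad_log_pi = lambda x: -x.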

In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates $X_k$ according to the update rule

$$X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k ,$$

MALA incorporates an additional step. We consider the above update rule as defining a proposal $\tilde{X}_{k+1}$ for a new state,

$$\tilde{X}_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \, \xi_k .$$

This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set

$$\alpha := \min \left\{ 1, \frac{\pi(\tilde{X}_{k+1}) \, q(X_k \mid \tilde{X}_{k+1})}{\pi(X_k) \, q(\tilde{X}_{k+1} \mid X_k)} \right\} ,$$

where

$$q(x' \mid x) \propto \exp \left( - \frac{1}{4 \tau} \bigl\| x' - x - \tau \nabla \log \pi(x) \bigr\|_2^2 \right)$$

is the transition probability density from $x$ to $x'$ (note that, in general, $q(x' \mid x) \neq q(x \mid x')$). Let $u$ be drawn from the continuous uniform distribution on the interval $[0, 1]$. If $u \leq \alpha$, then the proposal is accepted, and we set $X_{k+1} := \tilde{X}_{k+1}$; otherwise, the proposal is rejected, and we set $X_{k+1} := X_k$.
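
Putting the proposal and the accept/reject test together, a minimal sketch of one MALA iteration in Python could look as follows; it works with $\log \pi$ for numerical stability, and log_pi, grad_log_pi and the other names are illustrative assumptions rather than a reference implementation.

    import numpy as np

    def mala_step(x, log_pi, grad_log_pi, tau, rng):
        """One MALA update: Langevin proposal followed by a Metropolis–Hastings accept/reject."""
        # Proposal: x_tilde = x + tau * grad log pi(x) + sqrt(2 tau) * xi,  xi ~ N(0, I_d)
        xi = rng.standard_normal(x.shape)
        x_prop = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi

        # log q(x_to | x_from) = -||x_to - x_from - tau * grad log pi(x_from)||^2 / (4 tau) + const
        def log_q(x_to, x_from):
            diff = x_to - x_from - tau * grad_log_pi(x_from)
            return -np.dot(diff, diff) / (4.0 * tau)

        # Log of the Metropolis–Hastings ratio; alpha = min(1, exp(log_ratio))
        log_ratio = log_pi(x_prop) + log_q(x, x_prop) - log_pi(x) - log_q(x_prop, x)
        if np.log(rng.uniform()) <= min(0.0, log_ratio):
            return x_prop, True   # accepted: X_{k+1} = x_tilde
        return x, False           # rejected: X_{k+1} = X_k

For example, for a standard Gaussian target one would take log_pi = lambda x: -0.5 * x @ x and grad_log_pi = lambda x: -x, and iterate mala_step with rng = np.random.default_rng().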

The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution $\rho_\infty = \pi$. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher $\pi$ probability, which are then more likely to be accepted. On the other hand, when $\pi$ is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take $0 < \tau \ll 1$ in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix $A \in \mathbb{R}^{d \times d}$ can help to alleviate this problem, by generating proposals according to

$$\tilde{X}_{k+1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2 \tau A} \, \xi_k ,$$

so that $\tilde{X}_{k+1}$ has mean $X_k + \tau A \nabla \log \pi(X_k)$ and covariance $2 \tau A$.
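
A sketch of such a preconditioned proposal, assuming the matrix square root is realized through a Cholesky factor $A = L L^\top$ (an implementation choice, not something prescribed by the text above), might be:

    import numpy as np

    def preconditioned_proposal(x, grad_log_pi, tau, A, rng):
        """Langevin proposal with mean x + tau*A*grad log pi(x) and covariance 2*tau*A."""
        L = np.linalg.cholesky(A)          # A = L L^T, so L @ xi has covariance A
        xi = rng.standard_normal(x.shape)
        return x + tau * A @ grad_log_pi(x) + np.sqrt(2.0 * tau) * (L @ xi)

The accept/reject step is unchanged, except that the proposal density $q$ must be replaced by the Gaussian density with this mean and covariance.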

For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be 0.574; if it is discovered to be substantially different in practice, τ should be modified accordingly.[4]
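
One common way to use this result in practice, shown here only as an illustrative sketch (the adaptation scheme below is not part of the text above), is to adjust $\tau$ during a burn-in phase so that the observed acceptance rate drifts towards 0.574:

    import numpy as np

    def tune_tau(tau, accepted, target=0.574, rate=0.05):
        """Stochastic-approximation update nudging tau towards the target acceptance rate."""
        # Increase tau slightly after an acceptance, decrease it after a rejection,
        # in proportion to the distance from the target; typically applied only during burn-in.
        return tau * np.exp(rate * ((1.0 if accepted else 0.0) - target))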

References

  1. J. Besag (1994). "Comments on "Representations of knowledge in complex systems" by U. Grenander and M. I. Miller". Journal of the Royal Statistical Society, Series B 56: 591–592.
  2. Rossky, P. J.; Doll, J. D.; Friedman, H. L. (1978). "Brownian Dynamics as smart Monte Carlo simulation". J. Chem. Phys. 69 (10): 4628. doi:10.1063/1.436415.
  3. G. O. Roberts and R. L. Tweedie (1996). "Exponential convergence of Langevin distributions and their discrete approximations". Bernoulli 2 (4): 341–363. doi:10.2307/3318418. http://projecteuclid.org/euclid.bj/1178291835. 
  4. G. O. Roberts and J. S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". Journal of the Royal Statistical Society, Series B 60 (1): 255–268. doi:10.1111/1467-9868.00123.
  5. M. Girolami and B. Calderhead (2011). "Riemann manifold Langevin and Hamiltonian Monte Carlo methods". Journal of the Royal Statistical Society, Series B 73 (2): 123–214. doi:10.1111/j.1467-9868.2010.00765.x.