Normal-inverse-Wishart distribution

normal-inverse-Wishart
Notation: (μ, Σ) ∼ NIW(μ₀, λ, Ψ, ν)
Parameters: μ₀ ∈ ℝ^D location (real vector)
            λ > 0 (real)
            Ψ ∈ ℝ^(D×D) inverse scale matrix (pos. def.)
            ν > D − 1 (real)
Support: μ ∈ ℝ^D; Σ ∈ ℝ^(D×D) covariance matrix (pos. def.)
PDF: f(μ, Σ | μ₀, λ, Ψ, ν) = 𝒩(μ | μ₀, (1/λ)Σ) · 𝒲⁻¹(Σ | Ψ, ν)

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Definition

Suppose

μ | μ₀, λ, Σ ∼ 𝒩(μ | μ₀, (1/λ)Σ)

has a multivariate normal distribution with mean μ₀ and covariance matrix (1/λ)Σ, where

Σ | Ψ, ν ∼ 𝒲⁻¹(Σ | Ψ, ν)

has an inverse Wishart distribution. Then (μ,Σ) has a normal-inverse-Wishart distribution, denoted as

(μ, Σ) ∼ NIW(μ₀, λ, Ψ, ν).

Characterization

Probability density function

f(μ, Σ | μ₀, λ, Ψ, ν) = 𝒩(μ | μ₀, (1/λ)Σ) · 𝒲⁻¹(Σ | Ψ, ν)

The full version of the PDF is as follows:[2]

f(μ, Σ | μ₀, λ, Ψ, ν) = [λ^(D/2) |Ψ|^(ν/2) |Σ|^(−(ν+D+2)/2) / ((2π)^(D/2) 2^(νD/2) Γ_D(ν/2))] · exp{ −(1/2) Tr(ΨΣ⁻¹) − (λ/2)(μ − μ₀)ᵀ Σ⁻¹ (μ − μ₀) }

Here Γ_D(·) is the multivariate gamma function and Tr(·) denotes the trace of a matrix.
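
As a sanity check, the closed-form density above can be transcribed directly into code. The following is a minimal sketch, assuming NumPy and SciPy are available; the function name `niw_logpdf` is illustrative, not a library API.

```python
import numpy as np
from scipy.special import multigammaln  # log of the multivariate gamma function


def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu) at (mu, Sigma),
    transcribed term by term from the closed-form PDF above."""
    D = mu0.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_Psi = np.linalg.slogdet(Psi)
    diff = mu - mu0
    # Normalizing constant: lambda^(D/2) |Psi|^(nu/2) / ((2*pi)^(D/2) 2^(nu*D/2) Gamma_D(nu/2))
    log_const = (0.5 * D * np.log(lam)
                 + 0.5 * nu * logdet_Psi
                 - 0.5 * D * np.log(2.0 * np.pi)
                 - 0.5 * nu * D * np.log(2.0)
                 - multigammaln(nu / 2.0, D))
    # |Sigma|^(-(nu+D+2)/2) times the exponential kernel
    log_kernel = (-0.5 * (nu + D + 2) * logdet_Sigma
                  - 0.5 * np.trace(Psi @ Sigma_inv)
                  - 0.5 * lam * diff @ Sigma_inv @ diff)
    return log_const + log_kernel
```

For fixed Σ, the density is Gaussian in μ, so it peaks at μ = μ₀; that property gives a quick check of the implementation.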

Properties

Marginal distributions

By construction, the marginal distribution over Σ is an inverse Wishart distribution, and the conditional distribution over μ given Σ is a multivariate normal distribution. The marginal distribution over μ is a multivariate t-distribution.
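
Concretely, integrating Σ out of the joint density yields, in the notation above (this closed form is given in Murphy's note cited in the notes below):

```latex
\mu \sim t_{\nu - D + 1}\!\left(\mu_0,\; \frac{\Psi}{\lambda(\nu - D + 1)}\right)
```

i.e. a multivariate t-distribution with ν − D + 1 degrees of freedom, location μ₀, and scale matrix Ψ/(λ(ν − D + 1)).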

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

yᵢ | μ, Σ ∼ 𝒩ₚ(μ, Σ)

where y is an n×p matrix and yᵢ (of length p) is row i of the matrix y.

Since the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:

(μ, Σ) ∼ NIW(μ₀, λ, Ψ, ν).

The resulting posterior distribution for the mean and covariance matrix will also be a Normal-Inverse-Wishart

(μ, Σ | y) ∼ NIW(μₙ, λₙ, Ψₙ, νₙ),

where

μₙ = (λμ₀ + n·ȳ) / (λ + n)
λₙ = λ + n
νₙ = ν + n
Ψₙ = Ψ + S + (λn / (λ + n)) (ȳ − μ₀)(ȳ − μ₀)ᵀ,  where  S = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)(yᵢ − ȳ)ᵀ.
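
The update formulas above can be sketched in code as follows, assuming NumPy; `niw_posterior` is an illustrative name, not a library function.

```python
import numpy as np


def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate NIW update for an (n, p) data matrix y with rows y_i,
    following the posterior formulas above."""
    n, p = y.shape
    ybar = y.mean(axis=0)
    centered = y - ybar
    S = centered.T @ centered                      # scatter matrix about the sample mean
    mu_n = (lam * mu0 + n * ybar) / (lam + n)      # precision-weighted mean
    lam_n = lam + n
    nu_n = nu + n
    diff = ybar - mu0
    Psi_n = Psi + S + (lam * n / (lam + n)) * np.outer(diff, diff)
    return mu_n, lam_n, Psi_n, nu_n
```

With a single observation (n = 1) the scatter matrix S vanishes and the update reduces to a simple weighted average, which makes the formulas easy to verify by hand.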


To sample from the joint posterior of (μ, Σ), one first draws Σ | y ∼ 𝒲⁻¹(Ψₙ, νₙ) and then draws μ | Σ, y ∼ 𝒩ₚ(μₙ, Σ/λₙ). To draw from the posterior predictive distribution of a new observation, draw ỹ | μ, Σ, y ∼ 𝒩ₚ(μ, Σ), given the already drawn values of μ and Σ.[3]
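
The three-step sampling scheme just described might be sketched as below, using SciPy's `invwishart` for the inverse Wishart draw (an assumption that SciPy is available; `sample_posterior` is an illustrative name).

```python
import numpy as np
from scipy.stats import invwishart


def sample_posterior(mu_n, lam_n, Psi_n, nu_n, rng):
    """One draw from the NIW posterior followed by one posterior-predictive
    draw, following the scheme described above."""
    # Step 1: Sigma | y ~ IW(Psi_n, nu_n)
    Sigma = np.atleast_2d(invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng))
    # Step 2: mu | Sigma, y ~ N(mu_n, Sigma / lam_n)
    mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
    # Step 3: predictive draw, y_tilde | mu, Sigma ~ N(mu, Sigma)
    y_tilde = rng.multivariate_normal(mu, Sigma)
    return mu, Sigma, y_tilde
```

Repeating these three steps yields samples from the posterior predictive distribution of a new observation.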

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

  1. Sample Σ from an inverse Wishart distribution with parameters Ψ and ν.
  2. Sample μ from a multivariate normal distribution with mean μ₀ and covariance (1/λ)Σ.
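
These two steps could be implemented as follows (a sketch assuming SciPy is available; `sample_niw` is an illustrative name):

```python
import numpy as np
from scipy.stats import invwishart


def sample_niw(mu0, lam, Psi, nu, rng):
    """Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu)."""
    # Step 1: Sigma ~ IW(Psi, nu)
    Sigma = np.atleast_2d(invwishart.rvs(df=nu, scale=Psi, random_state=rng))
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam)
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma
```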

Notes

  1. Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
  2. Prince, Simon J.D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
  3. Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.

References

  • Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
  • Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."