Kullback's inequality

In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1] If $P$ and $Q$ are probability distributions on the real line, such that $P$ is absolutely continuous with respect to $Q$, i.e. $P \ll Q$, and whose first moments exist, then
$$D_{KL}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),$$
where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.

The Cramér–Rao bound is a corollary of this result.
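As a quick numerical sanity check, the bound can be verified for Gaussians, where everything is available in closed form: for $Q = N(0,1)$ the cumulant-generating function is $\Psi_Q(\theta) = \theta^2/2$, so the rate function is $\Psi_Q^*(x) = x^2/2$. The following sketch (all parameter values are illustrative choices) compares the exact divergence against the bound:

```python
import numpy as np

# A minimal numerical check of Kullback's inequality for Gaussians.
# P = N(mu_p, sig_p^2) and Q = N(0, 1); all values are illustrative.

def kl_gauss(mu_p, sig_p, mu_q=0.0, sig_q=1.0):
    """Closed-form D_KL(P || Q) for univariate Gaussians."""
    return (np.log(sig_q / sig_p)
            + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5)

def rate_std_normal(x, thetas=np.linspace(-10, 10, 20001)):
    """Psi_Q*(x) for Q = N(0, 1), by brute-force supremum of
    x*theta - theta^2/2; analytically this equals x^2/2."""
    return np.max(x * thetas - thetas**2 / 2)

for mu_p, sig_p in [(0.5, 1.0), (1.0, 0.3), (-2.0, 2.5)]:
    lhs = kl_gauss(mu_p, sig_p)
    rhs = rate_std_normal(mu_p)      # mu'_1(P) = mu_p
    assert lhs >= rhs - 1e-9
    print(f"KL = {lhs:.4f} >= rate = {rhs:.4f}")
```

In the first case $P$ belongs to the natural exponential family of $Q$ (the unit-variance Gaussians) and the bound is attained; in the other two it is strict.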

Proof

Let $P$ and $Q$ be probability distributions (measures) on the real line, whose first moments exist, and such that $P \ll Q$. Consider the natural exponential family of $Q$ given by
$$Q_\theta(A) = \frac{\int_A e^{\theta x}\,Q(dx)}{\int_{-\infty}^{\infty} e^{\theta x}\,Q(dx)} = \frac{1}{M_Q(\theta)} \int_A e^{\theta x}\,Q(dx)$$
for every measurable set $A$, where $M_Q$ is the moment-generating function of $Q$. (Note that $Q_0 = Q$.) Then
$$D_{KL}(P \parallel Q) = D_{KL}(P \parallel Q_\theta) + \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP.$$
By Gibbs' inequality we have $D_{KL}(P \parallel Q_\theta) \ge 0$, so that
$$D_{KL}(P \parallel Q) \ge \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP = \int_{\operatorname{supp} P} \left( \log \frac{e^{\theta x}}{M_Q(\theta)} \right) P(dx).$$
Simplifying the right side, we have, for every real $\theta$ where $M_Q(\theta) < \infty$:
$$D_{KL}(P \parallel Q) \ge \mu'_1(P)\,\theta - \Psi_Q(\theta),$$
where $\mu'_1(P)$ is the first moment, or mean, of $P$, and $\Psi_Q = \log M_Q$ is the cumulant-generating function. Taking the supremum over $\theta$ completes the process of convex conjugation and yields the rate function:
$$D_{KL}(P \parallel Q) \ge \sup_\theta \left\{ \mu'_1(P)\,\theta - \Psi_Q(\theta) \right\} = \Psi_Q^*(\mu'_1(P)).$$
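The two steps of the proof, the exact decomposition and the supremum over tilting parameters, can be checked numerically on a small discrete support, where the integrals become sums. The following sketch uses arbitrary illustrative weights for $P$ and $Q$:

```python
import numpy as np

# Sketch of the proof's two steps on a discrete support.
# The support points and weights below are illustrative choices.

xs = np.array([-1.0, 0.0, 1.0, 2.0])
q = np.array([0.1, 0.4, 0.3, 0.2])      # Q
p = np.array([0.25, 0.25, 0.25, 0.25])  # P, with P << Q

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

def tilt(theta):
    """Member Q_theta of the natural exponential family of Q."""
    w = q * np.exp(theta * xs)
    return w / w.sum()                   # division by M_Q(theta)

# Step 1: the exact decomposition used before Gibbs' inequality.
theta = 0.7
q_t = tilt(theta)
assert np.isclose(kl(p, q), kl(p, q_t) + np.sum(p * np.log(q_t / q)))

# Step 2: dropping D_KL(P || Q_theta) >= 0 and taking the supremum
# over theta yields the rate function evaluated at the mean of P.
mean_p = float(np.dot(p, xs))
thetas = np.linspace(-20, 20, 40001)
cgf = np.log([np.sum(q * np.exp(t * xs)) for t in thetas])
rate = np.max(mean_p * thetas - cgf)     # Psi_Q*(mu'_1(P))
assert kl(p, q) >= rate - 1e-9
print(kl(p, q), rate)
```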

Corollary: the Cramér–Rao bound

Main page: Cramér–Rao bound

Start with Kullback's inequality:

Let $X_\theta$ be a family of probability distributions on the real line indexed by the real parameter $\theta$, and satisfying certain regularity conditions. Then
$$\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} \ge \lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2},$$

where $\Psi_\theta^*$ is the convex conjugate of the cumulant-generating function of $X_\theta$ and $\mu_{\theta+h}$ is the first moment of $X_{\theta+h}$.
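Before simplifying each side separately, it may help to see both quantities at finite $h$. A small sketch for the Bernoulli($\theta$) family (the parameter values and the brute-force grid are illustrative choices) shows both ratios converging to half the Fisher information, $1/(2\theta(1-\theta))$. Since the tilts of Ber($\theta$) are again Bernoulli, the optimal tilt is exactly Ber($\theta+h$) and the two sides coincide here; in general only the inequality holds.

```python
import numpy as np

# Both sides of the displayed limit for a Bernoulli(theta) family
# at a few finite values of h (all choices below are illustrative).

theta = 0.3

def kl_bern(a, b):
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def conj(x, ts=np.linspace(-30, 30, 60001)):
    """Brute-force convex conjugate of the CGF of Bernoulli(theta)."""
    cgf = np.log(1 - theta + theta * np.exp(ts))
    return np.max(x * ts - cgf)

for h in [0.1, 0.01, 0.001]:
    lhs = kl_bern(theta + h, theta) / h**2   # D_KL(X_{t+h} || X_t)/h^2
    rhs = conj(theta + h) / h**2             # Psi_t*(mu_{t+h})/h^2
    print(h, lhs, rhs)

# Both columns approach half the Fisher information of the
# Bernoulli parameter, 1/(2 theta (1 - theta)).
print(1 / (2 * theta * (1 - theta)))
```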

Left side

The left side of this inequality can be simplified as follows:
$$\begin{aligned}
\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2}
&= \lim_{h \to 0} \frac{1}{h^2} \int \log \left( \frac{dX_{\theta+h}}{dX_\theta} \right) dX_{\theta+h} \\
&= -\lim_{h \to 0} \frac{1}{h^2} \int \log \left( \frac{dX_\theta}{dX_{\theta+h}} \right) dX_{\theta+h} \\
&= -\lim_{h \to 0} \frac{1}{h^2} \int \log \left( 1 - \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) \right) dX_{\theta+h} \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) + \frac{1}{2} \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 + o\left( \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 \right) \right] dX_{\theta+h} && \text{Taylor series for } \log(1-t) \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \frac{1}{2} \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 \right] dX_{\theta+h} \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \frac{1}{2} \left( \frac{dX_{\theta+h} - dX_\theta}{dX_{\theta+h}} \right)^2 \right] dX_{\theta+h} \\
&= \frac{1}{2} \mathcal{I}_X(\theta),
\end{aligned}$$
which is half the Fisher information of the parameter $\theta$. (The first-order term vanishes because $\int \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) dX_{\theta+h} = 1 - 1 = 0$, both measures being probability measures.)
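The key step above, replacing the logarithm by its quadratic Taylor term, can be illustrated numerically. The sketch below (grid and parameter values are illustrative) compares $D_{KL}(X_{\theta+h} \parallel X_\theta)/h^2$ with the quadratic term for a Gaussian location family $N(\theta, \sigma^2)$, where half the Fisher information is $1/(2\sigma^2)$:

```python
import numpy as np

# Left-side expansion for a Gaussian location family N(theta, sigma^2);
# the integration grid and parameter values are illustrative.

sigma, theta = 1.5, 0.0
x = np.linspace(-12, 12, 200001)
dx = x[1] - x[0]

def pdf(mu):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

for h in [0.5, 0.1, 0.02]:
    p_h, p_0 = pdf(theta + h), pdf(theta)
    kl = np.sum(p_h * np.log(p_h / p_0)) * dx        # D_KL(X_{t+h} || X_t)
    quad = 0.5 * np.sum((p_h - p_0)**2 / p_h) * dx   # quadratic Taylor term
    print(h, kl / h**2, quad / h**2)

# Both ratios tend to 1/(2 sigma^2), half the Fisher information
# of a location parameter.
print(1 / (2 * sigma**2))
```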

Right side

The right side of the inequality can be developed as follows:
$$\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \lim_{h \to 0} \frac{1}{h^2} \sup_t \left\{ \mu_{\theta+h} t - \Psi_\theta(t) \right\}.$$
This supremum is attained at a value $t = \tau$ where the first derivative of the cumulant-generating function satisfies $\Psi'_\theta(\tau) = \mu_{\theta+h}$, but we have $\Psi'_\theta(0) = \mu_\theta$, so that
$$\Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta} \lim_{h \to 0} \frac{h}{\tau}.$$
Moreover,
$$\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \frac{1}{2 \Psi''_\theta(0)} \left( \frac{d\mu_\theta}{d\theta} \right)^2 = \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2.$$
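This limit can also be checked by computing the convex conjugate by brute force. For the exponential distribution with rate $\theta$ one has $\mu_\theta = 1/\theta$, $\operatorname{Var}(X_\theta) = 1/\theta^2$ and $\Psi_\theta(t) = -\log(1 - t/\theta)$ for $t < \theta$, so the predicted limit is $(d\mu_\theta/d\theta)^2 / (2\operatorname{Var}(X_\theta)) = 1/(2\theta^2)$. A sketch (the grid and parameter values are illustrative):

```python
import numpy as np

# Right-side limit for the exponential distribution with rate theta
# (mean 1/theta); the grid of t values is an illustrative choice.

theta = 2.0
ts = np.linspace(-50, theta - 1e-6, 400001)  # CGF finite only for t < theta
cgf = -np.log(1 - ts / theta)                # Psi_theta(t) for Exp(theta)

def conj(x):
    return np.max(x * ts - cgf)              # brute-force Psi_theta*(x)

for h in [0.2, 0.05, 0.01]:
    mu_h = 1 / (theta + h)                   # first moment of X_{theta+h}
    print(h, conj(mu_h) / h**2)

# The ratio tends to (d mu/d theta)^2 / (2 Var) = (1/theta^4)/(2/theta^2),
# i.e. 1/(2 theta^2).
print(1 / (2 * theta**2))
```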

Putting both sides back together

We have
$$\frac{1}{2} \mathcal{I}_X(\theta) \ge \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2,$$
which can be rearranged as
$$\operatorname{Var}(X_\theta) \ge \frac{(d\mu_\theta / d\theta)^2}{\mathcal{I}_X(\theta)}.$$
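For example, for the exponential distribution with rate $\theta$ we have $\mu_\theta = 1/\theta$, $d\mu_\theta/d\theta = -1/\theta^2$ and $\mathcal{I}_X(\theta) = 1/\theta^2$, so the right side equals $(1/\theta^4)/(1/\theta^2) = 1/\theta^2 = \operatorname{Var}(X_\theta)$: the bound is attained with equality.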

Notes and references

  1. Fuchs, Aimé; Letta, Giorgio (1970). "L'inégalité de Kullback. Application à la théorie de l'estimation" [Kullback's inequality. Application to estimation theory]. Séminaire de Probabilités de Strasbourg 4: 108–131. http://www.numdam.org/item?id=SPS_1970__4__108_0.