Kullback's inequality

In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1] If $P$ and $Q$ are probability distributions on the real line, such that $P$ is absolutely continuous with respect to $Q$, i.e. $P \ll Q$, and whose first moments exist, then
$$D_{KL}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),$$
where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.

The Cramér–Rao bound is a corollary of this result.
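As a quick numerical sanity check, the bound can be verified for Gaussians, where everything is available in closed form: for $Q = N(0,1)$ the cumulant-generating function is $\Psi_Q(\theta) = \theta^2/2$, so the rate function is $\Psi_Q^*(x) = x^2/2$. The following sketch (all parameter values are illustrative choices) compares the exact divergence against the bound:

```python
import numpy as np

# A minimal numerical check of Kullback's inequality for Gaussians.
# P = N(mu_p, sig_p^2) and Q = N(0, 1); all values are illustrative.

def kl_gauss(mu_p, sig_p, mu_q=0.0, sig_q=1.0):
    """Closed-form D_KL(P || Q) for univariate Gaussians."""
    return (np.log(sig_q / sig_p)
            + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5)

def rate_std_normal(x, thetas=np.linspace(-10, 10, 20001)):
    """Psi_Q*(x) for Q = N(0, 1), by brute-force supremum of
    x*theta - theta^2/2; analytically this equals x^2/2."""
    return np.max(x * thetas - thetas**2 / 2)

for mu_p, sig_p in [(0.5, 1.0), (1.0, 0.3), (-2.0, 2.5)]:
    lhs = kl_gauss(mu_p, sig_p)
    rhs = rate_std_normal(mu_p)      # mu'_1(P) = mu_p
    assert lhs >= rhs - 1e-9
    print(f"KL = {lhs:.4f} >= rate = {rhs:.4f}")
```

In the first case $P$ belongs to the natural exponential family of $Q$ (the unit-variance Gaussians) and the bound is attained; in the other two it is strict.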

Proof

Let $P$ and $Q$ be probability distributions (measures) on the real line, whose first moments exist, and such that $P \ll Q$. Consider the natural exponential family of $Q$ given by
$$Q_\theta(A) = \frac{\int_A e^{\theta x}\,Q(dx)}{\int_{-\infty}^{\infty} e^{\theta x}\,Q(dx)} = \frac{1}{M_Q(\theta)} \int_A e^{\theta x}\,Q(dx)$$
for every measurable set $A$, where $M_Q$ is the moment-generating function of $Q$. (Note that $Q_0 = Q$.) Then
$$D_{KL}(P \parallel Q) = D_{KL}(P \parallel Q_\theta) + \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP.$$
By Gibbs' inequality we have $D_{KL}(P \parallel Q_\theta) \ge 0$, so that
$$D_{KL}(P \parallel Q) \ge \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP = \int_{\operatorname{supp} P} \left( \log \frac{e^{\theta x}}{M_Q(\theta)} \right) P(dx).$$
Simplifying the right side, we have, for every real $\theta$ where $M_Q(\theta) < \infty$:
$$D_{KL}(P \parallel Q) \ge \mu'_1(P)\,\theta - \Psi_Q(\theta),$$
where $\mu'_1(P)$ is the first moment, or mean, of $P$, and $\Psi_Q = \log M_Q$ is the cumulant-generating function. Taking the supremum over $\theta$ completes the process of convex conjugation and yields the rate function:
$$D_{KL}(P \parallel Q) \ge \sup_\theta \left\{ \mu'_1(P)\,\theta - \Psi_Q(\theta) \right\} = \Psi_Q^*(\mu'_1(P)).$$
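The two steps of the proof, the exact decomposition and the supremum over tilting parameters, can be checked numerically on a small discrete support, where the integrals become sums. The following sketch uses arbitrary illustrative weights for $P$ and $Q$:

```python
import numpy as np

# Sketch of the proof's two steps on a discrete support.
# The support points and weights below are illustrative choices.

xs = np.array([-1.0, 0.0, 1.0, 2.0])
q = np.array([0.1, 0.4, 0.3, 0.2])      # Q
p = np.array([0.25, 0.25, 0.25, 0.25])  # P, with P << Q

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

def tilt(theta):
    """Member Q_theta of the natural exponential family of Q."""
    w = q * np.exp(theta * xs)
    return w / w.sum()                   # division by M_Q(theta)

# Step 1: the exact decomposition used before Gibbs' inequality.
theta = 0.7
q_t = tilt(theta)
assert np.isclose(kl(p, q), kl(p, q_t) + np.sum(p * np.log(q_t / q)))

# Step 2: dropping D_KL(P || Q_theta) >= 0 and taking the supremum
# over theta yields the rate function evaluated at the mean of P.
mean_p = float(np.dot(p, xs))
thetas = np.linspace(-20, 20, 40001)
cgf = np.log([np.sum(q * np.exp(t * xs)) for t in thetas])
rate = np.max(mean_p * thetas - cgf)     # Psi_Q*(mu'_1(P))
assert kl(p, q) >= rate - 1e-9
print(kl(p, q), rate)
```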

Corollary: the Cramér–Rao bound

Main page: Cramér–Rao bound

Start with Kullback's inequality:

Let $X_\theta$ be a family of probability distributions on the real line indexed by the real parameter $\theta$, and satisfying certain regularity conditions. Then
$$\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} \ge \lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2},$$

where $\Psi_\theta^*$ is the convex conjugate of the cumulant-generating function of $X_\theta$ and $\mu_{\theta+h}$ is the first moment of $X_{\theta+h}$.
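Before simplifying each side separately, it may help to see both quantities at finite $h$. A small sketch for the Bernoulli($\theta$) family (the parameter values and the brute-force grid are illustrative choices) shows both ratios converging to half the Fisher information, $1/(2\theta(1-\theta))$. Since the tilts of Ber($\theta$) are again Bernoulli, the optimal tilt is exactly Ber($\theta+h$) and the two sides coincide here; in general only the inequality holds.

```python
import numpy as np

# Both sides of the displayed limit for a Bernoulli(theta) family
# at a few finite values of h (all choices below are illustrative).

theta = 0.3

def kl_bern(a, b):
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def conj(x, ts=np.linspace(-30, 30, 60001)):
    """Brute-force convex conjugate of the CGF of Bernoulli(theta)."""
    cgf = np.log(1 - theta + theta * np.exp(ts))
    return np.max(x * ts - cgf)

for h in [0.1, 0.01, 0.001]:
    lhs = kl_bern(theta + h, theta) / h**2   # D_KL(X_{t+h} || X_t)/h^2
    rhs = conj(theta + h) / h**2             # Psi_t*(mu_{t+h})/h^2
    print(h, lhs, rhs)

# Both columns approach half the Fisher information of the
# Bernoulli parameter, 1/(2 theta (1 - theta)).
print(1 / (2 * theta * (1 - theta)))
```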

Left side

The left side of this inequality can be simplified as follows:
$$\begin{aligned}
\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2}
&= \lim_{h \to 0} \frac{1}{h^2} \int \log \left( \frac{dX_{\theta+h}}{dX_\theta} \right) dX_{\theta+h} \\
&= -\lim_{h \to 0} \frac{1}{h^2} \int \log \left( \frac{dX_\theta}{dX_{\theta+h}} \right) dX_{\theta+h} \\
&= -\lim_{h \to 0} \frac{1}{h^2} \int \log \left( 1 - \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) \right) dX_{\theta+h} \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) + \frac{1}{2} \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 + o\left( \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 \right) \right] dX_{\theta+h} && \text{Taylor series for } \log(1-t) \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \frac{1}{2} \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^2 \right] dX_{\theta+h} \\
&= \lim_{h \to 0} \frac{1}{h^2} \int \left[ \frac{1}{2} \left( \frac{dX_{\theta+h} - dX_\theta}{dX_{\theta+h}} \right)^2 \right] dX_{\theta+h} \\
&= \frac{1}{2} \mathcal{I}_X(\theta),
\end{aligned}$$
which is half the Fisher information of the parameter $\theta$. (The first-order term vanishes because $\int \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right) dX_{\theta+h} = 1 - 1 = 0$, both measures being probability measures.)
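The key step above, replacing the logarithm by its quadratic Taylor term, can be illustrated numerically. The sketch below (grid and parameter values are illustrative) compares $D_{KL}(X_{\theta+h} \parallel X_\theta)/h^2$ with the quadratic term for a Gaussian location family $N(\theta, \sigma^2)$, where half the Fisher information is $1/(2\sigma^2)$:

```python
import numpy as np

# Left-side expansion for a Gaussian location family N(theta, sigma^2);
# the integration grid and parameter values are illustrative.

sigma, theta = 1.5, 0.0
x = np.linspace(-12, 12, 200001)
dx = x[1] - x[0]

def pdf(mu):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

for h in [0.5, 0.1, 0.02]:
    p_h, p_0 = pdf(theta + h), pdf(theta)
    kl = np.sum(p_h * np.log(p_h / p_0)) * dx        # D_KL(X_{t+h} || X_t)
    quad = 0.5 * np.sum((p_h - p_0)**2 / p_h) * dx   # quadratic Taylor term
    print(h, kl / h**2, quad / h**2)

# Both ratios tend to 1/(2 sigma^2), half the Fisher information
# of a location parameter.
print(1 / (2 * sigma**2))
```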

Right side

The right side of the inequality can be developed as follows:
$$\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \lim_{h \to 0} \frac{1}{h^2} \sup_t \left\{ \mu_{\theta+h} t - \Psi_\theta(t) \right\}.$$
This supremum is attained at a value $t = \tau$ where the first derivative of the cumulant-generating function satisfies $\Psi'_\theta(\tau) = \mu_{\theta+h}$, but we have $\Psi'_\theta(0) = \mu_\theta$, so that
$$\Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta} \lim_{h \to 0} \frac{h}{\tau}.$$
Moreover,
$$\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \frac{1}{2 \Psi''_\theta(0)} \left( \frac{d\mu_\theta}{d\theta} \right)^2 = \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2.$$
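This limit can also be checked by computing the convex conjugate by brute force. For the exponential distribution with rate $\theta$ one has $\mu_\theta = 1/\theta$, $\operatorname{Var}(X_\theta) = 1/\theta^2$ and $\Psi_\theta(t) = -\log(1 - t/\theta)$ for $t < \theta$, so the predicted limit is $(d\mu_\theta/d\theta)^2 / (2\operatorname{Var}(X_\theta)) = 1/(2\theta^2)$. A sketch (the grid and parameter values are illustrative):

```python
import numpy as np

# Right-side limit for the exponential distribution with rate theta
# (mean 1/theta); the grid of t values is an illustrative choice.

theta = 2.0
ts = np.linspace(-50, theta - 1e-6, 400001)  # CGF finite only for t < theta
cgf = -np.log(1 - ts / theta)                # Psi_theta(t) for Exp(theta)

def conj(x):
    return np.max(x * ts - cgf)              # brute-force Psi_theta*(x)

for h in [0.2, 0.05, 0.01]:
    mu_h = 1 / (theta + h)                   # first moment of X_{theta+h}
    print(h, conj(mu_h) / h**2)

# The ratio tends to (d mu/d theta)^2 / (2 Var) = (1/theta^4)/(2/theta^2),
# i.e. 1/(2 theta^2).
print(1 / (2 * theta**2))
```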

Putting both sides back together

We have
$$\frac{1}{2} \mathcal{I}_X(\theta) \ge \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2,$$
which can be rearranged as
$$\operatorname{Var}(X_\theta) \ge \frac{(d\mu_\theta / d\theta)^2}{\mathcal{I}_X(\theta)}.$$
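For example, for the exponential distribution with rate $\theta$ we have $\mu_\theta = 1/\theta$, $d\mu_\theta/d\theta = -1/\theta^2$ and $\mathcal{I}_X(\theta) = 1/\theta^2$, so the right side equals $(1/\theta^4)/(1/\theta^2) = 1/\theta^2 = \operatorname{Var}(X_\theta)$: the bound is attained with equality.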

Notes and references

  1. Fuchs, Aimé; Letta, Giorgio (1970). "L'inégalité de Kullback. Application à la théorie de l'estimation" [Kullback's inequality. Application to estimation theory]. Séminaire de Probabilités de Strasbourg 4: 108–131. http://www.numdam.org/item?id=SPS_1970__4__108_0.