Stein's unbiased risk estimate


In statistics, Stein's unbiased risk estimate (SURE) is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator."[1] In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly. The technique is named after its discoverer, Charles Stein.[2]

Formal statement

Let $\mu \in \mathbb{R}^d$ be an unknown parameter and let $x \in \mathbb{R}^d$ be a measurement vector whose components are independent and distributed normally with mean $\mu_i$, $i = 1, \dots, d$, and variance $\sigma^2$. Suppose $h(x)$ is an estimator of $\mu$ from $x$, and can be written $h(x) = x + g(x)$, where $g$ is weakly differentiable. Then, Stein's unbiased risk estimate is given by[3]

$$
\operatorname{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x)
= -d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} h_i(x),
$$

where $g_i(x)$ is the $i$th component of the function $g(x)$, and $\|\cdot\|$ is the Euclidean norm.
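To make the formula concrete, the sketch below evaluates SURE for the simple linear shrinkage estimator $h(x) = c\,x$ (so $g(x) = (c-1)x$) and checks its unbiasedness by simulation. The constant $c$, the simulated $\mu$, the assumption that $\sigma$ is known, and all numerical values are illustrative choices, not part of the original result.

```python
import numpy as np

def sure_linear_shrinkage(x, c, sigma):
    """SURE for the linear shrinkage estimator h(x) = c * x.

    Here g(x) = h(x) - x = (c - 1) * x, so ||g(x)||^2 = (c - 1)^2 * ||x||^2
    and sum_i dg_i/dx_i = d * (c - 1).
    """
    d = x.size
    g_norm_sq = (c - 1.0) ** 2 * np.dot(x, x)
    divergence = d * (c - 1.0)
    return d * sigma**2 + g_norm_sq + 2.0 * sigma**2 * divergence

# Monte Carlo check that SURE is unbiased for the MSE (illustrative values).
rng = np.random.default_rng(0)
d, sigma, c = 50, 1.0, 0.7
mu = rng.normal(size=d)          # "unknown" parameter, used only to simulate data
sure_vals, err_vals = [], []
for _ in range(20000):
    x = mu + sigma * rng.normal(size=d)
    sure_vals.append(sure_linear_shrinkage(x, c, sigma))
    err_vals.append(np.sum((c * x - mu) ** 2))
print(np.mean(sure_vals), np.mean(err_vals))  # the two averages should nearly agree
```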

The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of h(x), i.e.

$$
\operatorname{E}_\mu\{\operatorname{SURE}(h)\} = \operatorname{MSE}(h),
$$

with

$$
\operatorname{MSE}(h) = \operatorname{E}_\mu\|h(x) - \mu\|^2.
$$

Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that there is no dependence on the unknown parameter $\mu$ in the expression for SURE above. Thus, it can be manipulated (e.g., to determine optimal estimation settings) without knowledge of $\mu$.
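Continuing the illustrative linear-shrinkage sketch above, minimizing $\operatorname{SURE}(c) = d\sigma^2 + (c-1)^2\|x\|^2 + 2\sigma^2 d (c-1)$ over the shrinkage constant $c$ can be done in closed form using only the data, which is exactly the kind of manipulation described here (again, the names and the known-$\sigma$ assumption are illustrative):

```python
import numpy as np

def sure_optimal_shrinkage(x, sigma):
    """Minimize SURE(c) = d*sigma^2 + (c-1)^2*||x||^2 + 2*sigma^2*d*(c-1) over c.

    Setting the derivative 2*(c-1)*||x||^2 + 2*sigma^2*d to zero gives
    c* = 1 - d*sigma^2 / ||x||^2, which depends on the data x but not on mu.
    """
    d = x.size
    return 1.0 - d * sigma**2 / np.dot(x, x)
```

The resulting estimator $\left(1 - d\sigma^2/\|x\|^2\right)x$ is close in form to the James–Stein estimator, which shrinks by $(d-2)\sigma^2/\|x\|^2$ instead.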

Proof

We wish to show that

$$
\operatorname{E}_\mu\|h(x) - \mu\|^2 = \operatorname{E}_\mu\{\operatorname{SURE}(h)\}.
$$

We start by expanding the MSE as

$$
\begin{aligned}
\operatorname{E}_\mu\|h(x) - \mu\|^2
&= \operatorname{E}_\mu\|g(x) + x - \mu\|^2 \\
&= \operatorname{E}_\mu\|g(x)\|^2 + \operatorname{E}_\mu\|x - \mu\|^2 + 2\operatorname{E}_\mu\!\left[g(x)^T(x-\mu)\right] \\
&= \operatorname{E}_\mu\|g(x)\|^2 + d\sigma^2 + 2\operatorname{E}_\mu\!\left[g(x)^T(x-\mu)\right].
\end{aligned}
$$

Now we use integration by parts to rewrite the last term: since $(x_i - \mu_i)\exp\!\left(-\|x-\mu\|^2/(2\sigma^2)\right) = -\sigma^2\,\partial_{x_i}\exp\!\left(-\|x-\mu\|^2/(2\sigma^2)\right)$ and the boundary terms vanish, the derivative can be moved onto $g_i$:

$$
\begin{aligned}
\operatorname{E}_\mu\!\left[g(x)^T(x-\mu)\right]
&= \int_{\mathbb{R}^d} \frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^{d}} \exp\!\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right) \sum_{i=1}^d g_i(x)(x_i - \mu_i)\, d^d x \\
&= \sigma^2 \sum_{i=1}^d \int_{\mathbb{R}^d} \frac{1}{\left(\sqrt{2\pi\sigma^2}\right)^{d}} \exp\!\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right) \frac{\partial g_i}{\partial x_i}\, d^d x \\
&= \sigma^2 \sum_{i=1}^d \operatorname{E}_\mu\!\left[\frac{\partial g_i}{\partial x_i}\right].
\end{aligned}
$$

Substituting this into the expression for the MSE, we arrive at

$$
\operatorname{E}_\mu\|h(x) - \mu\|^2 = \operatorname{E}_\mu\!\left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i} \right).
$$
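The key step above is the Gaussian integration-by-parts identity $\operatorname{E}_\mu[g(x)^T(x-\mu)] = \sigma^2 \sum_i \operatorname{E}_\mu[\partial g_i/\partial x_i]$, often called Stein's lemma. The following sketch checks it numerically for one arbitrary smooth choice of $g$ (componentwise $\tanh$, an illustrative choice, with illustrative dimensions and noise level):

```python
import numpy as np

# Monte Carlo check of the integration-by-parts identity (Stein's lemma)
#   E[g(x)^T (x - mu)] = sigma^2 * sum_i E[dg_i/dx_i]
# for g(x) = tanh(x) applied componentwise.
rng = np.random.default_rng(1)
d, sigma = 10, 0.5
mu = rng.normal(size=d)
x = mu + sigma * rng.normal(size=(200000, d))

lhs = np.mean(np.sum(np.tanh(x) * (x - mu), axis=1))
rhs = sigma**2 * np.mean(np.sum(1.0 - np.tanh(x) ** 2, axis=1))  # d/dx tanh(x) = 1 - tanh(x)^2
print(lhs, rhs)  # the two values should nearly agree
```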

Applications

A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the James–Stein estimator can be derived by finding the optimal shrinkage estimator.[2] The technique has also been used by Donoho and Johnstone to determine the optimal shrinkage factor in a wavelet denoising setting.[1]
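As a rough illustration of this thresholding use case, the sketch below evaluates SURE for soft thresholding, $h_i(x) = \operatorname{sign}(x_i)\max(|x_i| - t, 0)$, and picks the threshold that minimizes it over the candidate values $\{0\} \cup \{|x_i|\}$. This is a simplified stand-in for Donoho and Johnstone's full SureShrink procedure, with $\sigma$ assumed known.

```python
import numpy as np

def sure_soft_threshold(x, t, sigma):
    """SURE for the soft-thresholding estimator h_i(x) = sign(x_i) * max(|x_i| - t, 0).

    Here g_i(x) = h_i(x) - x_i, so ||g(x)||^2 = sum_i min(x_i^2, t^2) and
    sum_i dg_i/dx_i = -#{i : |x_i| <= t}.
    """
    d = x.size
    return (d * sigma**2
            + np.sum(np.minimum(x**2, t**2))
            - 2.0 * sigma**2 * np.sum(np.abs(x) <= t))

def sure_threshold(x, sigma):
    """Pick the threshold minimizing SURE over the candidates {0} and the |x_i|."""
    candidates = np.concatenate(([0.0], np.abs(x)))
    scores = [sure_soft_threshold(x, t, sigma) for t in candidates]
    return candidates[int(np.argmin(scores))]
```

In a wavelet-denoising application, such a selection rule would typically be applied to the detail coefficients level by level, with $\sigma$ estimated from the data.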

References

  1. Donoho, David L.; Johnstone, Iain M. (December 1995). "Adapting to Unknown Smoothness via Wavelet Shrinkage". Journal of the American Statistical Association 90 (432): 1200–1244. doi:10.2307/2291512.
  2. Stein, Charles M. (November 1981). "Estimation of the Mean of a Multivariate Normal Distribution". The Annals of Statistics 9 (6): 1135–1151. doi:10.1214/aos/1176345632.
  3. Wasserman, Larry (2005). All of Nonparametric Statistics.