Mean and predicted response

From HandWiki

In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different.

Background

In straight line fitting, the model is

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

where $y_i$ is the response variable, $x_i$ is the explanatory variable, $\varepsilon_i$ is the random error, and $\alpha$ and $\beta$ are parameters. The mean, and predicted, response value for a given explanatory value, $x_d$, is given by

$$\hat{y}_d = \hat\alpha + \hat\beta x_d,$$

while the actual response would be

$$y_d = \alpha + \beta x_d + \varepsilon_d$$

Expressions for the values and variances of $\hat\alpha$ and $\hat\beta$ are given in linear regression.
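As a minimal sketch of how the fitted value is obtained, the following assumes the standard ordinary least-squares estimates for the intercept and slope (they are not restated in this article). The function names and the data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x.

    Assumes the standard OLS formulas; they are not derived in this article.
    """
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def fitted_response(alpha_hat, beta_hat, x_d):
    """Value shared by the mean response and the predicted response at x_d."""
    return alpha_hat + beta_hat * x_d


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

alpha_hat, beta_hat = fit_line(xs, ys)
y_hat_d = fitted_response(alpha_hat, beta_hat, 2.5)
print(alpha_hat, beta_hat, y_hat_d)
```

The same number serves as both the mean response and the predicted response at $x_d$; only the variance attached to it differs.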

Mean response

Since the data in this context is defined to be (x, y) pairs for every observation, the mean response at a given value of x, say $x_d$, is an estimate of the mean of the y values in the population at the x value of $x_d$, that is $\hat{E}(y \mid x_d) \equiv \hat{y}_d$. The variance of the mean response is given by

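$$\operatorname{Var}\left(\hat\alpha + \hat\beta x_d\right) = \operatorname{Var}\left(\hat\alpha\right) + \operatorname{Var}\left(\hat\beta\right) x_d^2 + 2 x_d \operatorname{Cov}\left(\hat\alpha, \hat\beta\right).$$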

This expression can be simplified to

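$$\operatorname{Var}\left(\hat\alpha + \hat\beta x_d\right) = \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right),$$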

where m is the number of data points.
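The agreement between the expanded and simplified expressions can be checked numerically. The sketch below assumes the standard ordinary least-squares results for $\operatorname{Var}(\hat\alpha)$, $\operatorname{Var}(\hat\beta)$ and $\operatorname{Cov}(\hat\alpha, \hat\beta)$, which this article does not state; the function names, the data and the value of $\sigma^2$ are hypothetical.

```python
def mean_response_variance_expanded(xs, x_d, sigma2):
    """Var(alpha_hat) + Var(beta_hat) * x_d**2 + 2 * x_d * Cov(alpha_hat, beta_hat)."""
    m = len(xs)
    x_bar = sum(xs) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Standard OLS results, assumed here rather than derived in this article.
    var_alpha = sigma2 * sum(x * x for x in xs) / (m * sxx)
    var_beta = sigma2 / sxx
    cov_alpha_beta = -sigma2 * x_bar / sxx
    return var_alpha + var_beta * x_d ** 2 + 2 * x_d * cov_alpha_beta


def mean_response_variance_simplified(xs, x_d, sigma2):
    """sigma2 * (1/m + (x_d - x_bar)**2 / sum((x_i - x_bar)**2))."""
    m = len(xs)
    x_bar = sum(xs) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma2 * (1 / m + (x_d - x_bar) ** 2 / sxx)


# Hypothetical design points and error variance.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
expanded = mean_response_variance_expanded(xs, x_d=4.0, sigma2=1.0)
simplified = mean_response_variance_simplified(xs, x_d=4.0, sigma2=1.0)
print(expanded, simplified)  # the two values agree up to rounding error
```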

To demonstrate this simplification, one can make use of the identity

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{m} \left( \sum x_i \right)^2.$$
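One way the simplification might be carried out is sketched below. It assumes the standard ordinary least-squares expressions for the variances and covariance of $\hat\alpha$ and $\hat\beta$ (first line), which are not given in this article, and then applies the identity above.

```latex
\begin{align}
\operatorname{Var}(\hat\alpha) &= \frac{\sigma^2 \sum x_i^2}{m \sum (x_i - \bar{x})^2}, \qquad
\operatorname{Var}(\hat\beta) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}, \qquad
\operatorname{Cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar{x}}{\sum (x_i - \bar{x})^2} \\
\operatorname{Var}(\hat\alpha + \hat\beta x_d)
  &= \frac{\sigma^2}{\sum (x_i - \bar{x})^2}
     \left( \frac{1}{m} \sum x_i^2 + x_d^2 - 2 x_d \bar{x} \right) \\
  &= \frac{\sigma^2}{\sum (x_i - \bar{x})^2}
     \left( \frac{1}{m} \sum x_i^2 - \bar{x}^2 + (x_d - \bar{x})^2 \right) \\
  &= \frac{\sigma^2}{\sum (x_i - \bar{x})^2}
     \left( \frac{1}{m} \left[ \sum x_i^2 - \frac{1}{m} \Bigl( \sum x_i \Bigr)^2 \right] + (x_d - \bar{x})^2 \right) \\
  &= \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)
\end{align}
```

The fourth line uses $\bar{x} = \frac{1}{m} \sum x_i$, and the last line replaces the bracketed term with $\sum (x_i - \bar{x})^2$ by the identity.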

Predicted response

The predicted response distribution is the predicted distribution of the residuals at the given point $x_d$. So the variance is given by

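$$\begin{aligned}
\operatorname{Var}\left(y_d - \left[\hat\alpha + \hat\beta x_d\right]\right) &= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat\alpha + \hat\beta x_d\right) - 2 \operatorname{Cov}\left(y_d, \left[\hat\alpha + \hat\beta x_d\right]\right) \\
&= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat\alpha + \hat\beta x_d\right).
\end{aligned}$$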

The second line follows from the fact that $\operatorname{Cov}\left(y_d, \left[\hat\alpha + \hat\beta x_d\right]\right)$ is zero because the new prediction point is independent of the data used to fit the model. Additionally, the term $\operatorname{Var}\left(\hat\alpha + \hat\beta x_d\right)$ was calculated earlier for the mean response.

Since $\operatorname{Var}(y_d) = \sigma^2$ (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by

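$$\begin{aligned}
\operatorname{Var}\left(y_d - \left[\hat\alpha + \hat\beta x_d\right]\right) &= \sigma^2 + \sigma^2 \left( \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right) \\
&= \sigma^2 \left( 1 + \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right).
\end{aligned}$$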
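The relationship between the two variances can be illustrated with a short sketch. It treats $\sigma^2$ as known, which is an assumption made only to keep the example small; the function names and design points are hypothetical.

```python
def mean_response_variance(xs, x_d, sigma2):
    """sigma2 * (1/m + (x_d - x_bar)**2 / sum((x_i - x_bar)**2))."""
    m = len(xs)
    x_bar = sum(xs) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma2 * (1 / m + (x_d - x_bar) ** 2 / sxx)


def predicted_response_variance(xs, x_d, sigma2):
    """Variance of a new observation about the fitted line: sigma2 plus the mean-response variance."""
    return sigma2 + mean_response_variance(xs, x_d, sigma2)


# Hypothetical design points; sigma2 is assumed known for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
sigma2 = 1.0

for x_d in (2.0, 4.0, 8.0):
    v_mean = mean_response_variance(xs, x_d, sigma2)
    v_pred = predicted_response_variance(xs, x_d, sigma2)
    # Both variances grow as x_d moves away from the mean of xs,
    # and they always differ by exactly sigma2.
    print(x_d, v_mean, v_pred)
```

At $x_d = \bar{x}$ the mean-response variance reduces to $\sigma^2 / m$, while the predicted-response variance never falls below $\sigma^2$.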

Confidence intervals

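The $100(1-\alpha)\%$ confidence intervals are computed as $\hat{y}_d \pm t_{\alpha/2,\,m-n-1} \sqrt{\operatorname{Var}}$, where $\alpha$ here denotes the significance level rather than the intercept, $n$ is the number of explanatory variables ($n = 1$ for the straight-line fit, giving $m - 2$ degrees of freedom), and $\operatorname{Var}$ is the variance of the mean or predicted response with $\sigma^2$ replaced by its estimate. Because the variance of the predicted response exceeds that of the mean response by $\sigma^2$, the confidence interval for the predicted response is wider than the interval for the mean response. This is expected intuitively: the variance of the population of y values does not shrink when one samples from it, because the variance of the random error $\varepsilon_i$ does not decrease, but the variance of the mean of the y values does shrink with increased sampling, because the variances of $\hat\alpha$ and $\hat\beta$ decrease, so the mean response (predicted response value) becomes closer to $\alpha + \beta x_d$.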

This is analogous to the difference between the variance of a population and the variance of the sample mean of a population: the variance of a population is a parameter and does not change, but the variance of the sample mean decreases as the sample size increases.
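A minimal sketch of both intervals for a straight-line fit follows. It makes several assumptions not stated in this article: $\sigma^2$ is estimated by the residual sum of squares divided by $m - 2$, the data are hypothetical, and the critical value 2.306 is the tabulated two-sided 95% Student's t value for 8 degrees of freedom, passed in by hand because the Python standard library has no t quantile function.

```python
import math


def response_intervals(xs, ys, x_d, t_crit):
    """Return (y_hat_d, half_width_mean, half_width_pred) for a straight-line fit.

    t_crit is the Student's t critical value for m - 2 degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar

    # Assumed estimator of sigma^2: residual sum of squares over m - 2.
    sse = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (m - 2)

    y_hat_d = alpha_hat + beta_hat * x_d
    spread = 1 / m + (x_d - x_bar) ** 2 / sxx
    half_width_mean = t_crit * math.sqrt(s2 * spread)        # mean response
    half_width_pred = t_crit * math.sqrt(s2 * (1 + spread))  # predicted response
    return y_hat_d, half_width_mean, half_width_pred


# Hypothetical data: m = 10 points, so m - 2 = 8 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.6, 2.9, 3.4, 4.2, 4.4, 5.1, 5.4, 6.1, 6.4, 7.0]
T_CRIT_95_DF8 = 2.306  # tabulated t value for a two-sided 95% interval, 8 d.f.

y_hat_d, hw_mean, hw_pred = response_intervals(xs, ys, x_d=5.5, t_crit=T_CRIT_95_DF8)
print("mean response interval:     ", (y_hat_d - hw_mean, y_hat_d + hw_mean))
print("predicted response interval:", (y_hat_d - hw_pred, y_hat_d + hw_pred))
```

Both intervals share the same centre; the one for the predicted response is wider at every $x_d$ because of the extra $\sigma^2$ term.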

General linear regression

The general linear model can be written as

$$y_i = \sum_{j=1}^{n} X_{ij} \beta_j + \varepsilon_i$$

Therefore, since $\hat{y}_d = \sum_{j=1}^{n} X_{dj} \hat\beta_j$, the general expression for the variance of the mean response is

$$\operatorname{Var}\left( \sum_{j=1}^{n} X_{dj} \hat\beta_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{di} S_{ij} X_{dj},$$

where S is the covariance matrix of the parameters, given by

$$\mathbf{S} = \sigma^2 \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1}.$$
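The double sum can be sketched in a few lines. As an illustration, the example below builds $\mathbf{S}$ for the straight-line case, where each design row is assumed to be $[1, x_i]$ so that the 2×2 inverse can be written out by hand, and checks the result against the straight-line variance formula. The function names and data are hypothetical, and $\sigma^2$ is treated as known.

```python
def parameter_covariance_line(xs, sigma2):
    """S = sigma2 * (X^T X)^(-1) for design rows X_i = [1, x_i] (2x2 inverse by hand)."""
    m = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    det = m * sum_x2 - sum_x * sum_x  # determinant of X^T X
    return [
        [sigma2 * sum_x2 / det, -sigma2 * sum_x / det],
        [-sigma2 * sum_x / det, sigma2 * m / det],
    ]


def mean_response_variance_general(x_row, S):
    """Double sum over i and j of X_di * S_ij * X_dj."""
    n = len(x_row)
    return sum(x_row[i] * S[i][j] * x_row[j] for i in range(n) for j in range(n))


# Hypothetical design points; sigma2 is assumed known for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
sigma2 = 1.0
x_d = 4.0

S = parameter_covariance_line(xs, sigma2)
general = mean_response_variance_general([1.0, x_d], S)

# Straight-line formula for comparison.
m = len(xs)
x_bar = sum(xs) / m
sxx = sum((x - x_bar) ** 2 for x in xs)
simple = sigma2 * (1 / m + (x_d - x_bar) ** 2 / sxx)

print(general, simple)  # the two values agree up to rounding error
```

For models with more parameters the same quadratic form applies, with the inverse obtained from a linear-algebra routine rather than by hand.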
