Statistics: Ellipses of regression

From MathWiki


Consider a linear regression model:

\mathbf{Y}=\mathbf{X}^{*}\mathbf{\beta }^{*}+\mathbf{\varepsilon }=\left[ \mathbf{1}\ \mathbf{X} \right]\left[ \begin{matrix}    \beta _{0}  \\    \mathbf{\beta }  \\ \end{matrix} \right]+\mathbf{\varepsilon }

where \mathbf{X} is an n\times p matrix of predictor values, \operatorname{E}(\mathbf{\varepsilon })=\mathbf{0}, and \operatorname{Var}(\mathbf{\varepsilon })=\sigma ^{2}\mathbf{I}.

Recall that the least-squares estimator is \mathbf{\hat{\beta }}^{*}=\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{X}^{*\prime }\mathbf{Y}.
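As a quick numerical illustration (simulated data; all variable names are illustrative, not from the original), the normal-equations form of the estimator agrees with a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 2
X_star = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # [1 X]
y = X_star @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Normal-equations form of the least-squares estimator
beta_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)

# Agrees with numpy's SVD-based least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X_star, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```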

Let \mathbf{\Sigma }_{\mathbf{XX}} be the sample variance-covariance matrix (with divisor n) of the predictors in \mathbf{X}:

\mathbf{\Sigma }_{\mathbf{XX}}=\frac{1}{n}\mathbf{X'}\left( \mathbf{I-P} \right)\mathbf{X}=\frac{1}{n}\left[ \mathbf{X'X}-\mathbf{X'PX} \right]=\frac{1}{n}\left[ \mathbf{{X}'X}-n\mathbf{\bar{x}{\bar{x}}'} \right]

where \mathbf{P}=\mathbf{1(1'1)}^{-1}\mathbf{1'}=\frac{1}{n}\mathbf{U} is the matrix of the orthogonal projection onto the subspace of \mathbb{R}^{n} spanned by the \mathbf{1} vector, \operatorname{span}(\mathbf{1}), and \mathbf{U}=\mathbf{1{1}'} is the n\times n matrix of ones.
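The equivalence of the two expressions for Σ_XX above can be verified numerically (simulated data; names are illustrative); both reduce to the divisor-n sample covariance matrix of the predictors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))      # predictor matrix (no intercept column)

# P = 1(1'1)^{-1}1' = (1/n)U projects onto span(1); U is the matrix of ones
one = np.ones((n, 1))
P = one @ one.T / n

Sigma_XX = X.T @ (np.eye(n) - P) @ X / n           # (1/n) X'(I - P)X
xbar = X.mean(axis=0, keepdims=True)
Sigma_alt = (X.T @ X - n * xbar.T @ xbar) / n      # (1/n)[X'X - n xbar xbar']

# Both forms equal the divisor-n sample covariance of the columns of X
assert np.allclose(Sigma_XX, Sigma_alt)
assert np.allclose(Sigma_XX, np.cov(X, rowvar=False, ddof=0))
```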

Recall that, for the vector \mathbf{\hat{\beta }} of slope estimates (excluding the intercept), \operatorname{E}(\mathbf{\hat{\beta }})=\mathbf{\beta } and \operatorname{Var}(\mathbf{\hat{\beta }})=\frac{\sigma ^{2}}{n}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}.

If we let s be the residual standard error with ν = n − p − 1 degrees of freedom then, under the assumption of normality,

\left( \mathbf{\hat{\beta }}-\mathbf{\beta } \right)^{\prime }\left( \frac{s^{2}}{n}\mathbf{\Sigma }_{\mathbf{XX}}^{-1} \right)^{-1}\left( \mathbf{\hat{\beta }}-\mathbf{\beta } \right)\sim pF_{p,\nu }

Using the notation described in Statistics: Ellipses, a 100(1 − α)% confidence ellipse for \mathbf{\beta } in \mathbb{R}^{p} can be expressed as:

\mathbf{\hat{\beta }}\oplus \sqrt{pF_{p,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}
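A numerical sketch (simulated data with p = 2; all names are illustrative) of how the boundary of this ellipse can be generated, using a symmetric square root of Σ_XX^{-1}. Note that with Σ_XX defined with the 1/n divisor, the radius carries the factor s/√n:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -0.5]) + rng.normal(scale=0.7, size=n)

X_star = np.column_stack([np.ones(n), X])          # [1 X]
bhat_full, *_ = np.linalg.lstsq(X_star, y, rcond=None)
bhat = bhat_full[1:]                               # slope estimates
nu = n - p - 1
s = np.sqrt(((y - X_star @ bhat_full) ** 2).sum() / nu)

Sigma_XX = np.cov(X, rowvar=False, ddof=0)
# Symmetric square root of Sigma_XX^{-1} via its eigendecomposition
w, V = np.linalg.eigh(np.linalg.inv(Sigma_XX))
Sigma_inv_half = V @ np.diag(np.sqrt(w)) @ V.T

crit = p * stats.f.ppf(0.95, p, nu)
r = np.sqrt(crit) * s / np.sqrt(n)
theta = np.linspace(0, 2 * np.pi, 200)
U = np.vstack([np.cos(theta), np.sin(theta)])      # unit circle
ellipse = bhat[:, None] + r * Sigma_inv_half @ U   # ellipse boundary points

# Every boundary point satisfies the quadratic form at the critical value
d = ellipse - bhat[:, None]
Q = np.einsum('ij,jk,ki->i', d.T, (n / s**2) * Sigma_XX, d)
assert np.allclose(Q, crit)
```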

The data ellipse for the predictors is:

\mathbf{\bar{x}}\oplus \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}}.

The ellipse

\mathbf{\hat{\beta }}\oplus \sqrt{1F_{1,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{\hat{\beta }}\oplus t_{\nu }^{1-\alpha /2}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}

has the property that its projections onto the 1-dimensional coordinate axes produce 100(1 − α)% confidence intervals for the corresponding parameters.

In general, the ellipse

\mathbf{\hat{\beta }}\oplus \sqrt{dF_{d,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}

has projections that are Scheffé confidence regions with minimal coverage probability 100(1 − α)% when the parameters estimated have been selected from a space of dimension d.

The following figures show the projection of a 3-dimensional confidence ellipsoid onto 2-dimensional and 1-dimensional subspaces.

Image:Statistics Ellipses of regression 3D shadows view1.png

The 3-dimensional ellipsoid is (using the notation above):

\mathbf{\hat{\beta }}^{*}\oplus \sqrt{3F_{3,\nu }^{0.95}}\ s\ \sqrt{\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}}

The 1- and 2-dimensional ellipses have the form:

\mathbf{L\hat{\beta }}^{*}\oplus \sqrt{3F_{3,\nu }^{0.95}}\ s\ \sqrt{\mathbf{L}\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{{L}'}}

where \mathbf{L} is the appropriate projection matrix. For example, to produce the ellipse for the last two coefficients, take

\mathbf{L}=\left[ \begin{matrix}    0 & 1 & 0  \\    0 & 0 & 1  \\ \end{matrix} \right]
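With this selection matrix, L(X*'X*)^{-1}L' simply extracts the corresponding block of (X*'X*)^{-1}, as a quick check confirms (simulated data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X_star = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # [1 X]
XtX_inv = np.linalg.inv(X_star.T @ X_star)

# L selects the last two coefficients (the slopes)
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# L (X*'X*)^{-1} L' is the slope block of (X*'X*)^{-1}
assert np.allclose(L @ XtX_inv @ L.T, XtX_inv[1:, 1:])
```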

The following figure shows in green the 2-dimensional joint confidence ellipse for the two slope parameters, 'b.Weight' and 'b.Height'. Also shown in blue are the ordinary 1-dimensional confidence intervals and the 2-dimensional ellipse with Scheffé radius for d = 1, whose shadows produce the ordinary 1-dimensional confidence intervals.

Image:Statistics Ellipses of regression 3D shadows view2.png

The green ellipse is:

\mathbf{\hat{\beta }}\oplus \sqrt{2F_{2,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{L\hat{\beta }}^{*}\oplus \sqrt{2F_{2,\nu }^{0.95}}\ s\ \sqrt{\mathbf{L}\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{{L}'}}

The blue ellipse is:

\mathbf{\hat{\beta }}\oplus \sqrt{F_{1,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{\hat{\beta }}\oplus t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}

and the blue intervals are ordinary 95% confidence intervals and shadows of the blue ellipse:

\hat{\beta }_{i}\pm \sqrt{F_{1,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{{e}'}_{i}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}\mathbf{e}_{i}}=\hat{\beta }_{i}\pm t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{{e}'}_{i}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}\mathbf{e}_{i}}=\hat{\beta }_{i}\pm t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}\ \sigma _{X_{i}|X_{(i)}}}

where \mathbf{e}_{i} is the i-th standard basis vector and \sigma _{X_{i}|X_{(i)}} denotes the partial standard deviation of X_{i}, adjusting for the predictors other than X_{i}.
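The last equality rests on the identity that the i-th diagonal element of Σ_XX^{-1} equals the reciprocal of the squared partial standard deviation, with the partial variance computed with divisor n to match the definition of Σ_XX. A numerical check on simulated data (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
X = rng.normal(size=(n, 3))
X = X + 0.5 * X[:, [0]]               # induce correlation among predictors
Sigma_XX = np.cov(X, rowvar=False, ddof=0)
Sigma_inv = np.linalg.inv(Sigma_XX)

i = 1
others = [0, 2]
# Regress X_i on the other predictors (with intercept); take residual variance
A = np.column_stack([np.ones(n), X[:, others]])
coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
resid = X[:, i] - A @ coef
partial_var = (resid ** 2).sum() / n  # divisor n, matching Sigma_XX

# Diagonal of Sigma_XX^{-1} is 1 / (partial variance of X_i)
assert np.isclose(Sigma_inv[i, i], 1.0 / partial_var)
```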

Any subset of these confidence regions has joint coverage probability of at least 95%. Equivalently, any one of these confidence regions has coverage probability of at least 95% even if the subspace of parameters was selected as a subspace of this 3-dimensional space after seeing the data. Thus these regions provide protection against 'fishing' or 'data dredging' within a specified space of parameters.

For a general non-technical treatment of multiple testing and adjusting for 'data dredging' for hypotheses, see: Bender, R. and Lange, S. (2001), "Adjusting for multiple testing-when and how?", Journal of Clinical Epidemiology, 54, 343–349.

For pairwise comparisons see:

Jaccard, J., Becker, M.A. and Wood, G. (1984), "Pairwise multiple comparison procedures: a review", Psychological Bulletin, 96, 589–596.

Seaman, M.A., Levin, J.R. and Serlin, R.C. (1991), "New developments in pairwise multiple comparisons: some powerful and practicable procedures", Psychological Bulletin, 110, 577–586.

The figures in this page were produced with a script written in R.