# Statistics: Ellipses of regression


Consider a linear regression model, with the intercept written separately from the $p$ predictors:

$\mathbf{Y}=\mathbf{X}^{*}\mathbf{\beta }^{*}+\mathbf{\varepsilon }=\left[ \mathbf{1}\ \mathbf{X} \right]\left[ \begin{matrix} \beta _{0} \\ \mathbf{\beta } \\ \end{matrix} \right]+\mathbf{\varepsilon }$

where $\mathbf{X}$ is the $n\times p$ matrix of predictors (so $\mathbf{X}^{*}=\left[ \mathbf{1}\ \mathbf{X} \right]$ is $n\times (p+1)$), $\operatorname{E}(\mathbf{\varepsilon })=\mathbf{0}$ and $\operatorname{Var}(\mathbf{\varepsilon })=\sigma ^{2}\mathbf{I}$.

Recall that the least-squares estimator is $\mathbf{\hat{\beta }}^{*}=\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{X}^{*\prime }\mathbf{Y}$; write $\mathbf{\hat{\beta }}$ for its last $p$ entries, the estimated slopes.
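As a quick numerical illustration (made-up data; the design matrix `Xstar` plays the role of $\mathbf{X}^{*}=[\mathbf{1}\ \mathbf{X}]$), one can check that the normal-equations form of the estimator agrees with a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Xstar = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # [1  X]
beta_true = np.array([1.0, 2.0, -0.5])
Y = Xstar @ beta_true + rng.normal(scale=0.3, size=n)

# Normal-equations form (X*'X*)^{-1} X*'Y; lstsq is numerically preferable
beta_hat = np.linalg.solve(Xstar.T @ Xstar, Xstar.T @ Y)
beta_lstsq, *_ = np.linalg.lstsq(Xstar, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```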

Let $\mathbf{\Sigma }_{\mathbf{XX}}$ be the sample variance-covariance matrix of the predictors (the columns of $\mathbf{X}$):

$\mathbf{\Sigma }_{\mathbf{XX}}=\frac{1}{n}\mathbf{X'}\left( \mathbf{I-P} \right)\mathbf{X}=\frac{1}{n}\left[ \mathbf{X'X}-\mathbf{X'PX} \right]=\frac{1}{n}\left[ \mathbf{{X}'X}-n\mathbf{\bar{x}{\bar{x}}'} \right]$

where $\mathbf{P}=\mathbf{1(1'1)}^{\mathbf{-1}}\mathbf{1'}=\frac{1}{n}\mathbf{U}$ is the matrix of the orthogonal projection onto the subspace of $\mathbb{R}^{n}$ spanned by the $\mathbf{1}$ vector, $\operatorname{span}(\mathbf{1})$, and $\mathbf{U}=\mathbf{11}'$ is the $n\times n$ matrix of ones.
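The two expressions for $\mathbf{\Sigma }_{\mathbf{XX}}$ are easy to verify numerically (made-up data); note that the result is the divisor-$n$ sample covariance of the predictors:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = rng.normal(size=(n, p))

one = np.ones((n, 1))
P = one @ one.T / n                       # projection onto span(1), = U/n
Sigma_XX = X.T @ (np.eye(n) - P) @ X / n  # (1/n) X'(I - P)X

# Mean-deviation form (1/n)[X'X - n x̄x̄'] gives the same matrix,
# as does the divisor-n sample covariance
xbar = X.mean(axis=0)
Sigma_alt = (X.T @ X - n * np.outer(xbar, xbar)) / n
print(np.allclose(Sigma_XX, Sigma_alt))
print(np.allclose(Sigma_XX, np.cov(X.T, bias=True)))
```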

Recall that $\operatorname{E}(\mathbf{\hat{\beta }})=\mathbf{\beta }$ and, because $\mathbf{\Sigma }_{\mathbf{XX}}$ carries a divisor of $n$, $\operatorname{Var}(\mathbf{\hat{\beta }})=\frac{\sigma ^{2}}{n}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}$.

If we let $s$ be the residual standard error with $\nu = n - p - 1$ degrees of freedom then, under the assumption of normality,

$\left( \mathbf{\hat{\beta }}-\mathbf{\beta } \right)^{\prime }\left( \frac{s^{2}}{n}\mathbf{\Sigma }_{\mathbf{XX}}^{-1} \right)^{-1}\left( \mathbf{\hat{\beta }}-\mathbf{\beta } \right)\sim pF_{p,\nu }$

Using the notation described in Statistics: Ellipses, a 100(1 − α)% confidence ellipse for $\mathbf{\beta }$ in $\mathbb{R}^{p}$ can be expressed as:

$\mathbf{\hat{\beta }}\oplus \sqrt{pF_{p,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}$
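The $\oplus$ notation describes the set $\{ \mathbf{c} + r\,\mathbf{A}^{1/2}\mathbf{u} : \left\| \mathbf{u} \right\| = 1 \}$. A sketch of how one might trace such a boundary numerically in 2 dimensions — all values here (`beta_hat`, `Sigma_inv`, $n$, $s$) are hypothetical, and the $1/\sqrt{n}$ in the radius reflects the divisor-$n$ definition of $\mathbf{\Sigma }_{\mathbf{XX}}$:

```python
import numpy as np
from scipy import stats

def ellipse_boundary(center, shape, radius, n_points=200):
    """Trace {center + radius * shape^{1/2} u : ||u|| = 1} in 2-D,
    using the symmetric square root of the shape matrix."""
    vals, vecs = np.linalg.eigh(shape)
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    u = np.vstack([np.cos(theta), np.sin(theta)])
    return (center[:, None] + radius * (root @ u)).T

# Hypothetical values: p = 2 slopes, n = 50 observations, s = 0.3
p, n, s, alpha = 2, 50, 0.3, 0.05
nu = n - p - 1
beta_hat = np.array([2.0, -0.5])
Sigma_inv = np.array([[2.0, 0.6], [0.6, 1.5]])  # stand-in for Sigma_XX^{-1}
radius = np.sqrt(p * stats.f.ppf(1 - alpha, p, nu)) * s / np.sqrt(n)
pts = ellipse_boundary(beta_hat, Sigma_inv, radius)
```

Every point on the returned boundary satisfies the defining quadratic form $(\mathbf{x}-\mathbf{c})'\mathbf{A}^{-1}(\mathbf{x}-\mathbf{c})=r^{2}$.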

The data ellipse for the predictors is:

$\mathbf{\bar{x}}\oplus \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}}$.

The ellipse

$\mathbf{\hat{\beta }}\oplus \sqrt{1F_{1,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{\hat{\beta }}\oplus t_{\nu }^{1-\alpha/2 }\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}$

has the property that its projections onto the 1-dimensional coordinate axes produce 100(1 − α)% confidence intervals for the corresponding parameters.

In general, the ellipse

$\mathbf{\hat{\beta }}\oplus \sqrt{dF_{d,\nu }^{1-\alpha }}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}$

has projections that are Scheffé confidence regions with minimal coverage probability 100(1 − α)% when the parameters estimated have been selected from a space of dimension d.
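The Scheffé radii $\sqrt{dF_{d,\nu }^{1-\alpha }}$ are easy to tabulate; a small check (with a hypothetical ν) confirms that the d = 1 radius reduces to the ordinary two-sided t critical value and that the radius grows with d:

```python
import numpy as np
from scipy import stats

nu, alpha = 46, 0.05  # hypothetical residual degrees of freedom
radii = {d: np.sqrt(d * stats.f.ppf(1 - alpha, d, nu)) for d in (1, 2, 3)}

# For d = 1 the Scheffé radius equals the two-sided t critical value,
# matching sqrt(F_{1,nu}^{1-alpha}) = t_{nu}^{1-alpha/2}
print(np.isclose(radii[1], stats.t.ppf(1 - alpha / 2, nu)))
print(radii[1] < radii[2] < radii[3])
```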

The following figures show the projection of a 3-dimensional confidence ellipsoid onto 2-dimensional and 1-dimensional subspaces.

The 3-dimensional ellipsoid is (using the notation above):

$\mathbf{\hat{\beta }}^{*}\oplus \sqrt{3F_{3,\nu }^{0.95}}\ s\ \sqrt{\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}}$

The 1- and 2-dimensional ellipses have the form:

$\mathbf{L\hat{\beta }}^{*}\oplus \sqrt{3F_{3,\nu }^{0.95}}\ s\ \sqrt{\mathbf{L}\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{{L}'}}$

where $\mathbf{L}$ is the appropriate selection matrix. For example, to produce the ellipse for the last two coefficients,

$\mathbf{L}=\left[ \begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{matrix} \right]$
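Applied to the inverse cross-product matrix, this $\mathbf{L}$ simply picks out the lower-right block. A minimal numerical check (made-up design matrix standing in for $\mathbf{X}^{*}$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
Xstar = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # [1, X] style design

L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])  # selects the last two coefficients

V = np.linalg.inv(Xstar.T @ Xstar)
V_slopes = L @ V @ L.T            # 2x2 shape matrix for the slopes' ellipse
print(np.allclose(V_slopes, V[1:, 1:]))  # L picks out the lower-right block
```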

The following shows in green the 2-dimensional joint confidence ellipse for the two slope parameters, 'b.Weight' and 'b.Height'. Also shown in blue are the 1-dimensional ordinary confidence intervals and the 2-dimensional ellipse with the d = 1 Scheffé radius, whose shadows produce the 1-dimensional ordinary confidence intervals.

The green ellipse is:

$\mathbf{\hat{\beta }}\oplus \sqrt{2F_{2,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{L\hat{\beta }}^{*}\oplus \sqrt{2F_{2,\nu }^{0.95}}\ s\ \sqrt{\mathbf{L}\left( \mathbf{X}^{*\prime }\mathbf{X}^{*} \right)^{-1}\mathbf{{L}'}}$

The blue ellipse is:

$\mathbf{\hat{\beta }}\oplus \sqrt{F_{1,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}=\mathbf{\hat{\beta }}\oplus t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{\Sigma }_{\mathbf{XX}}^{-1}}$

and the blue intervals are ordinary 95% confidence intervals and shadows of the blue ellipse:

$\hat{\beta }_{i}\pm \sqrt{F_{1,\nu }^{0.95}}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{{e}'}_{i}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}\mathbf{e}_{i}}=\hat{\beta }_{i}\pm t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}}\ \sqrt{\mathbf{{e}'}_{i}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}\mathbf{e}_{i}}=\hat{\beta }_{i}\pm t_{\nu }^{0.975}\ \frac{s}{\sqrt{n}\ \sigma _{X_{i}|X_{{\hat{i}}}}}$

where $\sigma _{X_{i}|X_{{\hat{i}}}}$ denotes the partial standard deviation of $X_{i}$ adjusting for the predictors other than $X_{i}$.
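The identity behind the last equality is that $\mathbf{{e}'}_{i}\mathbf{\Sigma }_{\mathbf{XX}}^{-1}\mathbf{e}_{i}/n$ equals the $i$-th diagonal entry of $\left( \mathbf{X'}\left( \mathbf{I-P} \right)\mathbf{X} \right)^{-1}$, the usual variance factor for $\hat{\beta }_{i}$. A quick numerical check (made-up data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 2
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)            # (I - P)X: predictors in mean-deviation form
Sigma_XX = Xc.T @ Xc / n

# e_i' Sigma_XX^{-1} e_i = 1 / (partial variance of X_i given the others);
# dividing by n recovers the diagonal of the usual (X'(I-P)X)^{-1}
Sinv = np.linalg.inv(Sigma_XX)
classical = np.linalg.inv(Xc.T @ Xc)
print(np.allclose(np.diag(Sinv) / n, np.diag(classical)))
```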

Any subset of these confidence regions has joint coverage probability at least 95%. Equivalently, any one of these confidence regions has coverage probability at least 95% even if the subspace for the parameters was selected from this 3-dimensional space after seeing the data. Thus these regions provide protection against 'fishing' or 'data dredging' within a specified space of parameters.

For a general non-technical treatment of multiple testing and adjusting for 'data dredging' for hypotheses, see: Bender, R., Lange, S. (2001) "Adjusting for multiple testing-when and how?", Journal of Clinical Epidemiology 54, 343–349.

For pairwise comparisons see:

Jaccard, J., Becker, M.A., Wood, G. (1984) "Pairwise multiple comparison procedures: a review", Psychological Bulletin 96, 589–596.

Seaman, M.A., Levin, J.R., Serlin, R.C. (1991) "New developments in pairwise multiple comparisons: some powerful and practicable procedures", Psychological Bulletin 110, 577–586.

The figures on this page were produced with a script written in R.