Statistics: Ellipses

Ellipses

For a positive definite matrix \Sigma \! we use E(\mu,\Sigma) \! to denote the ellipse \mathcal{E} = \{ x : (x-\mu)' \Sigma^{-1} (x-\mu) = 1 \}.

Note that every non-negative definite matrix W\! can be factored as W=AA'\!. The matrix A\! can be selected so that it is square. It will be non-singular if and only if W\! is non-singular. A definition of an ellipse that can be used for all non-negative definite matrices and that corresponds to the previous definition in the case of positive-definite matrices is

E(\mu,W) = \mu + A \mathcal{S}

where \mathcal{S} is a unit sphere of conformable dimension.

A convenient notation in terms of W is

E(\mu, W) = \mu \oplus \sqrt{W}

Note that under a linear transformation given by a matrix L the image of the ellipse is:

L(E(\mu ,W))=E(L\mu ,LW{L}')=L\mu \oplus \sqrt{LW{L}'}=L\mu \oplus L\sqrt{W}
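Both facts are easy to check numerically. The following is a minimal R sketch (with an arbitrary factor A, centre mu and linear map L chosen for illustration): points of the form \mu + As\! with \|s\|=1\! satisfy the defining equation of E(\mu,W)\!, and their images under L\! lie on E(L\mu, LWL')\!.

  # points mu + A s with ||s|| = 1 lie on E(mu, W), W = AA',
  # and their images under L lie on E(L mu, L W L')
  A  <- matrix(c(1, 2, 1.5, 1), 2, 2)
  W  <- A %*% t(A)
  mu <- c(1, -1)
  theta <- seq(0, 2 * pi, length.out = 9)
  S  <- rbind(cos(theta), sin(theta))          # columns are points on the unit circle
  X  <- mu + A %*% S                           # columns are points on E(mu, W)
  apply(X, 2, function(x) c(t(x - mu) %*% solve(W) %*% (x - mu)))              # all 1
  L  <- matrix(c(2, 0, 1, 1), 2, 2)
  Y  <- L %*% X
  apply(Y, 2, function(y)
      c(t(y - c(L %*% mu)) %*% solve(L %*% W %*% t(L)) %*% (y - c(L %*% mu)))) # all 1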


Conjugate axes

The columns of the matrix \mathbf{A}=\left[ \begin{matrix}    \mathbf{a}_{1} & \mathbf{a}_{2} & \cdots  & \mathbf{a}_{p}  \\ \end{matrix} \right] form a set of conjugate axes of the ellipse. Each vector \mathbf{a}_{i} lies on the ellipse, and the tangent space at that point is parallel to the span of all the other column vectors of \mathbf{A}.

For p=2 \! this is illustrated in the following figure

Image:Statistics-Ellipses-Fig1.png

in which

\mathbf{A}=\left[ \begin{matrix}    \mathbf{a}_{1} & \mathbf{a}_{2}  \\ \end{matrix} \right]=\left[ \begin{matrix}    1 & 1.5  \\    2 & 1  \\ \end{matrix} \right]

and

\mathbf{W}=\mathbf{A{A}'}=\left[ \begin{matrix}    3.25 & 3.5  \\    3.5 & 5  \\ \end{matrix} \right]


If \mathbf{W}=\mathbf{B{B}'} is any other factorization of \mathbf{W} then the columns of \mathbf{B} have the same properties as the columns of \mathbf{A}.

Consider the inner product space with inner product matrix \mathbf{W}^{-1}=\left[ \begin{matrix}	  1.25 & -0.875 \\	  -0.875 & 0.8125 \\	  \end{matrix} \right] and inner product

\left\langle \mathbf{x},\mathbf{y} \right\rangle =\mathbf{{x}'W}^{-1}\mathbf{y}.

Since \mathbf{{A}'W}^{-1}\mathbf{A}=\mathbf{{A}'}\left( \mathbf{A{A}'} \right)^{-1}\mathbf{A}=\mathbf{{A}'}\,\mathbf{{A}'}^{-1}\mathbf{A}^{-1}\mathbf{A}=\mathbf{I} we see that \mathbf{a}_{1} and \mathbf{a}_{2} are orthogonal unit vectors (in fact, an orthonormal basis) in this inner product:

\left\langle \mathbf{a}_{i},\mathbf{a}_{i} \right\rangle =\mathbf{{a}'}_{i}\mathbf{W}^{-1}\mathbf{a}_{i}=1
\left\langle \mathbf{a}_{1},\mathbf{a}_{2} \right\rangle =\mathbf{{a}'}_{1}\mathbf{W}^{-1}\mathbf{a}_{2}=0
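For the numerical example above this is easy to verify in R:

  A <- matrix(c(1, 2, 1.5, 1), 2, 2)     # columns a1 = (1, 2)', a2 = (1.5, 1)'
  W <- A %*% t(A)                        # matrix(c(3.25, 3.5, 3.5, 5), 2, 2)
  t(A) %*% solve(W) %*% A                # 2 x 2 identity: <a_i, a_j> = 1 if i = j, else 0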

Particular factorizations yield interesting sets of conjugate axes. In the figure on the right

Two conjugate vectors lie on an ellipse. Their tangents can be extended to form a parallelogram framing the ellipse.

the axes for the Choleski factorization where \mathbf{B} is lower triangular are shown in green and the axes for the principal component factorization (where \mathbf{B}=\mathbf{\Gamma \Lambda }^{1/2} and \mathbf{W}=\mathbf{\Gamma \Lambda {\Gamma }'} is the spectral decomposition of \mathbf{W}), are shown in magenta. The figures were produced with a script in R.
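The two factorizations can be computed directly; the following is a sketch (not the original script) for the \mathbf{W}\! given above:

  W <- matrix(c(3.25, 3.5, 3.5, 5), 2, 2)
  B.chol <- t(chol(W))                           # lower-triangular factor: W = B.chol %*% t(B.chol)
  eig    <- eigen(W, symmetric = TRUE)           # spectral decomposition W = Gamma Lambda Gamma'
  B.pc   <- eig$vectors %*% diag(sqrt(eig$values))  # principal-component factor
  B.chol %*% t(B.chol)                           # both reproduce W ...
  B.pc   %*% t(B.pc)
  t(B.chol) %*% solve(W) %*% B.chol              # ... and both give the identity, so the
  t(B.pc)   %*% solve(W) %*% B.pc                # columns of each factor are conjugate axes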

Hypothesis and error ellipses

Let E\! be a positive definite matrix and H\! be a non-negative definite matrix.

There exists a matrix A\! and a non-negative diagonal matrix \Lambda\! such that:

E = AA' \!

and

H = A \Lambda A'\!.

The diagonal elements of \Lambda\!, \lambda_1 \ge \lambda_2 \ge ... \ge \lambda_p \!, are the eigenvalues of H\! relative to E\!. An R function to compute A'\! is given below.

In this discussion E\! and H\! can represent the SSE\! and SSH\! matrices in multivariate linear regression, both divided by a common factor, perhaps the degrees of freedom for SSE\! so that E\! is an unbiased estimate of the residual variance matrix \Sigma\!.

Expected value of SSH

We can find expressions for SSH\! in terms of the data and find expected values for SSH\! under fixed effects or under random effects models.

Univariate response

Consider first the expected value of SSH\! with a univariate response and a one-way ANOVA model with random group means.

The following formula is used repeatedly to find the expected value of a quadratic form. If Y\! is a random vector with \operatorname{E}(Y)=\mu and \operatorname{Var}(Y) = \Psi, and Q\! is a conformable matrix then

\operatorname{E}( Y'QY) = \mu'Q\mu + \operatorname{tr}(Q \Psi) \!
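A small Monte Carlo check of this formula, as a sketch with arbitrary \mu\!, \Psi\! and Q\! (MASS::mvrnorm is used to simulate Y\!):

  library(MASS)                                 # for mvrnorm
  set.seed(1)
  mu  <- c(1, 0, -1)
  Psi <- diag(3) + 0.5                          # an arbitrary positive definite variance matrix
  Q   <- crossprod(matrix(rnorm(9), 3))         # an arbitrary symmetric matrix
  Y   <- mvrnorm(1e5, mu, Psi)
  mean(rowSums((Y %*% Q) * Y))                  # simulated E(Y'QY)
  c(t(mu) %*% Q %*% mu) + sum(diag(Q %*% Psi))  # mu'Q mu + tr(Q Psi)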

With a univariate response and G\! groups we can use an indicator design matrix D\! and write:

\mathbf{Y} = D \mu + \epsilon\!

where

\mu \sim  N_G( 1\psi, \phi^2 I) \!

and

\epsilon \sim N(0, \sigma^2 I )\!

and \mu \! is independent of \epsilon \!.

Thus

\operatorname{E}(Y) = 1 \psi and \operatorname{Var}(Y) = \phi^2 DD' + \sigma^2 I\!

Consequently, writing SSTO = Y'QY\! where Q = I - \frac{1}{N}U\! is the centering matrix and U = 1 1'\! is the N \times N\! matrix of ones,

\operatorname{E}(SSTO)\! = \operatorname{E}( Y'QY ) = \psi 1' Q 1 \psi + \operatorname{tr}\left[          (\phi^2 DD' + \sigma^2 I )(I - \frac{1}{N}U) \right]
= 0 + \operatorname{tr}\left[          \phi^2 DD' - \frac{\phi^2}{N}DD'U + \sigma^2 Q \right] \!
= \phi^2 N  - \frac{\phi^2}{N}\operatorname{tr}(DD'U) + \sigma^2 \operatorname{tr}( Q )
= \phi^2 N  - \frac{\phi^2}{N}\operatorname{tr}(1'DD'1) + \sigma^2 (N-1)
= \phi^2 N  - \frac{\phi^2}{N}\sum_{g=1}^G n_g^2 + \sigma^2 (N-1)
= \phi^2 (N  - \tilde{n}) + \sigma^2 (N-1)

where \tilde{n}= \sum_{g=1}^G \frac{n_g}{N} n_g is the group-size weighted mean of group sizes. With equal groups, \tilde{n} = N / G and

E(SSTO) = \phi^2 N \frac{G-1}{G} + \sigma^2 (N-1) =\phi^2 (N-\tilde{n}) + \sigma^2 (N-1)

Thus

E(SSH)\! = E (SSTO) - E(SSE)\!
= \phi^2 (N  - \tilde{n}) + \sigma^2 (N-1) - \sigma^2 (N-G)
= \phi^2 (N  - \tilde{n}) + \sigma^2 (G-1)
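This can be checked by simulation; the following sketch uses arbitrary group sizes and variance components:

  set.seed(2)
  n.g <- c(5, 10, 15); G <- length(n.g); N <- sum(n.g)
  phi <- 2; sigma <- 1
  n.tilde <- sum(n.g^2) / N                     # group-size weighted mean group size
  g   <- rep(seq_len(G), n.g)
  SSH <- replicate(5000, {
      mu <- rnorm(G, 0, phi)                    # random group means
      y  <- mu[g] + rnorm(N, 0, sigma)
      sum(n.g * (tapply(y, g, mean) - mean(y))^2)
  })
  mean(SSH)                                     # simulated E(SSH)
  phi^2 * (N - n.tilde) + sigma^2 * (G - 1)     # theoretical value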

Multivariate response

Consider one-way MANOVA with G\! random group means:

\mu_g \sim N(\mu, \Phi), g=1,...,G

and

Y_{ig}|\mu_g \sim N( \mu_g, \Sigma), i = 1,...,n_g\!.

With N = \sum_g n_g\! we have

SSE \sim W_{N-G}(\Sigma)\!

The analogous results are:

E(SSE) = (N-G) \Sigma \!

and

E(SSH) = (N - \tilde{n}) \Phi + (G-1) \Sigma \!

Note that

Var( \bar{\mathbf{Y}}_{\cdot g} ) = \Phi + \frac{1}{n_g} \Sigma \!

and that the group-size weighted average of these variances is:

\sum_{g=1}^G \frac{n_g}{N} Var( \bar{\mathbf{Y}}_{\cdot g} )\! = \sum_{g=1}^G \frac{n_g}{N} \left[ \Phi + \frac{1}{n_g} \Sigma \right]\!
= \Phi + \frac{G}{N} \Sigma \!
= \Phi + \frac{1}{n} \Sigma with equal groups

The following table shows expectations of combinations of SSH\! and SSE\! of the form k_H SSH + k_E SSE\!:

k_H \! | k_E \! | E(k_H SSH + k_E SSE)\!
1 | 0 | (N - \tilde{n}) \Phi + (G-1) \Sigma \!
0 | 1 | (N-G) \Sigma \!
\frac{1}{N- \tilde{n}}\! | 0 | \Phi + \frac{G-1}{N - \tilde{n}} \Sigma
\frac{1}{N- \tilde{n}}=\frac{1}{N- n}\! | 0 | \Phi + \frac{G-1}{N - \tilde{n}} \Sigma = \Phi + \frac{1}{n} \Sigma =\operatorname{Var}(\bar{Y}_{\cdot g} )\! (equal groups)
\frac{1}{N- n} \! | -\frac{1}{n(N-G)} \! | \Phi \! (equal groups)
\frac{1}{\lambda_{crit}}\! | 0 | \frac{1}{\lambda_{crit}} \left[ (N - \tilde{n}) \Phi + (G-1) \Sigma \right] \!

Definition of SSH with one-way MANOVA

Consider one-way MANOVA with G\! groups, each with n_g\! observations. Let N = \sum_{g=1}^G n_g\! and let

D = \begin{bmatrix} 1_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1_{n_G}    \end{bmatrix}

be the design matrix.

Let Q\! be the N \times N\! residual projection matrix defined by

Q = I - 1_N(1_N'1_N)^{-1}1_N'=I-\frac{1}{N}U


In the case of one-way MANOVA there is an easy way to derive an expression for SSH:

SSTO\! =\! \sum_{g=1}^G \sum_{i=1}^{n_g} (Y_{ig} - \bar{Y}_{\cdot \cdot}) (Y_{ig}-\bar{Y}_{\cdot \cdot})' \!
=\! \sum_{g=1}^G \sum_{i=1}^{n_g} (Y_{ig}- \bar{Y}_{\cdot g}+\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot}) (Y_{ig}- \bar{Y}_{\cdot g}+\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})'
=\! \sum_{g=1}^G \sum_{i=1}^{n_g}\left\{ (Y_{ig}- \bar{Y}_{\cdot g})(Y_{ig}- \bar{Y}_{\cdot g})' +(\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})(\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})' -  (Y_{ig}- \bar{Y}_{\cdot g})(\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})' - (\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})(Y_{ig}- \bar{Y}_{\cdot g})' \right\}
=\! \sum_{g=1}^G \sum_{i=1}^{n_g} (Y_{ig}- \bar{Y}_{\cdot g})(Y_{ig}- \bar{Y}_{\cdot g})' +\sum_{g=1}^G n_g(\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})(\bar{Y}_{\cdot g}- \bar{Y}_{\cdot \cdot})'

The cross-product terms vanish because \sum_{i=1}^{n_g}(Y_{ig}- \bar{Y}_{\cdot g}) = 0\! within each group. The first term is SSE\! and the second term is SSH\!.
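The decomposition SSTO = SSE + SSH\! is easy to verify numerically; a sketch with arbitrary data:

  set.seed(3)
  n.g <- c(4, 6, 5); g <- factor(rep(1:3, n.g))
  Y   <- matrix(rnorm(sum(n.g) * 2), ncol = 2)
  fit  <- lm(Y ~ g)
  SSTO <- crossprod(sweep(Y, 2, colMeans(Y)))             # deviations from the grand mean
  SSE  <- crossprod(residuals(fit))                       # within-group SSCP
  SSH  <- crossprod(sweep(fitted(fit), 2, colMeans(Y)))   # between-group SSCP
  max(abs(SSTO - (SSE + SSH)))                            # approximately zero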

Definition of SSH with a general linear hypothesis

Given a univariate linear model

Y = X_1 \beta_1 + X_2 \beta_2 + \epsilon \!

with X_i\! of dimension n \times k_i\!, the test for \beta_2 = 0\! is based on

SSH = (L \hat{\beta})'(L (X'X)^{-1}L')^{-1} L \hat{\beta} \!

where

L = [0 \ \   I] \!

and

X = [ X_1\ \   X_2]\!

We can show that

SSH\! = \! \hat{\beta}_2 ' W_{22 \cdot 1} \hat{\beta}_2\!
= \! \hat{\beta}_2 ' X'_{2\cdot 1} X_{2\cdot 1} \hat{\beta}_2\!

where W_{22 \cdot 1}\! is the partial SSCP matrix for X_2 \! regressing on X_1\! and X_{2\cdot 1}\! is the matrix of residuals of X_2 \! regressing on X_1\!.

Thus SSH\! is the sum of squared deviations (assuming that X_1\! spans the intercept) of the fitted values \hat{Y}\! in the added-variable plot for X_2\!. It is the variability in Y\! 'explained' by X_2\! beyond that 'explained' by X_1\!. SSE\! is the sum of squared deviations of the residuals in the same plot.
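The equivalence of the two expressions for SSH\! can be checked numerically; a sketch with arbitrary data:

  set.seed(4)
  n  <- 40
  X1 <- cbind(1, rnorm(n))                       # X_1 spans the intercept
  X2 <- cbind(rnorm(n), rnorm(n))
  y  <- X1 %*% c(1, 2) + X2 %*% c(0.5, -0.5) + rnorm(n)
  X  <- cbind(X1, X2)
  bhat  <- solve(crossprod(X), crossprod(X, y))
  L     <- cbind(matrix(0, 2, 2), diag(2))
  SSH.1 <- t(L %*% bhat) %*% solve(L %*% solve(crossprod(X)) %*% t(L)) %*% (L %*% bhat)
  X2.1  <- residuals(lm(X2 ~ X1 - 1))            # residuals of X_2 regressed on X_1
  b2    <- bhat[3:4, , drop = FALSE]
  SSH.2 <- t(b2) %*% crossprod(X2.1) %*% b2      # beta_2' X_{2.1}' X_{2.1} beta_2
  c(SSH.1, SSH.2)                                # the two expressions agree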

Note that

E(SSH) \! = \! \beta'_2 X'_{2 \cdot 1}X_{2 \cdot 1}\beta_2  +         \operatorname{tr}(\sigma^2 W_{22 \cdot 1}^{-1}   W_{22 \cdot 1}) \!
=\! \beta'_2 X'_{2 \cdot 1}X_{2 \cdot 1}\beta_2    + h \sigma^2\!

where h = k_2\! is the number of columns of X_2\!, since \operatorname{tr}(\sigma^2 W_{22 \cdot 1}^{-1} W_{22 \cdot 1}) = \sigma^2 \operatorname{tr}(I_{k_2}) = k_2 \sigma^2\!.

The matrix of residuals, X_{2 \cdot 1} \!, is obtained by regression on X_1\! and, thus, has n-k_1\! degrees of freedom. We could therefore think of

SSH / (n- k_1)\!

with

E(SSH/(n- k_1)) = \frac{1}{n-k_1}\beta'_2 X'_{2 \cdot 1}X_{2 \cdot 1}\beta_2    + \frac{h}{n-k_1} \sigma^2\!

as an estimate of the variance of the contribution of X_2\! to the variability of Y\!.

Computing eigenvalues

To understand how eigenvalues are computed in car:::print.Anova.mlm, let

G = (E'E)^{-1}E'H = (AA'AA')^{-1}(AA')(A \Lambda A') = (AA')^{-1}(A \Lambda A')      = A'^{-1} \Lambda A' \!

Since

G A'^{-1} =(A'^{-1} \Lambda A' ) A'^{-1} = A'^{-1} \Lambda \!

the diagonal of \Lambda\! contains the eigenvalues of G\!, and the columns of A'^{-1}\! are the corresponding eigenvectors.

Thus the eigenvalues and eigenvectors of H\! relative to E\! can be obtained with

 > rel.eval ( H, E )
 > rel.evec ( H, E )

where

  > rel.eval <- function ( H, E ) {
          # This code is adapted from car:::print.Anova.mlm
          Re( eigen( qr.coef( qr(E), H ), only.values = TRUE )$values )
    }
  > rel.evec <- function ( H, E ) {
          # eigenvectors of t(E^{-1} H) = H E^{-1}; these are proportional to the
          # columns of A, where E = A A' and H = A Lambda A'
          Re(eigen( t(qr.coef( qr(E), H )))$vectors) 
    }
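A quick check of rel.eval against the construction above, using an arbitrary A\! and \Lambda\!:

  set.seed(5)
  A      <- matrix(rnorm(9), 3)
  lambda <- c(2, 1, 0.25)
  E <- A %*% t(A)
  H <- A %*% diag(lambda) %*% t(A)
  sort(rel.eval(H, E), decreasing = TRUE)   # recovers 2, 1, 0.25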

The eigenvectors are not, in general, a factor of E\!. To obtain a factor of E \! one can use:

  > rel.fac <- function( H, E ) {
         Bp <- fac(E)
         B.qr <- qr(t(Bp))
         H.star <-  qr.coef( B.qr, t( qr.coef( B.qr, H ) ))   # B^{-1} H B'^{-1}
         ev <- eigen( H.star , symmetric = T)
         ret <- t(ev$vectors) %*% Bp
         attr(ret,"lambda") <- ev$values
         ret 
    }

Note that the numeric properties of this function could be improved.

Visualizing significance

If we plot the ellipses \mathcal{H} = E(0,H)\! and \mathcal{E} = E(0,E)\!, the ellipse \mathcal{H}\! will protrude beyond \mathcal{E}\! along the direction a_i\!, where A = [a_1, ..., a_p]\!, if and only if \lambda_i > 1\!.

Roy's largest root test

The F-test based on Roy's largest root uses the approximation ( e.g., see SAS documentation for multivariate tests (http://post.queensu.ca:8080/SASDoc/getDoc/en/statug.hlp/introreg_sect21.htm) )

F = \nu \lambda_1 / d \!

with degrees of freedom d\! and \nu\! where

d = \max(p,q)\!

where q\! is the number of degrees of freedom for the hypothesis and

\nu = \nu_E - d + q\!

where \nu_E\! is the number of degrees of freedom for the error matrix E\!.

The critical value of \lambda_1\! for an \alpha\!-level test can be computed as

  > lambda.crit <- function( alpha, p , q, nu.E) { 
       d <- max( p, q)
       nu <- nu.E - d + q
       (d/nu) * qf( alpha, d, nu, lower.tail = FALSE )
  }
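As a check (with hypothetical dimensions), the Roy F\! statistic evaluated at this critical value sits exactly at the upper \alpha\! point of the F(d, \nu)\! distribution:

 > l.crit <- lambda.crit(0.05, p = 2, q = 3, nu.E = 30)
 > d <- max(2, 3); nu <- 30 - d + 3
 > pf(nu * l.crit / d, d, nu, lower.tail = FALSE)     # 0.05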

Thus, the ellipse E(0, \frac{1}{\lambda_{crit}}H)\! will have the property that it protrudes beyond E(0,E)\! if and only if the Roy test for the hypothesis whose matrix is H\! is significant at level \alpha\!.

Hotelling-Lawley trace

The Hotelling-Lawley trace statistic HLT = tr (H E^{-1})\! is based on the sum of the s = min (p, df_h)\! eigenvalues of (H E^{-1})\!. This suggests another criterion for judging visual significance based on the average eigenvalue, \bar{\lambda_i}\!.

One approximate F-test for the HLT is

F =  \frac{df2}{df1} \frac{ \operatorname{tr} (H E ^{-1}) }{s^2} = \frac{df2}{s * df1} \bar{\lambda_i} \!

where df1 = 2m + s + 1\!, df2 = 2(sn + 1)\! are the degrees of freedom for the numerator and denominator, and m = ( |p-q|-1 )/2\!, n = ( dfe -p -1 )/2\!.

Thus, the critical value for \bar{\lambda_i} \! can be computed as

 > HLT.crit <- function ( alpha, p, dfh, dfe) {
       s <- min(p, dfh)
       m <- (abs(p-dfh)-1)/2
       n <- (dfe-p-1)/2
       df1 <- 2*m + s + 1
       df2 <- 2*(s*n +1)
       s * (df1/df2) * qf(alpha, df1, df2, lower.tail=FALSE)	
       }

When p > 3

When the number of response variables involved in a hypothesis is greater than the number of dimensions plotted, one can use the vectors of the A\! matrix to construct linear combinations of the response variables that will show maximal significance of the hypothesis. These linear combinations depend on both E\! and H\! and thus could not be used to display results for many hypotheses simultaneously.

By projecting into the space corresponding to the three largest relative eigenvalues, we scale the projected H\! matrix, THT'\!, so its ellipse will protrude outside the ellipse of the projected E\! matrix, TET'\!, if and only if the overall Roy test is significant.

A convenient choice of T\! is

T = A^*_1 \!

where

A^{-1} =  \begin{bmatrix} A^*_1 \\ A^*_2 \end{bmatrix}\!

Then

E^* = TET' = I_{3\times 3} \! and

UNFINISHED


Note on computing matrix factors

A Choleski root, A'\! above, can be obtained in R with

> chol(W)

when W\! is positive-definite.

The singular value decomposition also works for singular and ill-conditioned matrices and can be obtained with

> fac(W)

where

  > fac <- function(W) {  
       # assumes W is non-negative definite
       # returns the right factor of W, D^(1/2) Gamma', where W = Gamma D Gamma' 
       xx <- svd(W, nu=0)
       t(xx$v) * sqrt( pmax( xx$d,0) )
  }


Thus, an ellipse can be generated in R with:

  > ell( mu, W )

where

  > ell <- function( mu, W ) {
     # returns points (rows) on the ellipse E(mu, W)
     p <- nrow(W)
     t( mu + t( sphere(p) %*% fac(W) ) )
  }
where sphere(p)\! generates points on a unit sphere of the appropriate dimension. Of course, an implementation of this function would need to attend to many additional details.
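For example, a minimal sphere()\! for the two-dimensional case (leaving aside the general case and the additional details mentioned above), together with a call that draws E(\mu, W)\! for the W\! used earlier:

  sphere <- function(p, n = 100) {
      # rows are points on the unit sphere; only the p = 2 case is sketched here
      if (p != 2) stop("only p = 2 is implemented in this sketch")
      theta <- seq(0, 2 * pi, length.out = n)
      cbind(cos(theta), sin(theta))
  }
  plot(ell(c(1, 2), matrix(c(3.25, 3.5, 3.5, 5), 2, 2)), type = "l", asp = 1)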

To the extent that the sphere is isotropic, the generated ellipse is independent of the choice of factor for W.

If the sphere object is augmented with diameters along usual axes, then the image of the diameters will be columns of A\! which will form a set of conjugate vectors of the ellipse. A set of conjugate vectors of an ellipse has the property that each vector is on the surface of the ellipse and any subset of vectors spans a space that is parallel to the tangent spaces at any of the other vectors. The set of linear combinations of the vectors with coefficients in [-1,1] \! is a parallelogram that contains the ellipse and whose faces are tangent spaces to the ellipse.

With two symmetric matrices, H\! and E\!, with E\! positive definite, we can find a factor A\! of E\! whose columns will also be conjugate directions for H\!. The following function will produce an 'eigenvector'-based right factor of E\! if called with a single argument. With two matrix arguments, the first of which must then be positive definite, it returns a right factor of E\! whose rows are also conjugate directions for H\!.


 fac <- function( 
           E,           #  non-neg. def. matrix to factor -- should be pos. def. if H is not missing
           H,           #  if not missing, factor of E will consist of axes conjugate relative to H also
           which = 1    #  which = 2 returns a factor of H
   )
   {            
       xx <- eigen( E, symmetric = TRUE )
       Efac <- t(xx$vectors) * sqrt( pmax( Re( xx$values ), 0 ))  # right factor of E  
       if ( missing(H) ) return(Efac)
       Efac.qr <- qr( t( Efac ) )
       H.star <- qr.coef( Efac.qr, t( qr.coef( Efac.qr, H))) # Efac'^{-1} H Efac^{-1}
       ev <- eigen( H.star, symmetric = TRUE )
       ret <- t(ev$vectors) %*% Efac
       if ( which == 2 ) ret <- ret * sqrt( pmax( ev$values, 0 ) )  # scale rows so that crossprod(ret) = H
       attr( ret, "lambda" ) <- ev$values
       ret
   }
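A quick check (with arbitrary E\! and H\!) that the two-argument call returns a right factor of E\! whose rows are also conjugate directions for H\!:

  set.seed(6)
  A <- matrix(rnorm(9), 3); E <- A %*% t(A)       # positive definite E
  B <- matrix(rnorm(9), 3); H <- B %*% t(B)       # non-negative definite H
  F.E <- fac(E, H)
  max(abs(crossprod(F.E) - E))                    # approximately zero: F.E is a right factor of E
  round(F.E %*% solve(H) %*% t(F.E), 10)          # diagonal, so the rows of F.E are
                                                  # conjugate directions for H as well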

Canonical transformations

See Statistics: Canonical transformations for HEplots

Kissing ellipsoids

See Statistics: Kissing ellipsoids