Statistics: Visualizing projections


Suppose that data on variables represented by Y = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} \in \mathbb{R}^p are viewed through a transformation V = TY, where T is v \times p.

How can the values of the original variables be indicated in the plot of V?

Some confusion may arise from the fact that the unit axis vectors e_1, \ldots, e_p serve many purposes because of their orthonormality. They can be thought of as:

  1. a basis: any Y has a unique expression as Y = y_1 e_1 + \cdots + y_p e_p,
  2. a set of evaluation functionals: y_i = e_i' Y,
  3. a set of evaluation axes: e_i lies on the line determined by the evaluation functional, and the functional takes the value 1 at e_i.

Note that we can visualize the coordinates associated with a basis by using the parallelogram rule to express Y as a sum of vectors, each lying along the axis determined by one of the basis vectors.

To use an evaluation functional, we project Y perpendicularly onto the line through e_i. The 'evaluation axis' identifies the position of the unit tick mark along this line.

With orthonormal vectors, the same set plays all three roles. If coordinates are expressed in terms of a basis that is not orthonormal, then different sets of vectors are needed for each role.
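
As a quick check of the orthonormal case, here is a minimal numpy sketch (with arbitrary illustrative values) confirming the three roles played by the standard basis vectors:

  import numpy as np

  p = 3
  E = np.eye(p)                        # columns are the unit axis vectors e_1, ..., e_p
  Y = np.array([2.0, -1.0, 0.5])       # arbitrary illustrative point

  # Role 1 (basis): Y is recovered as the sum of coordinate-scaled basis vectors.
  assert np.allclose(sum(Y[i] * E[:, i] for i in range(p)), Y)

  # Role 2 (evaluation functionals): y_i = e_i' Y.
  assert np.allclose([E[:, i] @ Y for i in range(p)], Y)

  # Role 3 (evaluation axes): the perpendicular projection of Y onto the line
  # through e_i lands at y_i e_i, so the unit tick mark reads off the coordinate.
  proj = ((E[:, 0] @ Y) / (E[:, 0] @ E[:, 0])) * E[:, 0]
  assert np.allclose(proj, Y[0] * E[:, 0])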

If V = TY with a non-singular T, then we have the following:

  1. basis: to display a basis with respect to which the Y coordinates of a point can be visualized using the parallelogram law, we plot the columns of T = \begin{bmatrix} t_1 & \cdots & t_p \end{bmatrix}, since V = y_1 t_1 + \cdots + y_p t_p. Note that using the parallelogram law requires visualizing all basis vectors.
  2. evaluation functionals: y_i = e_i' T^{-1} V, so the evaluation functionals are the rows of T^{-1}.
  3. evaluation axes: letting t^i denote the ith row of T^{-1}, the evaluation axis is a_i' = c\,t^i where c is chosen so that t^i a_i = c\, t^i (t^i)' = 1. Thus a_i' = (t^i (t^i)')^{-1} t^i.
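
A minimal numpy sketch of these three items, using an arbitrary random T as a stand-in for a real transformation:

  import numpy as np

  rng = np.random.default_rng(0)
  p = 3
  T = rng.standard_normal((p, p))      # illustrative T, non-singular almost surely
  Tinv = np.linalg.inv(T)

  Y = rng.standard_normal(p)
  V = T @ Y

  # 1. Basis: the columns of T serve as the basis, since V = y_1 t_1 + ... + y_p t_p.
  assert np.allclose(sum(Y[i] * T[:, i] for i in range(p)), V)

  # 2. Evaluation functionals: y_i is row i of T^{-1} applied to V.
  assert np.allclose(Tinv @ V, Y)

  # 3. Evaluation axes: a_i' = (t^i (t^i)')^{-1} t^i, with t^i the ith row of T^{-1},
  #    scaled so the ith functional equals 1 at the axis's unit tick mark.
  for i in range(p):
      ti = Tinv[i]
      ai = ti / (ti @ ti)
      assert np.isclose(ti @ ai, 1.0)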

When v < p

When v < p there is some loss of information and the values of the individual y_i may not be available. If there is a distribution for Y, a natural approach is based on E(Y \mid V). If Y \sim N_p(0, \Sigma), then

\begin{bmatrix} V \\ Y \end{bmatrix} \sim N_{v+p}\left( 0, \begin{bmatrix} T\Sigma T' & T\Sigma \\ \Sigma T' & \Sigma \end{bmatrix} \right),

and E(y_i \mid V) = e_i' \Sigma T' (T\Sigma T')^{-1} V.

Thus, the prediction functionals are rows of

F = \Sigma T' (T \Sigma T')^{-1}.

Note that if v = p, then F = \Sigma T' (T')^{-1} \Sigma^{-1} T^{-1} = T^{-1}, recovering the evaluation functionals above.
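
A short numpy sketch, with an illustrative covariance and projection (arbitrary values), that computes F and confirms the v = p special case:

  import numpy as np

  rng = np.random.default_rng(1)
  p, v = 4, 2

  B = rng.standard_normal((p, p))
  Sigma = B @ B.T                      # illustrative positive-definite covariance
  T = rng.standard_normal((v, p))      # illustrative full-rank projection

  # Prediction functionals: rows of F = Sigma T' (T Sigma T')^{-1}, so E(Y | V) = F V.
  F = Sigma @ T.T @ np.linalg.inv(T @ Sigma @ T.T)

  # Special case v = p: F collapses to T^{-1}, the evaluation functionals.
  Tsq = rng.standard_normal((p, p))
  Fsq = Sigma @ Tsq.T @ np.linalg.inv(Tsq @ Sigma @ Tsq.T)
  assert np.allclose(Fsq, np.linalg.inv(Tsq))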

If there is no information on the distribution of Y, it may be reasonable to treat the distribution as diffuse and spherically symmetric. In this case the functionals are rows of T'(TT')^{-1}.
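
In this spherical case, and assuming T has full row rank, T'(TT')^{-1} coincides with the Moore-Penrose pseudoinverse of T; a minimal numpy check:

  import numpy as np

  rng = np.random.default_rng(2)
  v, p = 2, 4
  T = rng.standard_normal((v, p))      # full row rank almost surely

  # Diffuse spherical case: F = T'(TT')^{-1}, which for full-row-rank T is
  # the Moore-Penrose pseudoinverse of T.
  F = T.T @ np.linalg.inv(T @ T.T)
  assert np.allclose(F, np.linalg.pinv(T))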

The prediction axes are the rows of A = CF = C \Sigma T'(T\Sigma T')^{-1}, where C is diagonal and chosen so that \operatorname{diag}(F A') = I.

Thus

A = [\operatorname{diag}(FF')]^{-1} F
= [\operatorname{diag}(\Sigma T' (T \Sigma T')^{-2} T \Sigma)]^{-1} \Sigma T' (T \Sigma T')^{-1}.
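
A numpy sketch (same illustrative setup as above) that forms the prediction axes and verifies the normalization \operatorname{diag}(F A') = I:

  import numpy as np

  rng = np.random.default_rng(3)
  p, v = 4, 2
  B = rng.standard_normal((p, p))
  Sigma = B @ B.T                      # illustrative covariance
  T = rng.standard_normal((v, p))

  F = Sigma @ T.T @ np.linalg.inv(T @ Sigma @ T.T)

  # Prediction axes: rescale each functional by 1/diag(FF') so that diag(F A') = I.
  A = np.diag(1.0 / np.diag(F @ F.T)) @ F
  assert np.allclose(np.diag(F @ A.T), np.ones(p))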


Canonical transformations

Suppose \Sigma = AA' with A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}, where A_1 has v columns. Let T = A^1, where A^{-1} = \begin{bmatrix} A^1 \\ A^2 \end{bmatrix} is partitioned conformably, so that A^1 consists of the first v rows of A^{-1}.

With this transformation, \operatorname{Var}(V) = I and the prediction functionals are rows of:

F = \Sigma T' (T \Sigma T')^{-1}
= \begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} A_1' \\ A_2' \end{bmatrix} {A^1}' \left\{ A^1 \begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} A_1' \\ A_2' \end{bmatrix} {A^1}' \right\}^{-1}
= \begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} \left\{ \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} \right\}^{-1}
= A_1,

using A' {A^1}' = (A^1 A)' = \begin{bmatrix} I \\ 0 \end{bmatrix}.

and the prediction axes are the rows of [\operatorname{diag}( A_1 A_1' )]^{-1}A_1.
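
A numpy sketch of the canonical construction, taking A to be the Cholesky factor of an illustrative \Sigma (any factor with \Sigma = AA' would do):

  import numpy as np

  rng = np.random.default_rng(4)
  p, v = 4, 2

  B = rng.standard_normal((p, p))
  Sigma = B @ B.T
  A = np.linalg.cholesky(Sigma)        # one choice of factor with Sigma = A A'
  Ainv = np.linalg.inv(A)

  A1 = A[:, :v]                        # A_1: first v columns of A
  T = Ainv[:v, :]                      # A^1: first v rows of A^{-1}

  assert np.allclose(T @ Sigma @ T.T, np.eye(v))   # Var(V) = I

  F = Sigma @ T.T @ np.linalg.inv(T @ Sigma @ T.T)
  assert np.allclose(F, A1)            # the prediction functionals are rows of A_1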

Orthogonal transformations

Suppose that T is semi-orthogonal (TT' = I). The images of the unit axis vectors e_i \in \mathbb{R}^p are the columns of T. These are also the prediction functionals with respect to a spherical distribution for Y, since

F = T'(TT')^{-1} = T'.

The prediction axes are:

A = [\operatorname{diag} (T'T)]^{-1} T'.
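
A numpy sketch with a semi-orthogonal T obtained from a reduced QR factorization (illustrative values), verifying F = T' and the axis normalization:

  import numpy as np

  rng = np.random.default_rng(5)
  p, v = 4, 2

  # Build a semi-orthogonal T (TT' = I) from a reduced QR factorization.
  Q, _ = np.linalg.qr(rng.standard_normal((p, v)))
  T = Q.T
  assert np.allclose(T @ T.T, np.eye(v))

  # With a spherical distribution for Y, F = T'(TT')^{-1} = T'.
  F = T.T @ np.linalg.inv(T @ T.T)
  assert np.allclose(F, T.T)

  # Prediction axes: A = [diag(T'T)]^{-1} T', normalized so diag(F A') = I.
  A = np.diag(1.0 / np.diag(T.T @ T)) @ T.T
  assert np.allclose(np.diag(F @ A.T), np.ones(p))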