# MATH 6643: Assignment 3

This assignment has a number of parts

## 1. Relationships among models

A visual presentation of the material in this part of the assignment can be found in Mixed Models Mixed Models II - Behind the Scenes Report (http://www.math.yorku.ca/%7Egeorges/Slides/HandoutM.pdf)

Suppose a hierarchical model can be expressed using the Laird-Ware notation for each cluster as

Yi = Xiγ + Ziui + εi

where

Xi is $n_i \times p$ , $u_i \sim N(0, T)$ , $\epsilon_i \sim N(0, \sigma^2 I)$ with the usual independence assumptions.

We can also refer to the model for the full data using the usual notation:

Y = Xγ + Zu + ε

With this data we could fit a number of models:

### Marginal model

This is the usual OLS model ignoring clusters:
$\hat{\gamma}^M = \begin{bmatrix} \hat{\alpha}^M \\ \hat{\beta}^M \end{bmatrix} = (X'X)^{-1} X'Y$
with $\hat{\beta}^M = (X^*{}' QX^*)^{-1} X^*{}'Q Y = (X^C{}' X^C)^{-1}X^C{}'Y^C$
where $X = \begin{bmatrix} 1 & X^* \end{bmatrix}$, Q is the matrix of the orthogonal projection into the overall residual space, XC = QX * is the 'centered' X * matrix, and YC = QY.

### Between cluster model

This is a model fitted to cluster averages, sometimes referred to as an 'ecological' model. If we weigh each average by the number of points sampled in the cluster we get:
$\hat{\gamma}^B = \begin{bmatrix} \hat{\alpha}^B \\ \hat{\beta}^B \end{bmatrix} = (X'P_GX)^{-1} X'P_GY$
where PG is the matrix of the projection into the cluster mean space.
Note that the matrix of the orthogonal projection into cluster residual space is QG = IPG and that $\hat{\beta}^B = (X^C{}' P_G X^C)^{-1} X^C{}'P_GY^C$.

This is the 'fixed effects' model with a categorical variable for cluster. [Caution: in Mixed Models Mixed Models II - Behind the Scenes Report (http://www.math.yorku.ca/%7Egeorges/Slides/HandoutM.pdf) this model is referred to as the 'within' model]
$\hat{\gamma}^A = \begin{bmatrix} \hat{\alpha}_1^A \\ \vdots \\ \hat{\alpha}_m^A \\ \hat{\beta}^A \end{bmatrix} = \begin{bmatrix} \hat{\alpha}^A \\ \hat{\beta}^A \end{bmatrix} = \left( \begin{bmatrix} D' \\ X^*{}' \end{bmatrix} \begin{bmatrix} D & X^*{} \end{bmatrix} \right)^{-1} \begin{bmatrix} D' \\ X^*{}' \end{bmatrix}Y$
where D is the matrix of cluster intercept terms,
$D= \begin{bmatrix} 1_{n_1} & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & 1_{n_i} & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 1_{n_m} \end{bmatrix}$
Note that PG = D(D'D) − 1D' and that
 $\hat{\beta}^A$ = (X * 'QGX * ) − 1X * 'QGY = (XC'QGXC) − 1XC'QGYC
The last equality follows from the fact that QG = QQG.
Note that in the case of a random intercept model D = Z.

### Within cluster models

Within each cluster:
$(X'_i X_i)\hat{\gamma}_i^W = X'_iY_i$
and if the model is of full column rank
$\hat{\gamma}_i^W = (X'_i X_i)^{-1} X'_iY_i$
and
 $\hat{\beta}_i^W$ $= (X_i^*{}' Q_i X_i^*)^{-1} X_i^*{}' Q_i Y_i$ $= (X_i^C{}' Q_i X_i^C)^{-1} X_i^C{}' Q_i Y^C_i$
since $Q_iX_i = Q_iX^C_i$ and $Q_iY_i = Q_iY^C_i$ where Qi is the matrix of the residual projection in the i-th cluster.

### Mixed model

We have
 Y $= X\hat{\gamma}^{GLS} + d$ where X'(Var(Y)) − 1d = 0 $= 1 \hat{\alpha}^{GLS} + X^* \hat{\beta}^{GLS} + d$ $= 1 \hat{\alpha}^{GLS} + PX^* \hat{\beta}^{GLS} + QX^* \hat{\beta}^{GLS}+ d$ $= 1 \hat{\alpha}^* + QX^* \hat{\beta}^{GLS}+ d$ since $PX^* \hat{\beta}^{GLS}$ is constant $= 1 \hat{\alpha}^* + X^C \hat{\beta}^{GLS}+ d$
We then verify that $\hat{\beta}^{GLS}$ is the GLS regression slope with respect to XC by checking the generalized orthogonality requirement on d, i.e.
$\begin{bmatrix} 1' \\ X^C{}' \end{bmatrix} (Var(Y))^{-1} d = 0$ which follows from the the fact that $\mathrm{span}(X^C) \subset \mathrm{span}(X)$.
Now, assuming a random intercept model with equal clusters, we have (Var(Y)) − 1 = cW with $W = I - \frac{n}{\tau^2/\sigma^2 + n } P_G = I - k P_G$ and the orthogonality condition is X'Wd = 0.
From the last line above, we can easily show that
$Y^C = 1 \hat{\alpha}^{**} + X^C \hat{\beta}^{GLS} + d$
Premultiplying by XC'W we get:
$X^C{}'WY^C = X^C{}'W1 \hat{\alpha}^{**} + X^C{}'WX^C \hat{\beta}^{GLS} +X^C{}'W d$
in which $X^C{}'W1 \hat{\alpha}^{**} = 0$ and XC'Wd = 0. Thus
$X^C{}'WX^C \hat{\beta}^{GLS} = X^C{}'WY^C$
and
$\hat{\beta}^{GLS} = (X^C{}'WX^C)^{-1} X^C{}'WY^C$

### Problems

1. By definition, G is a generalized inverse of X if XGX = X. A generalized inverse of X is sometimes denoted X although the generalized inverse is not unique unless X is a square invertible [i.e. non-singular] matrix in which case the generalized inverse is the ordinary inverse. It is fascinating to explore the algebraic and geometric properties of generalized inverses but we limit ourselves to a few useful facts:

a)Suppose that X is of full column rank. Show that
G = (X'X) − 1X'
is a generalized inverse of X.

b) Show that for any X, if GX = I then G is a generalized inverse of X. [Note that no such G will exist unless X is of full column rank.]

c) If X is of full column rank and A is non-singular then G = (X'AX) − 1X'A exists and is a generalized inverse of X.

d) If A − 1 = (ZΔZ' + I) with Δ positive definite, then A = IZ − 1 + Z'Z) − 1Z'

e) If A − 1 = (δ2ZZ' + I) where Z is the random effects design matrix for a random intercept model with equal cluster size n, then $A = I - \frac{n\delta^2}{1+n\delta^2}P_G$ where PG = Z(Z'Z) − 1Z' is the matrix of the orthogonal projection producing group means.

f) In general (X'AX) − 1X'A will not be equal to (X'X) − 1X'. There is however a special case where they are equal. Let A satisfy A − 1 = ZiTZi' + σ2I where $\mathrm{span}(Z_i) \subset \mathrm{span} ( X_i )$. Show that, in this case, (Xi'AXi) − 1Xi'A = (Xi'Xi) − 1Xi'. [Hint: one way to show that two matrices are equal is to show that they have the same value over a basis for their domain. It is easy to show that the two matrices have the same value on span(Xi). If you can also show that they have the same value on the orthogonal complement of span(Xi), $\mathrm{span}^{\perp}(X_i)$, then the two matrices have the same value on their domain.]

2. Prove the assertion made in connection with the marginal model: If
$\hat{\gamma}^M = \begin{bmatrix} \hat{\alpha}^M \\ \hat{\beta}^M \end{bmatrix} = (X'X)^{-1} X'Y$,
and $X = \begin{bmatrix} 1 & X^* \end{bmatrix}$, then
$\hat{\beta}^M = (X^*{}' QX^*)^{-1} X^*{}'Q Y$
where and Q is the matrix of the orthogonal projection into the overall residual space, i.e. Q = I − 1(1'1) − 11'. Put differently
$\hat{\beta}^M = (X^C{}' X^C)^{-1} X^C{}'Y^C$
where XC and YC are the X * matrix and respectively the Y vector centered at each column mean. This is part of the Frisch-Waugh-Lovell Theorem as it is known by econometricians or, alternatively, the 'added-variable plot' theorem.

3. Assume that each Xi is of full column rank (this is an assumption that can be relaxed). Consider using the fixed effects model precision weights for $\hat{\gamma}_i^W$ (i.e. $(Var( \hat{\gamma}_i^W))^{-1}$ under the fixed model. Show that the precision-weighted combination of the within-cluster estimated coefficients, $\hat{\gamma}_i^W$ yields the marginal estimated coefficients, $\hat{\gamma}^M$.

4. Assume that each Xi is of full column rank. Consider using the fixed effects model precision weights for slopes $\hat{\beta}_i^W$ (i.e. $(Var( \hat{\beta}_i^W))^{-1}$ under the fixed model. Show that the precision-weighted combination of the within-cluster estimated slopes, $\hat{\beta}_i^W$ yields the cluster-adjusted slopes, $\hat{\beta}^A$.

5. Assume that each Xi if of full column rank. Consider using the mixed model precision weights (i.e. $(Var( \hat{\gamma}_i^W))^{-1}$ under the mixed model. Show that the precision-weighted combination of the within-cluster estimated coefficients, $\hat{\gamma}_i^W$ yields the GLS mixed model estimated coefficients, $\hat{\gamma}^{GLS}$.

6. Assume a random intercept model with clusters of equal size n. Let Var(ui) = τ2 so that Var(u) = τ2I. Note that τ2 corresponds to τ00 in other notes. Show that
$(X'X - k X'P_GX)\hat{\gamma}^{GLS} = X'X \hat{\gamma}^M - kX'P_GX \hat{\gamma}^B$
where $k = \frac{n}{ \tau^2/\sigma^2 + n }$.

7. Treating the matrices involved in the formula in #6 as if they were scalars, explain the sense in which $\hat{\gamma}^{GLS}$ is an 'extrapolation' of $\hat{\gamma}^B$ in the direction of and beyond $\hat{\gamma}^M$.

8. Prove that, for X of full column rank, the conditions Y = XγM + e with X'e = 0 determine the OLS fit uniquely.

9. Prove that, for X of full column rank, the conditions $Y = X\hat{\gamma}^{GLS} + d$ with X'Wd = 0 determine the GLS fit uniquely where W = cVar(Y) − 1 for any c > 0.

10. Show that for a random intercept model with clusters of equal size, $\hat{\beta}^{GLS}$ remains the same if X * is replaced with Xc = QX * , i.e. the residuals of columns of X from the grand mean of each column.

11. For a random intercept model with clusters of equal size, express $\hat{\beta}^{GLS}$ as a weighted combination of $\hat{\beta}^A$ and $\hat{\beta}^B$.

12. For a random intercept model with clusters of equal size, express $\hat{\beta}^{GLS}$ as a weighted combination of $\hat{\beta}^A$ and $\hat{\beta}^M$.

13. For a random intercept model with clusters of equal size, show that the model that includes contextual variables and residuals from contextual variables yields the 'between' cluster effect as the slope for the contextual variables and the 'cluster-adjusted' effect as the slope for the residuals from the contextual variables.

## 2. Textbook problems

### McCulloch and Searle, Chapter 2

Questions on normal responses

2.1, 2.2, 2.3, 2.5, 2.9, 2.10

### McCulloch and Searle, Chapter 3

Questions on normal responses

3.3, 3.4, 3.7, 3.11

## 3. Fitting 'Basic' Mixed Models Using SAS and R

The purpose of this exercise is to analyze the data described in the course using mixed models in SAS and R. We will analyze the “hs” data set which is a subsample of 40 schools from the data set reported in Bryk and Raudenbush (1992). Fairly detailed SAS code is given below. Many SAS analyses (but not all) can carried be out with 'lme' in 'library(nlme)' in R.

• SAS:
In SAS use
      LIBNAME MIXED 'name of directory where you stored hs.sd1';
DATA HS;
SET MIXED.HS;
PROC SORT DATA = HS;
BY SCHOOL;  /* to be safe because SAS thinks it's a new school when the id changes */
RUN;

In the rest of the SAS job you can refer to the data set as 'HS'.

• R:
Create a data frame with:
       > hs <- read.csv( file.choose() )   # run and navigate to select the file you downloaded

You can also load the data frame directly from the internet:
       > hs <- read.csv("http://www.math.yorku.ca/~georges/Data/hs.csv")


0. Summarize the data using SAS or R:
Some basic code in SAS:
       proc contents data = hs;
run;
proc summary data=hs print min q1 median q3 max;
var school minority female ses mathach size sector meanses;
run;

Are there other effective ways of summarizing data in SAS?
Is there any equivalent to 'xqplot' in R:
       > source("http://www.math.yorku.ca/~georges/R/fun.R")
> td ( new = T )
> xqplot ( hs )


1. Null model (fixed intercept and random intercept)

SAS code:
       PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = / DDFM = SATTERTH ;
RANDOM INTERCEPT / SUBJECT = SCHOOL
TYPE=FA0(1) G S; /* Note that the '0' is a ZERO */
RUN;

What benefit might there be from using TYPE=FA0(1) instead of TYPE=UN?
What is the estimated intraclass correlation using this model?
Run the corresponding problem in R and compare output. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

2. Means as outcomes regression

      PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = SECTOR / S DDFM=SATTERTH;
RANDOM INTERCEPT / SUBJECT = SCHOOL TYPE = FA0(1) ;
RUN;

Is the effect of sector significant? Can we include sector in the random statement? Why?
Run the corresponding analysis in R and compare output. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

3. One-way ANCOVA with random effects

      PROC MIXED;
CLASS SCHOOL;
MODEL MATHACH = SES / S DDFM = SATTERTH;
RANDOM INTERCEPT / SUBJECT = SCHOOL TYPE = FA0(1);
RUN;

Is the effect of SES significant?
Run the corresponding analysis in R and compare output. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

4. Random coefficients model -- compare effect of different parametrizations of G

      PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = SES / S DDFM = SATTERTH;
RANDOM INTERCEPT SES / TYPE = VC /* default */
SUBJECT = SCHOOL G GC;
RUN;

PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = SES / S;
RANDOM INTERCEPT SES / TYPE = UN
SUBJECT = SCHOOL G GC;
RUN;

PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = SES / S;
RANDOM INTERCEPT SES / TYPE = FA0(2)
SUBJECT = SCHOOL G GC;
RUN;

How do the three models above differ?
How do they compare with a correponding model in R? Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

5. Intercepts and slopes as outcomes

      PROC MIXED ASYCOV CL;
CLASS SCHOOL;
MODEL MATHACH = SES SECTOR SES*SECTOR / S DDFM = SATTERTH;
RANDOM INTERCEPT SES /
/* try various options for TYPE */
SUBJECT = SCHOOL G GC;
RUN;

Compare SAS results with the corresponding analysis in R. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

6. Nonrandom slopes

Specify the SAS codes to analyze this model and compare with the corresponding analysis in R.

7. Using the general linear hypothesis (SAS ESTIMATE statement) to test plausible questions:

Consider the following mixed model:
       PROC MIXED ASYCOV ASYCORR CL;
CLASS SCHOOL;
MODEL MATHACH = SES SECTOR FEMALE SES*SECTOR /
S DDFM = SATTERTH;
RANDOM SES FEMALE INT / SUBJECT = SCHOOL
TYPE = FA0(3) G GC;
ESTIMATE ‘Cath – Pub | low SES’ SECTOR 1 SES*SECTOR -2 / CL E;
ESTIMATE ‘Cath – Pub | high SES’ SECTOR 1 SES*SECTOR +2 / CL E;
RUN;

How are the above ESTIMATE statements obtained and what does each test?
Compare the ESTIMATE statement in SAS with the output of 'glh' in R.

8. Create your own SAS and R code to test the relationship between MATHACH and MINORITY and SES.

Would a poor student who belongs to a minority appear better off going to a catholic or a public school? Perform a similar test for a rich student.
Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

## 4. An experiment with mixed models

This exercise is meant to address two questions:

1. What do estimators really estimate?
2. Does variance parametrization matter?

using a data set with 4 patients each observed at 3 doses of an effective anti-depressant:

ys is a highly reliable measure of depression
yb is a less reliable measure

This was meant to illustrate Simpson's/Robinson's paradox but it also turned out to be a good illustration of convergence problems

As you work on the exercise, fill out a copy of following table:

 "Effect" of Dose Small sigma Big sigma Marginal (Pooled) Between Within (Ajusted for patient) Mixed TYPE=UN Mixed TYPE=FA0(1)

Then sketch your results on two number lines.

          data expt;
input sub dose ys yb;
cards;
1 1 3.1 3.1
1 2 2.2 3.9
1 3 1.1 1.1
2 3 5.0 5.0
2 4 4.1 4.9
2 5 3.0 3.0
3 5 7.0 7.0
3 6 6.1 6.9
3 7 5.0 5.0
4 7 9.1 9.1
4 8 8.2 9.9
4 9 7.1 7.1
run;

          proc gplot data = expt;
symbol1 i=join v=plus;
symbol2 i=join v=plus;
symbol3 i=join v=plus;
symbol4 i=join v=plus;
plot ys*dose = sub;
plot yb*dose = sub;
run;

          /* Within Subject model */
/* small sigma */

          proc glm data=expt;
class sub;
model ys = dose sub / solution;
run;

/* large sigma */
proc glm data=expt;
class sub;
model yb = dose sub / solution;
run;
/* The following commands fit the "pooled" and
"between" models for ys.
When you fit mixed models experiment with
different TYPE options in the RANDOM statement.
In particular, try TYPE = UN and TYPE = FA0(q) where q
is the number of random parameters.
Ponder your results and their implications for the
perils of false convergence.
Repeat using yb and compare your results.
*/
/* First: compute subject averages and deviations
for ys and dose */
proc summary data=expt nway;
class sub;
var ys yb dose;
output out=means
mean=ys_b yb_b dose_b ;
proc print;
data expt;
merge expt means;
by sub;
ys_w = ys - ys_b;
yb_w = yb - yb_b;
dose_w = dose - dose_b;
/* note that we only need dose_w and dose_b
for mixed models */
run;
proc print data=expt;
run;
/* between subject model */
proc glm data=expt;
class sub;
model ys_b = dose_b / solution ;
run;
/* pooled model */
proc glm data=expt;
class sub;
model ys = dose;
run;
/* proc mixed estimate */
proc mixed data=expt asycov asycorr;
class sub;
model ys = dose / solution ;
random int / sub=sub type = fa0(1);
/* be sure to check the estimated of tau.
Does it make sense?
Then re-run with TYPE = FA0(1).
Note that this is exactly the SAME model, just
a different parametrization.Why do you think
it makes a difference?*/
run;
/* After experimenting with the model above, try
controlling for the patient average dose:*/
proc mixed data=expt asycov asycorr;
class sub;
model ys = dose_w dose_b / solution ddfm=satterth;
random int / sub=sub type=un;
/* then try TYPE = FA0(1) */
estimate 'between vs. within' dose 1 dose_b -1;
run;

/* Also try a random effect of dose: */

proc mixed data=expt asycov asycorr
method=reml; /* try ml also */
class sub;
model yb = one dose_w dose_b / noint solution;
random int dose_w / sub=sub type=fa0(2) g gc;
run;

/* Now see what happens if you use dose instead of dose_w:
Which effects change? Why?
Experiment with the PARMS statement to provide starting
values to see if you can get the model to converge to
a sensible estimate although using TYPE = UN


## 5. An experiment with mixed models using R

Carry out the previous exercise using R. In R, there is no distinction between "TYPE=UN" and "TYPE=FA0(1)". The random effects variance parametrization in R is a log-Choleski parametrization similar but not identical to "FA0(1)".

Using trellis graphics or otherwise, plot OLS fits for each patient, BLUPs and population-level fits.

## 6. Fitting Mixed Models with Contextual Effects Using SAS and/or R

a) Analyze the HS data focusing on the relationship between Math Achievement, sector, SES and gender including possible contextual effects of SES and gender. Be careful using a contextual effect of gender. Would it make sense to turn it into a categorical variable? Explore the relationship between SES, sector, gender and Math Achievement and explain your conclusions. Use appropriate graphs. Address some specific questions:

Does a low SES child appear to do better in a low SES school or in a high SES school? Is this different for boys and girls?
How about a high SES child?
Do girls seem better off in co-ed schools or in girl schools?
Discuss limitations in the interpretation of these results.

b) Analyze the HS data focusing on the relationship between Math Achievement, sector, SES and minority status including possible contextual effects of SES and minority status. Explore the relationship between SES, sector, minority status and Math Achievement and explain your conclusions. Use appropriate graphs.

Does a low SES child appear to do better in a low SES school or in a high SES school? Does this depend on minority status?
How about a high SES child?
Do minority students appear to do better in schools with a high proportion of minority students?
Discuss limitations in the interpretation of these results.