Team Gini

From MathWiki

Hi, dear team members: I am very glad that we are in one group. Could I separate this assignment to each of us roughly. And we can help each other too.

Jing: section 1: 1(d), section 3: 1, section 5 and section 6 (b)
Jinling: section 1: 1(b), 3, 5, 7, 9.
Mary: section 1: 1(f), 11, section 2: 2.2, 2.5, 2.10.
Wei: Section 2: 3.3, 3.7, section 3: 3, 5, 7.
Jinling
Table of contents

Section 1: 1(b), 1(d), 1(f), 3, 5, 7, 9, 11

1. b) Show that for any X, if GX = I then G is a generalized inverse of X. [Note that no such G will exist unless X is of full column rank.]

Proof:

Theorem (Moore 1920): G is a generalized inverse of X since XGX = X.

Now, X (mxn) is a matrix with full column rank n, if GX = I, G muxt be nxm.

Then XGX = X(GX) = XI = X.

According to Moore's theorem, G is a generalized inverse of X.


Mary 01:20, 12 Jun 2006 (EDT)


1(b)

Since GX = I

then XGX = X(GX) by theorem

= XI = X

so XGX = X

G is a generalized inverse of X by definition.


GM comment: It's good practice to reserve the notation X − 1 for a real inverses and to use X for generalized inverses GM comment: All matrices have a generalized inverse, often many in fact. But having a generalized inverse does NOT mean that a matrix is invertible. See the discussion page Talk:Team Gini for an easy counterexample

By Jinling


d) If A − 1 = (ZΔZ' + I) with Δ positive definite, then A = IZ − 1 + Z'Z) − 1Z'

solution: since Δ positive definite, A = (ZΔZ' + I) − 1

Using the Sherman-Morrison identity: (A + UDV) − 1 = A − 1A − 1U(D − 1 + VA − 1U) − 1VA − 1

we can get A = (I) − 1 − (I) − 1Z − 1 + Z(I) − 1Z') − 1Z'(I) − 1 = IZ − 1 + Z'Z) − 1Z'

By Jing


f) In general (X'AX) − 1X'A will not be equal to (X'X) − 1X'. There is however a special case where they are equal. Let A satisfy A − 1 = ZiTZi' + σ2I where \mathrm{span}(Z_i) \subset \mathrm{span} ( X_i ). Show that, in this case, (Xi'AXi) − 1Xi'A = (Xi'Xi) − 1Xi'.

[Hint: one way to show that two matrices are equal is to show that they have the same value over a basis for their domain. It is easy to show that the two matrices have the same value on span(Xi). If you can also show that they have the same value on the orthogonal complement of span(Xi), \mathrm{span}^{\perp}(X_i), then the two matrices have the same value on their domain.]

Proof:

The span of a set of vectors is the set of all their linear combinations.

First, show that the two matrices have the same value on span(Xi).

(Xi'AXi) − 1Xi'AXi = I, (Xi'Xi) − 1Xi'Xi = I. So (Xi'AXi) − 1Xi'AXi = (Xi'Xi) − 1Xi'Xi.

Then, show that they have the same value on the orthogonal complement of span(Xi), \mathrm{span}^{\perp}(X_i).

Since A satisfy A − 1 = ZiTZi' + σ2I where \mathrm{span}(Z_i) \subset \mathrm{span} ( X_i ), we can get A = σ − 2I − σ − 4Zi(T − 1 + σ − 2Zi'Zi) − 1Zi'.

Let Y_i {\perp}X_i. (Xi'AXi) − 1Xi'AYi = (Xi'AXi) − 1σ − 2Xi'IYi − (Xi'AXi) − 1σ − 4Xi'Zi(T − 1 + σ − 2Zi'Zi) − 1Zi'Yi = 0. Here we have \mathrm{span}(Z_i) \subset \mathrm{span}.

And (Xi'Xi) − 1Xi'Yi = 0. Therefore, (Xi'AXi) − 1Xi'AYi = (Xi'Xi) − 1Xi'Yi.

According to the hint, the two matrices have the same value on their domain.


Mary 05:05, 12 Jun 2006 (EDT)


3.Solution: The fixed effects model precision weights for \hat{\gamma_i}^{W} under the fixed model:

\hat{\gamma_i}^{W}= (X_i'X_i)^{-1}X_i'Y_i

Var(\hat{\gamma_i}^{W}))= (X_i'X_i)^{-1}{\sigma^2}

\hat{\gamma_i}^{W}/Var(\hat{\gamma_i}^{W}) = (X_i'X_i)^{-1}X_i'Y_i /(X_i'X_i)^{-1}{\sigma^2}= X_i'Y_i/{\sigma^2}

The precision-weighted combination of the within-cluster estimated coefficients are:

\sum(\hat{\gamma_i}^{W}/Var(\hat{\gamma_i}^{W})) / \sum{(Var(\hat{\gamma_i}^{W}))^{-1}})

= \sum{X_i' Y_i {\sigma^{-2}}}/ \sum(((X_i'X_i)^{-1}{\sigma^2})^{-1})= \sum{X_i' Y_i}/ \sum(X_i'X_i)=(X'X)^{-1}X'Y

The marginal estimated coefficients:

\hat{\gamma^{M}}= (X'X)^{-1} X'Y = \hat{\gamma_i}^{W}/Var(\hat{\gamma_i}^{W}) \sum{(Var(\hat{\gamma_i}^{W}))^{-1}})

By Jinling


5.Solution: Assume that each Xi if of full column rank. The mixed model:

Yi = Xiγ + ZUi + εi

Var(\hat{\gamma_i})= {\sigma}^{2} (X'X)^{-1} + {\Tau}

W_i = 1/Var(\hat{\gamma_i})

the within-cluster estimated coefficients are:

\hat{\gamma}^{W} =(X_i'X_i)^{-1} X_i'Y_i

The precision-weighted combination of the within-cluster estimated coefficients are:

(\sum{W_i})^{-1} \sum{{\hat{\gamma_i}}^{W}} W_i = \sum{(X_i'X_i)^{-1}X_i'W_iY_i }/\sum(W_i)=(X'WX)^{-1} X'WY

The GLS mixed model estimated coefficients are:

\hat{\gamma^{GLS}}=(X'WX)^{-1} X'WY= (\sum{W_i})^{-1} \sum{{\hat{\gamma_i}}^{W}} W_i

By Jinling


7.solution

In questuion # 6,

(X'X-kX'P_G X)\hat{\gamma}^{GLS}=X'X \hat{\gamma}^M-kX'P_GX\hat{\gamma}^B

Now we treat the matrices involved in the formula in # 6 as if they were scalars, then

\hat{\gamma}^{GLS}=((X'X-kX'P_G X))^{-1}(X'X \hat{\gamma}^M - kX'P_GX\hat{\gamma}^B)=\frac{a1}{a1+a2}\hat{\gamma}^M + \frac{a2}{a1+a2}\hat{\gamma}^B

We can think \hat{\gamma}^{GLS} is a linear combination of \hat{\gamma}^M and \hat{\gamma}^B and where a1 = ((X'XkX'PGX)) − 1X'X ,a2 = − ((X'XkX'PGX)) − 1kX'PGX are constants. Then \hat{\gamma}^{GLS} can be thought as two parts: one is along the direction of \hat{\gamma}^B, the magnitude is \frac{a2}{a1+a2}\hat{\gamma}^B =(1- \frac{a1}{a1+a2})\hat{\gamma}^B, another one is along the direction of \hat{\gamma}^M, the magnitude is \frac{a1}{a1+a2}\hat{\gamma}^M. So, if \frac{a1}{a1+a2} is large, namely ((X'XkX'PGX)) − 1X'X is large, then \hat{\gamma}^{GLS} tends to the direction of \hat{\gamma}^M. If \frac{a1}{a1+a2} is small, namely ((X'XkX'PGX)) − 1X'X is small, then \hat{\gamma}^{GLS} tends to the direction of \hat{\gamma}^B.


By Jinling


9.Solution: For X of full column rank, We have Y=X \hat{\gamma}^{GLS} + d

\hat{\gamma}^{GLS} = (X'WX)^{-1} XWY

where X'Wd = 0 , W = cVar(Y) − 1

Suppose we have another GLS with Y=X \hat{\gamma}^* + d where X'Wd = 0

then d = Y - X \hat{\gamma}^*

and X'Wd = X'W (Y - X \hat{\gamma}^*) = X'WY - X'W X\hat{\gamma}^* = 0

X'W X \hat{\gamma}^* = X'WY

\hat{\gamma}^* = (X'WX)^{-1} XWY = \hat{\gamma}^{GLS}

So the GLS fits uniquely for any c > 0.

By Jinling


11. For a random intercept model with clusters of equal size, express \hat{\beta}^{GLS} as a weighted combination of \hat{\beta}^A and \hat{\beta}^B.

Solution:

\hat{\beta}^{GLS} = Q_G \hat{\beta}^A + P_G \hat{\beta}^B

\hat{\beta}^{GLS} = (X^C{}'WX^C)^{-1} X^C{}'WY^C, W = I - \frac{n}{\tau^2/\sigma^2 + n } P_G = I - k P_G

\hat{\beta}^A = (X^C{}' Q_G X^C)^{-1} X^C{}' Q_G Y^C, Q_G = I - P_G

\hat{\beta}^B = (X^C{}' P_G X^C)^{-1} X^C{}'P_GY^C, P_G = Z(Z'Z)^{-1}Z'

We can write the GLS model as: Y =  1 \hat{\alpha} + Q_G X^* \hat{\beta}^A + P_G X^*\hat{\beta}^B + e, where e \perp L(1,P_G X^*, Q_G X^*)

In order to show this is GLS solution, we need to show \begin{bmatrix} 1'\\P_GX^*\\Q_GX^*\end{bmatrix} W e = 0.

First, 1'We = 1'(IkPG)e = (1' − k1)e = c1'e = 0.

Second, X * 'PGWe = X * 'PG(IkPG)e = X * 'PGekX * 'PGe = 0 − k0 = 0.

Finally, X * 'QGWe = X * 'QG(IkPG)e = X * 'QGekX * 'PGe = 0 − k0 = 0.


Mary 05:32, 12 Jun 2006 (EDT)

Section 2: 2.2, 2.5, 2.10, 3.3, 3.7

2.2 Solution

In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis. Exact distribution theory based on the F-distribution is to reject H0 when -2log \wedge\;\ge\; N log (1+\frac{m-1}{N-m}F_{m-1,N-m,1-a}).

Asymptotic theory tells us that the large-sample distribution of -2log \wedge\; under H0 is chi-square with degrees of freedom equal to the difference in the number of parameters in the parameter space and the number under H0. The large-sample test would therefore be to reject H0 when -2log \wedge\;\ge\;\chi_{m-1, 1-\alpha}^2

If the asymptotic critical value is used instead of the exact critical value, no matter N=20, 50 or 100, the significance level is same: \chi_{4, 1-\alpha}^2


Mary 15:20, 15 Jun 2006 (EDT)


2.5 Solution

(a)Var(\bar{y}_{i.})= \sigma_a^2+\sigma^2/n_i

As we know, solutions \dot{\mu}, \dot{\sigma^2}, \dot{\sigma_a^2} are ML estimators only if the triplet(\dot{\mu}, \dot{\sigma^2}, \dot{\sigma_a^2}) is in the 3-space of (\mu, \sigma^2, \sigma_a^2). And in ensuring that this is achieved, we need to consider the negativity problem. For each data set, equations (2.41), (2.42), and (2.43) have to be solved numerically. When \dot{\sigma_a^2}< 0, we have to let \hat \sigma_a^2 = 0. Therefore, \hat \sigma^2 = \frac{SST}{N}, \hat Var(\bar{y}_{i.})= \frac{\hat \sigma^2}{n_i}.According to the equation (2.41), \hat{\mu} = \frac{\sum_{i} \frac {\bar{y}_{i.}}{\hat Var(\bar{y}_{i.})}}{\sum_{i} \frac{1} {\hat Var(\bar{y}_{i.})} } = \frac{\sum_{i} \frac {\bar{y}_{i.}n_i}{\sigma^2}}{\sum_{i} \frac{n_i} {\sigma^2} } =  \frac{\sum_{i} \bar{y}_{i.}n_i}{\sum_{i} n_i} = \bar {y}..

(b) From (2.43), we get \dot{\sigma_a^2}< 0, but we can't use the negative value as an estimation of any variance. As we mentioned before, we need to consider the negativity problem, so we let \hat \sigma_a^2 = 0. Thus, (2.43) is not used.


Mary 17:23, 15 Jun 2006 (EDT)


2.10 Solution

X and Y are random variables which follow bivariate normal distribution with correlation ρ. Suppose we know X is k standard deviations above its mean, but Y can't be observed. Then we could use that information of X to get a prediction of Y, and the predictor we use is the minimum mean squared error predictor of Y, based on X; it is E[Y|X], the condition mean of Y given X. We can proof that Y|X follows a Normal Distribution with E[Y|X] = \mu _y + \rho \sigma_y \frac{X- \mu_x}{\sigma_x}. Since \frac{X- \mu_x}{\sigma_x} = k, then we get \frac{E(Y|X)- \mu_y}{\sigma_y} = \rho k.

Mary 17:24, 15 Jun 2006 (EDT)

Section 3: 1, 3, 5, 7. Fitting 'Basic' Mixed Models Using SAS and R

1. Null model (fixed intercept and random intercept)

SAS code:

      PROC MIXED ASYCOV CL;
        CLASS SCHOOL;
        MODEL MATHACH = / DDFM = SATTERTH ;
        RANDOM INTERCEPT / SUBJECT = SCHOOL
        TYPE=FA0(1) G S; /* Note that the '0' is a ZERO */
        RUN;

What benefit might there be from using TYPE=FA0(1) instead of TYPE=UN? What is the estimated intraclass correlation using this model? Run the corresponding problem in R and compare output. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

Solution:

(1)The TYPE=UN (unstructured) option is useful for correlated random coefficient models.GM comment: I regret to say that I don't understand this paragraph. Perhaps more context and detail is necessary. /*Sorry George, I don't understand it too. I got the above explanation from SAS Help and Document in SAS software 9.0 package.*/ TYPE=FA0(1) here to request a G estimate that is constrained to be nonnegative definite.G containing variance components in a diagonal structure.In our case, the benefit of using TYPE=FA0(1) is to make variance components in a diagonal struction nonnegative.

(2)The intraclass correlation is a helpful diagnostic tool in determining if a multilevel modeling will be superior to a traditional method, such as regression or ANOVA. A rough rule of thumb is when the intraclass correlation is over .10, hidden clusters are present in your data, and a multilevel modeling model would be a more appropriate data analysis technique. GM comment: Can you provide the reference for this rule of thumb? In what way are the clusters hidden since you must presumably know what they are to compute the intraclass correlation? Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models by Judith D. Singer (http://www.gse.harvard.edu/~faculty/singer)

From the SAS output using TYPE=FA0(1), we can get τ00 = 3.0288 and σ2 = 39.8255 so \rho=\frac{\tau_{00}}{\tau_{00}+\sigma^2}=0.07 Also,from the SAS output using TYPE=UNwe can getτ00 = 9.1814 and σ2 = 39.8248 so \rho=\frac{\tau_{00}}{\tau_{00}+\sigma^2}=0.187 In sum, I think a multilevel modeling is useful in this model.

(3)R code for Null model:

 fit <- lme( mathach ~ 1, data= hs , random = ~ 1 | school )
 summary(fit)
 summary(fit)$coef

R output: GM comment: It's better to include the output in the wiki instead of using an image. That way it's possible to add comments, highlights, etc. To make output line up properly, just enclose it in <pre>, </pre> tags.

 > summary(fit)
Linear mixed-effects model fit by REML
 Data: hs 
       AIC      BIC    logLik
  11317.12 11333.47 -5655.562

Random effects:
 Formula: ~1 | school
        (Intercept) Residual
StdDev:    3.030109 6.310685

Fixed effects: mathach ~ 1 
               Value Std.Error   DF  t-value p-value
(Intercept) 12.85788 0.5048477 1680 25.46882       0

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.04431764 -0.76097490  0.04898448  0.76587527  2.42059025 

Number of Observations: 1720
Number of Groups: 40 

SAS output:

                                         The SAS System       13:33 Wednesday, June 16, 2006   1
                                                                                                
                                     The CONTENTS Procedure                                     
                                                                                                
       Data Set Name        WORK.HS                           Observations          1720        
       Member Type          DATA                              Variables             9           
       Engine               V9                                Indexes               0           
       Created              14:02 Wednesday, June 16, 1993    Observation Length    72          
       Last Modified        14:02 Wednesday, June 16, 1993    Deleted Observations  0           
       Protection                                             Compressed            NO          
       Data Set Type                                          Sorted                YES         
       Label                                                                                    
       Data Representation  WINDOWS                                                             
       Encoding             wlatin1  Western (Windows)                                          
                                                                                                
                                                                                                
                               Engine/Host Dependent Information                                
                                                                                                
Data Set Page Size          8192                                                                
Number of Data Set Pages    16                                                                  
First Data Page             1                                                                   
Max Obs per Page            113                                                                 
Obs in First Data Page      87                                                                  
Number of Data Set Repairs  0                                                                   
File Name                   C:\DOCUME~1\bsun\LOCALS~1\Temp\SAS                                  
                            Temporary Files\_TD1472\hs.sas7bdat                                 
Release Created             9.0000M0                                                            
Host Created                WIN_PRO                                                             
                                                                                                
                                                                                                
                           Alphabetic List of Variables and Attributes                          
                                                                                                
                            #    Variable    Type    Len    Format                              
                                                                                                
                            4    FEMALE      Num       8    BEST15.2                            
                            6    MATHACH     Num       8    BEST15.2                            
                            9    MEANSES     Num       8    BEST15.2                            
                            3    MINORITY    Num       8    BEST15.2                            
                            1    ROWNAMES    Char      4                                        
                            2    SCHOOL      Num       8    BEST15.2                            
                            8    SECTOR      Num       8    BEST15.2                            
                            5    SES         Num       8    BEST15.2                            
                            7    SIZE        Num       8    BEST15.2                            
                                                                                                
                                                                                                
                                       Sort Information                                         
                                                                                                
                                     Sortedby       SCHOOL                                      
                                     Validated      YES                                         
                                     Character Set  ANSI                                        
                                         The SAS System       13:33 Wednesday, June 16, 2006   2
                                                                                                
                                     The SUMMARY Procedure                                      
                                                                                                
                                       Lower                           Upper                    
    Variable         Minimum        Quartile          Median        Quartile         Maximum    
        
    SCHOOL           1317.00         2995.00         5783.00         8202.00         9586.00    
    MINORITY               0               0               0               0       1.0000000    
    FEMALE                 0               0       1.0000000       1.0000000       1.0000000    
    SES           -2.3280000      -0.4180000       0.1920000       0.7870000       1.8320000    
    MATHACH       -2.8320000       7.7805000      13.6230000      18.7030000      24.9930000    
    SIZE         153.0000000     493.0000000         1068.00         1523.00         2403.00    
    SECTOR                 0               0               0       1.0000000       1.0000000    
    MEANSES       -0.7510000      -0.1010000       0.1985000       0.4480000       0.7590000    
       
                                         The SAS System       13:33 Wednesday, June 16, 2006   3
                                                                                                
                                      The Mixed Procedure                                       
                                                                                                
                                       Model Information                                        
                                                                                                
                     Data Set                     WORK.HS                                       
                     Dependent Variable           MATHACH                                       
                     Covariance Structure         Factor Analytic                               
                     Subject Effect               SCHOOL                                        
                     Estimation Method            REML                                          
                     Residual Variance Method     Profile                                       
                     Fixed Effects SE Method      Model-Based                                   
                     Degrees of Freedom Method    Satterthwaite                                 
                                                                                                
                                                                                                
                                    Class Level Information                                     
                                                                                                
                       Class     Levels    Values                                               
                                                                                                
                       SCHOOL        40    1317 1374 1436 1461 1909 2030                        
                                           2336 2651 2658 2755 2995 3039                        
                                           3152 3332 3377 3657 4511 4642                        
                                           4868 5192 5783 5838 6074 6366                        
                                           6415 6469 6808 6897 7101 7919                        
                                           8202 8367 8775 8854 9021 9104                        
                                           9158 9340 9347 9586                                  
                                                                                                
                                                                                                
                                          Dimensions                                            
                                                                                                
                              Covariance Parameters             2                               
                              Columns in X                      1                               
                              Columns in Z Per Subject          1                               
                              Subjects                         40                               
                              Max Obs Per Subject              61                               
                              Observations Used              1720                               
                              Observations Not Used             0                               
                              Total Observations             1720                               
                                                                                                
                                                                                                
                                       Iteration History                                        
                                                                                                
                  Iteration    Evaluations    -2 Res Log Like       Criterion                   
                                                                                                
                          0              1     11538.61727936                                   
                          1              2     11311.13751265      0.00000323                   
                          2              1     11311.12408759      0.00000000                   
                                                                                                
                                                                                                
                                   Convergence criteria met.                                    
                                         The SAS System       13:33 Wednesday, June 16, 2006   4
                                                                                                
                                      The Mixed Procedure                                       
                                                                                                
                                       Estimated G Matrix                                       
                                                                                                
                             Row    Effect       SCHOOL        Col1                             
                                                                                                
                               1    Intercept    1317        9.1737                             
                                                                                                
                                                                                                
                                 Covariance Parameter Estimates                                 
                                                                                                
               Cov Parm     Subject    Estimate     Alpha       Lower       Upper               
                                                                                                
               FA(1,1)      SCHOOL       3.0288      0.05      2.3924      3.9592               
               Residual                 39.8255      0.05     37.2627     42.6634               
                                                                                                
                                                                                                
                           Asymptotic Covariance Matrix of Estimates                            
                                                                                                
                            Row    Cov Parm        CovP1       CovP2                            
                                                                                                
                              1    FA(1,1)        0.1503    -0.01085                            
                              2    Residual     -0.01085      1.8898                            
                                                                                                
                                                                                                
                                        Fit Statistics                                          
                                                                                                
                             -2 Res Log Likelihood         11311.1                              
                             AIC (smaller is better)       11315.1                              
                             AICC (smaller is better)      11315.1                              
                             BIC (smaller is better)       11318.5                              
                                                                                                
                                                                                                
                                Null Model Likelihood Ratio Test                                
                                                                                                
                                  DF    Chi-Square      Pr > ChiSq                              
                                                                                                
                                   1        227.49          <.0001                              
                                                                                                
                                                                                                
                                  Solution for Random Effects                                   
                                                                                                
                                               Std Err                                          
           Effect       SCHOOL    Estimate        Pred      DF    t Value    Pr > |t|           
                                                                                                
           Intercept    1317        0.2933      0.9875    1719       0.30      0.7665           
           Intercept    1374       -2.7094      1.1926    1719      -2.27      0.0232           
           Intercept    1436        4.7819      1.0173    1719       4.70      <.0001           
           Intercept    1461        3.5215      1.1249    1719       3.13      0.0018           
           Intercept    1909        1.3553      1.1926    1719       1.14      0.2559           
           Intercept    2030       -0.7138      0.9945    1719      -0.72      0.4730           
           Intercept    2336        3.3503      0.9945    1719       3.37      0.0008           
           Intercept    2651       -1.5917      1.0704    1719      -1.49      0.1372           
           Intercept    2658        0.4909      1.0094    1719       0.49      0.6268           
           Intercept    2755        3.3126      0.9945    1719       3.33      0.0009           
           Intercept    2995       -3.0262      1.0019    1719      -3.02      0.0026           
           Intercept    3039        3.4026      1.3215    1719       2.57      0.0101           
           Intercept    3152        0.3241      0.9611    1719       0.34      0.7360           
           Intercept    3332        1.2746      1.0704    1719       1.19      0.2339           
           Intercept    3377       -3.3482      1.0094    1719      -3.32      0.0009           
           Intercept    3657       -3.0750      0.9674    1719      -3.18      0.0015           
           Intercept    4511        0.5127      0.9270    1719       0.55      0.5802           
           Intercept    4642        1.6254      0.9118    1719       1.78      0.0748           
           Intercept    4868       -0.4857      1.1131    1719      -0.44      0.6626           
           Intercept    5192       -2.1197      1.1926    1719      -1.78      0.0757           
           Intercept    5783        0.2544      1.1778    1719       0.22      0.8290           
           Intercept    5838        0.7295      1.1501    1719       0.63      0.5260           
           Intercept    6074        0.8549      0.9377    1719       0.91      0.3621           
           Intercept    6366        2.6036      0.9270    1719       2.81      0.0050           
           Intercept    6415       -0.9235      0.9491    1719      -0.97      0.3307           
           Intercept    6469        5.2016      0.9323    1719       5.58      <.0001           
           Intercept    6808       -3.2510      1.0173    1719      -3.20      0.0014           
           Intercept    6897        2.0574      0.9806    1719       2.10      0.0360           
           Intercept    7101       -0.8727      1.1926    1719      -0.73      0.4644           
           Intercept    7919        1.7829      1.0804    1719       1.65      0.0991           
           Intercept    8202       -1.0191      1.1018    1719      -0.92      0.3551           
           Intercept    8367       -6.3394      1.5231    1719      -4.16      <.0001           
           Intercept    8775       -3.1097      0.9875    1719      -3.15      0.0017           
           Intercept    8854       -7.5886      1.1372    1719      -6.67      <.0001           
           Intercept    9021        1.7065      0.9377    1719       1.82      0.0690           
           Intercept    9104        3.6835      0.9433    1719       3.90      <.0001           
           Intercept    9158       -3.9862      0.9550    1719      -4.17      <.0001           
           Intercept    9340       -1.4607      1.1778    1719      -1.24      0.2151           
           Intercept    9347        0.6327      0.9323    1719       0.68      0.4975           
           Intercept    9586        1.8683      0.9218    1719       2.03      0.0428           

Since this is a Null model, I don't think I can produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

by Jing

3. One-way ANCOVA with random effects

      PROC MIXED;
        CLASS SCHOOL;
        MODEL MATHACH = SES / S DDFM = SATTERTH;
        RANDOM INTERCEPT / SUBJECT = SCHOOL TYPE = FA0(1);
        RUN;

Is the effect of SES significant?

Run the corresponding analysis in R and compare output. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

 R code:
##One-way ANCOVA with random effects 

hs <- read.csv("http://www.math.yorku.ca/~georges/Data/hs.csv")
source("http://www.math.yorku.ca/~georges/R/fun.R")
       td ( new = T )
       xqplot ( hs )
library(nlme)
fit <- lme( mathach ~ ses, data= hs ,random = ~ 1  | school )
summary(fit)

## Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits. 

    hpred <- hs
    hpred$y0 <- predict(fit, level = 0)     # Population
    hpred$y1 <- predict(fit, level = 1)     # BLUPS

    xyplot( mathach ~ ses | school , hpred,skip=c(rep(F,25),T,T),
   	    panel = function(x, y, subscripts, ...) {
		        panel.xyplot(x,y,...)
		        panel.lmline(x,y,...,col='black',lwd=2)
		        y0 <- hpred$y0[subscripts]
		        panel.lmline(x,y0,...,type='l',col='green',lwd=2)
		        y1 <- hpred$y1[subscripts]
		        panel.lmline(x,y1,...,type='l',col='red',lwd=2)
        } 
    )  
 ##Results from R
Linear mixed-effects model fit by REML
 Data: hs 
       AIC      BIC    logLik
  11192.08 11213.88 -5592.041

Random effects:
 Formula: ~1 | school
        (Intercept) Residual
StdDev:    2.198615 6.116201

Fixed effects: mathach ~ ses 
                Value Std.Error   DF t-value p-value
(Intercept) 12.531101 0.3812171 1679 32.8713       0
ses          2.530339 0.2146264 1679 11.7895       0
 Correlation: 
    (Intr)
ses -0.078

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.12326426 -0.73360355  0.03418091  0.76324884  2.70656225 

Number of Observations: 1720
Number of Groups: 40 

The effect of SES is significant because the p_vaule of SES is less than .0001.

                                      The SAS System         00:21 Thursday, June 17, 2006   6

                                       The Mixed Procedure

                                 Covariance Parameter Estimates

                                Cov Parm     Subject    Estimate

                                FA(1,1)      SCHOOL       2.1985
                                Residual                 37.4080


                                         Fit Statistics

                              -2 Res Log Likelihood         11184.1
                              AIC (smaller is better)       11188.1
                              AICC (smaller is better)      11188.1
                              BIC (smaller is better)       11191.5


                                Null Model Likelihood Ratio Test

                                  DF    Chi-Square      Pr > ChiSq

                                   1         91.08          <.0001


                                   Solution for Fixed Effects

                                         Standard
                Effect       Estimate       Error      DF    t Value    Pr > |t|

                Intercept     12.5311      0.3812    34.4      32.87      <.0001
                SES            2.5304      0.2146    1655      11.79      <.0001


                                 Type 3 Tests of Fixed Effects

                                       Num     Den
                         Effect         DF      DF    F Value    Pr > F

                         SES             1    1655     139.00    <.0001

From the above outputs, I think the result from SAS is the same as the result from R.

             OLS within cluster fits, BLUPS and population-level fits:

Image:A3-33.JPG

by Jing


5. Intercepts and slopes as outcomes

      PROC MIXED ASYCOV CL;
        CLASS SCHOOL;
        MODEL MATHACH = SES SECTOR SES*SECTOR / S DDFM = SATTERTH;
        RANDOM INTERCEPT SES /
        /* try various options for TYPE */
        SUBJECT = SCHOOL G GC;
        RUN;

Compare SAS results with the corresponding analysis in R. Use R to produce trellis plots showing OLS within cluster fits, BLUPS and population-level fits.

 ## R code 
fit <- lme( mathach ~ ses+factor(sector)+ses*factor(sector), data= hs ,random = ~ 1 +ses  | school )

##R result:  iteration limit reached without convergence (9)

SAS output:

                                      The SAS System         00:21 Thursday, June 17, 2006  10

                                       The Mixed Procedure

                                        Estimated G Matrix

                        Row    Effect       SCHOOL        Col1        Col2

                          1    Intercept    1317        3.5793
                          2    SES          1317


                                     Estimated Chol(G) Matrix

                        Row    Effect       SCHOOL        Col1        Col2

                          1    Intercept    1317        1.8919
                          2    SES          1317


                                 Covariance Parameter Estimates

               Cov Parm      Subject    Estimate     Alpha       Lower       Upper

               Intercept     SCHOOL       3.5793      0.05      2.1090      7.3715
               SES           SCHOOL            0         .           .           .
               Residual                  37.3310      0.05     34.9244     39.9965


                            Asymptotic Covariance Matrix of Estimates

                      Row    Cov Parm         CovP1       CovP2       CovP3

                        1    Intercept       1.2446                -0.07293
                        2    SES
                        3    Residual      -0.07293                  1.6668


                                         Fit Statistics

                              -2 Res Log Likelihood         11167.7
                              AIC (smaller is better)       11171.7
                              AICC (smaller is better)      11171.7
                              BIC (smaller is better)       11175.1


                                   Solution for Fixed Effects

                                          Standard
                Effect        Estimate       Error      DF    t Value    Pr > |t|


by Jing



7. Using the general linear hypothesis (SAS ESTIMATE statement) to test plausible questions:

Consider the following mixed model:
       PROC MIXED ASYCOV ASYCORR CL;
         CLASS SCHOOL;
         MODEL MATHACH = SES SECTOR FEMALE SES*SECTOR /
                 S DDFM = SATTERTH;
         RANDOM SES FEMALE INT / SUBJECT = SCHOOL
                 TYPE = FA0(3) G GC;
         ESTIMATE ‘Cath – Pub | low SES’ SECTOR 1 SES*SECTOR -2 / CL E;
         ESTIMATE ‘Cath – Pub | high SES’ SECTOR 1 SES*SECTOR +2 / CL E;
         RUN;

How are the above ESTIMATE statements obtained and what does each test? Compare the ESTIMATE statement in SAS with the output of 'glh' in R.

Section 5: all

R code:

sub<-c(1,1,1,2,2,2,3,3,3,4,4,4)
dose<-c(1,2,3,3,4,5,5,6,7,7,8,9)
ys<-c(3.1,2.2,1.1,5.0,4.1,3.0,7.0,6.1,5.0,9.1,8.2,7.1)
yb<-c(3.1,3.9,1.1,5.0,4.9,3.0,7.0,6.9,5.0,9.1,9.9,7.1)
expt<-data.frame(sub, dose,ys, yb)

   if(F) { # assuming the following has been done:
         dir.create("/R")
         download.file( "http://www.math.yorku.ca/~georges/R/fun.R", "/R/fun.R")
   }
   
   source("/R/fun.R") 
/*Plot data*/
library(lattice)
library(car)
library(MASS)
library(Martix)
xyplot(ys ~ dose | sub, data=expt, type = "l")
xyplot(yb ~ dose | sub, data=expt, type = "l")
scatterplot(ys ~ dose | sub, data=expt)
scatterplot(yb ~ dose | sub, data=expt)

/*within subject model small sigma*/
expt.lm <- lm(ys ~ dose+factor(sub), data=expt)
summary(expt.lm)
summary(expt.lm)$sigma
/*within subject model big sigma*/
expt_yb.lm <- lm(yb ~ dose+factor(sub), data=expt)
summary(expt_yb.lm)
summary(expt_yb.lm)$sigma

/*compute subject averages and deviations for ys and dose*/
library(nlme)
expt.avg<-gsummary(expt, form= ~ sub)
expt.avg
ys_b <- tapply( expt$ys, expt$sub, mean, na.rm=T ) [
	    tapply(expt$ys, expt$sub) ]
yb_b <- tapply( expt$yb, expt$sub, mean, na.rm=T ) [
	    tapply(expt$yb, expt$sub) ]
dose_b <- tapply( expt$dose, expt$sub, mean, na.rm=T ) [
	    tapply(expt$dose, expt$sub) ]
ys_w <-  ys - ys_b
yb_w <-  yb - yb_b
dose_w <-  dose - dose_b
expt<-data.frame(sub, dose,ys, yb,ys_b,yb_b,dose_b,ys_w,yb_w,dose_w)
expt

/*between subjec model small sigma*/

expt_ysb.lm <- lm(ys_b ~ dose, data=expt)
summary(expt_ysb.lm)
summary(expt_ysb.lm)$sigma

/*between subject model big sigma*/
expt_ybb.lm <- lm(yb_b ~ dose, data=expt)
summary(expt_ybb.lm)
summary(expt_ybb.lm)$sigma

/*pooled model small sigma*/
expt_ysp.lm <- lm(ys ~ dose, data=expt)
summary(expt_ysp.lm)
summary(expt_ysp.lm)$sigma

/*pooled model big sigma*/
expt_ybp.lm <- lm(yb ~ dose, data=expt)
summary(expt_ybp.lm)
summary(expt_ybp.lm)$sigma

/*proc mixed estimate small sigma*/
fit <- lme( ys ~ dose, data= expt ,random = ~ 1  | sub )
summary(fit)
summary(fit)$sigma

/*proc mixed estimate big sigma*/
fit_yb <- lme( yb ~ dose, data= expt ,random = ~ 1  | sub )
summary(fit_yb)
summary(fit_yb)$sigma

/*try controlling for the patient average dose:*/

fit_ys_average <- lme( yb ~ dose_w*dose_b, data= expt ,random = ~ 1  | sub )
summary(fit_ys_average)
summary(fit_ys_average)$sigma

/*Try a radnom effect of dose_w: */
fit_ys_random <- lme( yb ~ 0 + dose_w * dose_b, data= expt ,random = ~ 1  + dose_w| sub )
summary(fit_ys_random)
summary(fit_ys_random)$sigma
/*Try a radnom effect of dose: */
fit_ys_dose_random <- lme( yb ~ 0 + dose_w * dose_b, data= expt ,random = ~ 1  + dose| sub )
summary(fit_ys_dose_random)
summary(fit_ys_dose_random)$sigma

/*Plot OLS fits for each patient, BLUPs and population-level fits*/
 fit <- lme( ys ~ dose, data= expt ,random = ~ 1 +dose | sub )
    hpred <- expt
    hpred$y0 <- predict(fit, level = 0)     # Population
    hpred$y1 <- predict(fit, level = 1)     # BLUPS

    xyplot( ys ~ dose | sub , hpred,skip=c(rep(F,25),T,T),
   	    panel = function(x, y, subscripts, ...) {
		        panel.xyplot(x,y,...)
		        panel.lmline(x,y,...,col='black',lwd=2)
		        y0 <- hpred$y0[subscripts]
		        panel.lmline(x,y0,...,type='l',col='green',lwd=2)
		        y1 <- hpred$y1[subscripts]
		        panel.lmline(x,y1,...,type='l',col='red',lwd=2)
        } 
    )  

Results: The estimated parameters are as following:


"Effect" of Dose Small sigma Big sigma
Marginal (Pooled) 1.682021 1.847033
Between 0.8419515 0.868264
Within (Ajusted for patient) 0.06172134 0.87831
Mixed TYPE=UN
Mixed TYPE=FA0(1) 0.06172449 0.889105

Graphics:

     DATA PLOT

Image:A3_51.jpg Image:A3-52.jpg

    OLS fits for each patient, BLUPs and popultario-level fits PLots

Image:A3-53.jpg

by Jing

Section 6: b

/*The following R code is from our course R code examples*/

if(F) { # assuming the following has been done:
         dir.create("/R")
         download.file( "http://www.math.yorku.ca/~georges/R/fun.R", "/R/fun.R")
   }
   
 source("/R/fun.R") 
  if(F) { # assuming the following has been done:
         dir.create("dataIN")
         download.file( "http://www.math.yorku.ca/~georges/Data/hs.csv", 
                        "dataIN/hs.csv")       
   }

   hs <- read.csv("dataIN/hs.csv")
  
   library(nlme)
   library(car)    # you might need to install this
                    
   library(MASS) # This is the library from Modern Applied
       
 hs$sector
   table(hs$sector)

   hs$Sector <- factor( ifelse( hs$sector == 0, "Public", "Catholic") )
   table( hs$Sector )

### School
   
   hs$school
   atotal( z <- table(hs$school) )   #number of students in each school
   cbind ( atotal( table(hs$school) )  )
   plot( table(hs$school) )
   
   
   
   # first define a useful function:
   ch <- function(x) as.character(x)

   # more informative school id's

   hs$School <- factor( paste ( substring( ch( hs$Sector), 1,1),
                                    hs$school, sep = ""))

   atotal( table ( hs$School ))

   atotal( table( hs$School, hs$Sector) )
   
   atotal( table( hs$School, hs$Sector) > 1)

### Sex

   hs$Sex <- factor( ifelse( hs$female == 1, "Female", "Male"))
   with( hs, atotal ( table( Sector, Sex))) 
   fit <- xtabs( ~ Sector + Sex, hs)
   fit 
   summary(fit)

#
# Create "aggregate" data frame "by school"
# -- This is a "Level 2" data frame with one row per school,
#    we can call it the "short" data frame, and the original the "long"
# -- Note we can also think of a "wide" data frame as that with the
#    full data stored as different variable with one row per school.


   hs.a <- gsummary( hs, form = ~ School)   # from library(nlme)
   dim( hs.a )
   hs.a 



#
# Note 1: gsummary takes the mean of numerical variables and the mode
# of factor variables.  This works fine for outer numerical, outer 
# factor and inner numerical variables. It doesn't generally 
# produce useful results for inner factors.
#
# Note 2: gsummary is in 'nlme'
#
# We can add custom-made variables with tapply  

   hs.a$N      <- tapply( hs$ses,  hs$School, length )      # number of observations per school
   hs.a$ses.sd <- tapply( hs$ses,  hs$School, function(x) sqrt(var(x)))  # sd of ses
   hs.a

  
### 
###  Creating 'contextual' variables
###


   # Note: we can 'lift' a variable from the short data frame to the long data frame
   # with tapply without a function argument.

   hs$N <- hs.a$N[ tapply( hs$School, hs$School) ]
   hs$ses.sd <- hs.a$ses.sd[ tapply( hs$School, hs$School) ]

# Note: 
# - The first argument is not important as long as it has the correct length
# - We can also define a Level 2 variable in "one shot":

   hs$ses.mean <- tapply( hs$ses, hs$School, mean, na.rm=T ) [
	    tapply(hs$ses, hs$School) ]
	    

# Caution: Don't reorder the "by" factor between uses of tapply. 
# The order won't match

# capply in funRDC.R does this more conveniently ('c' stands for 'contextual')
   
   capply
   
   hs$minority.prop <- capply( hs$minority, hs$School, mean, na.rm = T)


###
###  Saving a data set  ( You may need the create the directory 'data' in the
###  working directory. To find out where you are, use > getwd()
###
   
   dir.create("data")  # only necessary once
   
   save( hs, file = "data/hs.rda" )  # in future session you can load the data.frame with
   
   # Note that you also just save the environment at the end of your session but
   # this can get messy and unreliable
   
   data( hs )  # this loads the data saved in 'data/hs.rda'

### Fitting Mixed models with contextual effects using R
fit <- lme( mathach ~ ses+ses.mean+minority+minority.prop, data= hs ,random = ~ 1 +ses| School )
summary(fit)

Results:

Linear mixed-effects model fit by REML
 Data: hs 
       AIC      BIC    logLik
  11089.92 11138.94 -5535.959

Random effects:
 Formula: ~1 + ses | School
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 1.5098857 (Intr)
ses         0.4702158 -0.458
Residual    5.9635896       

Fixed effects: mathach ~ ses + ses.mean + minority + minority.prop 
                  Value Std.Error   DF   t-value p-value
(Intercept)   12.417579 0.4081393 1678 30.424853  0.0000
ses            2.038890 0.2305240 1678  8.844588  0.0000
ses.mean       4.294775 0.7490919   37  5.733308  0.0000
minority      -4.146355 0.4491158 1678 -9.232263  0.0000
minority.prop  2.006420 1.4780814   37  1.357449  0.1829
 Correlation: 
              (Intr) ses    ses.mn minrty
ses           -0.096                     
ses.mean      -0.362 -0.262              
minority      -0.008  0.138 -0.041       
minority.prop -0.638 -0.039  0.168 -0.291

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.26616600 -0.70279789  0.01105957  0.75313796  2.81928889 

Number of Observations: 1720
Number of Groups: 40 

From the above result, since the estimated value of ses.mean is 4.294775,I think a low SES child appear to do better in a high SES. This depends on minority status because the p-value of minority is 0.0000. A high SES child can do better in a high SES than a low SES shool. Since the estimated value of minority is -4.146355,I don't think minority students appear to do better with a high proportion of minority students. The limitations in the interpretation of these results may be the data is not enough.

by Jing