Bion Matrix Problems

From MathWiki

NOTE: The questions below were answered using different data sets, because different team members posted different answers.


Question 1

Generate a set of random numbers to simulate the LSAT scores and GPAs of 100 new law school applicants, assuming they come from a population in which LSAT scores and GPAs are jointly normally distributed, with LSAT mean 625 and standard deviation 60, and GPA mean 3.2 and standard deviation 0.3. Further suppose that the correlation between LSAT scores and GPAs is 0.65. Plot the data, and plot the standard ellipses for the data and for the population from which they were generated.

# Load libraries
library(MASS)
library(car)
# Define variables and generate data
LSATmu <- 625
GPAmu <- 3.2
LSATsigma <- 60
GPAsigma <- .3

Figure: Scatterplot of GPA against LSAT; red: Population ellipse, blue: Sample ellipse


We know that the correlation between GPA and LSAT is 0.65 and we use this to find the covariance of GPA and LSAT in the following way:

\rho_{LsatGpa} = \frac{\sigma_{LsatGpa}}{\sqrt{\sigma^2_{Lsat}\sigma^2_{Gpa}}} = \frac{\sigma_{LsatGpa}}{\sigma_{Lsat}\sigma_{Gpa}}

Here it is given that the correlation ρLsatGpa = 0.65;

therefore, we can find the covariance of GPA and LSAT: σLsatGpa = ρLsatGpa σLsat σGpa
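Plugging in the given values:

\sigma_{LsatGpa} = \rho_{LsatGpa}\,\sigma_{Lsat}\,\sigma_{Gpa} = 0.65 \times 60 \times 0.3 = 11.7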


We compute σLsatGpa in R:

# Find the covariance of GPA and LSAT
cv_GPA_LSAT <- 0.65*LSATsigma*GPAsigma
# Generate standard normal data
Z <- matrix( rnorm( 2 * 100), ncol = 2)
Z     # Note: each row is an observation
plot( Z )
# Transformation of the standard normal data
mu <- c(LSATmu,GPAmu)
V <- matrix( c( LSATsigma^2, cv_GPA_LSAT, cv_GPA_LSAT, GPAsigma^2), ncol = 2)
# Need to find a matrix A such that V = AA'
A <- t( chol( V ) )     # lower-triangular matrix A
A %*% t(A)              # check: recovers V
X <- t ( mu + A %*% t(Z))
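This transformation produces exactly the target covariance because Z has identity covariance, so

\operatorname{Var}(\mu + AZ) = A\,\operatorname{Var}(Z)\,A' = AIA' = AA' = V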

Plots

plot(X, xlab='LSAT', ylab='GPA')
circ <- cbind( cos( seq(0,1,.01) * 2 * pi) , sin ( seq( 0, 1, .01) * 2 * pi))
#eqscplot( X )
# To get the population ellipse, apply to the unit circle the same transformation that was applied to the standard normal data
lines( t( mu + A %*% t(circ)), col = 'red')  # population ellipse

To get the sample ellipse

# use the sample mean and the sample variance
mu.hat <- apply( X, 2 , mean)
mu.hat
lines ( t ( mu.hat + t( chol(var(X))) %*% t(circ)), col = 'blue')


Team Bion

Alternate Answer for Question 1


> library(MASS)

> library (car)

> Z<-matrix(rnorm(2*100),ncol=2)

> Z

             [,1]          [,2]
 [1,]  0.34935660 -0.3156092570
 [2,] -0.75975042  1.1968383305
 [3,] -1.73440795  2.1304490723
 [4,]  0.04215660 -1.4295633603
 [5,] -0.14932069  0.4908789485
 [6,] -0.65208958  0.5325056408
 [7,]  1.14766802  0.9935534388
 [8,] -0.14152330 -1.0433023776
 [9,]  1.19992879 -0.5791668105
[10,]  0.25062486  0.7985512551
[11,]  0.93648325 -1.9530910813
[12,] -0.19957549 -0.2914818991
[13,]  0.08250134 -0.8985370678
[14,]  2.00874617 -0.7290139047
[15,]  1.05283198  0.5354049704
[16,]  0.69576155 -0.1814026347
[17,] -0.23198963 -0.2150186287
[18,]  0.22531631 -0.7862881549
[19,] -0.05941959  0.6133575914
[20,]  0.04592932  0.9381921993
[21,] -1.82195956 -0.3711128421
[22,]  0.63153005  0.3981913891
[23,] -0.40134020  0.9788626075
[24,] -1.19453844 -1.1494324394
[25,] -0.55087261 -0.8226379394
[26,] -0.34186200  0.7962234999
[27,] -1.24336886 -0.3984145875
[28,] -1.48648300 -0.6364765611
[29,]  0.95452713  0.7584190006
[30,] -1.35298801  0.3425165244
[31,] -0.61336167  2.1088451067
[32,]  1.53184396 -0.4763397300
[33,] -0.66761351 -0.2820257529
[34,] -0.63747602  0.1396404233
[35,]  1.69773380  1.0744277087
[36,] -0.01423540 -0.3671010746
[37,] -0.22762359  0.5120084265
[38,]  0.92868123  0.2562797094
[39,] -2.03589410 -0.8974210347
[40,]  0.50314410  0.0567792634
[41,] -0.92578387  1.2604636072
[42,]  0.42242654  2.2589521482
[43,]  0.37853311  0.0598879156
[44,]  1.59461409 -0.8132885083
[45,] -0.60274625 -0.6666959776
[46,]  1.60266812 -0.3474767145
[47,] -0.48149853 -0.1088483501
[48,] -0.27435778 -0.9183952568
[49,] -0.36363313 -0.6576672151
[50,]  0.84900518  0.1934018154
[51,] -0.17949080 -2.2577925261
[52,]  0.22612023 -2.0663472016
[53,] -0.54434079  1.6169179962
[54,]  0.06017148 -0.0257513412
[55,] -1.21575754 -0.1335720336
[56,]  0.22704452 -0.0725129554
[57,] -1.06983280 -0.3771825704
[58,]  0.65433804  0.3717837980
[59,] -0.16626539  0.0004266102
[60,]  1.06090410 -1.2770471052
[61,]  0.68512735 -0.6514095315
[62,] -0.43361452 -2.2635766182
[63,] -0.68368580  0.7531605486
[64,] -0.76150331  0.4012294511
[65,] -0.47944714 -0.2971486528
[66,] -2.35676479 -0.0946789213
[67,]  0.48480974  1.6769375330
[68,] -0.00858242  0.2072741650
[69,]  0.90771626 -0.5008369474
[70,]  0.51992453  0.8115573926
[71,] -0.96834625  0.5573427256
[72,] -0.10383855 -0.3020972165
[73,]  0.29022533  0.9535809908
[74,] -0.32074193  0.6731752012
[75,]  1.69533262  0.2394573940
[76,] -1.57493719 -0.6894749693
[77,]  0.70234237 -0.7464791730
[78,] -0.40254536  1.1307404090
[79,]  0.05699873  0.5306313842
[80,] -0.12053874  1.8648851815
[81,] -1.33375058 -0.5012249113
[82,]  0.78994954  1.7551988619
[83,] -0.06717189  1.3373324720
[84,] -1.45202782  0.1453121245
[85,]  0.26170332  1.0095412850
[86,]  0.25659571  1.3823431633
[87,] -1.16391049  1.7060234333
[88,] -1.03844910  0.6425524157
[89,]  0.44152758  0.0228439524
[90,]  1.20118937  0.2176744183
[91,]  0.08484550  0.0082864340
[92,] -0.42930957  0.2524214959
[93,] -0.80247219 -0.1472658252
[94,] -0.85185511  0.1148817319
[95,]  0.92810135  0.2560117606
[96,] -0.24635075 -1.5435112234
[97,] -1.49970289 -0.7998365753
[98,] -0.66422719  1.3113030653
[99,]  0.61930528 -0.9461142207
[100,] -0.78011123  0.2915876072


> plot(Z)

> mu<- c(LSATmu,GPAmu)

> mu

[1] 625.0   3.2

> LSATsd<-60

> GPAsd<-.3

> cov<-0.65*LSATsd*GPAsd

> cov

[1] 11.7

> mu<- c(LSATmu,GPAmu)

> mu

[1] 625.0   3.2

> V<- matrix(c(LSATsd^2, cov, cov, GPAsd^2), ncol=2)

> V

       [,1]  [,2]
[1,] 3600.0 11.70
[2,]   11.7  0.09

> A<-t(chol(V))

> A

       [,1]      [,2]
[1,] 60.000 0.0000000
[2,]  0.195 0.2279803

> A%*%t(A)

       [,1]  [,2]
[1,] 3600.0 11.70
[2,]   11.7  0.09

> X<-t(mu+A%*%t(Z))

> X

          [,1]     [,2]
 [1,] 645.9614 3.196172
 [2,] 579.4150 3.324704
 [3,] 520.9355 3.347491
 [4,] 627.5294 2.882308
 [5,] 616.0408 3.282793
 [6,] 585.8746 3.194243
 [7,] 693.8601 3.650306
 [8,] 616.5086 2.934551
 [9,] 696.9957 3.301948
[10,] 640.0375 3.430926
[11,] 681.1890 2.937348
[12,] 613.0255 3.094631
[13,] 629.9501 3.011239
[14,] 745.5248 3.425505
[15,] 688.1699 3.527364
[16,] 666.7457 3.294317
[17,] 611.0806 3.105742
[18,] 638.5190 3.064679
[19,] 621.4348 3.328247
[20,] 627.7558 3.422846
[21,] 515.6824 2.760111
[22,] 662.8918 3.413928
[23,] 600.9196 3.344900
[24,] 553.3277 2.705017
[25,] 591.9476 2.905035
[26,] 604.4883 3.314860
[27,] 550.3979 2.866712
[28,] 535.8110 2.765032
[29,] 682.2716 3.559037
[30,] 543.8207 3.014254
[31,] 588.1983 3.561170
[32,] 716.9106 3.390114
[33,] 584.9432 3.005519
[34,] 586.7514 3.107527
[35,] 726.8640 3.776006
[36,] 624.1459 3.113532
[37,] 611.3426 3.272341
[38,] 680.7209 3.439520
[39,] 502.8464 2.598406
[40,] 655.1886 3.311058
[41,] 569.4530 3.306833
[42,] 650.3456 3.797370
[43,] 647.7120 3.287467
[44,] 720.6768 3.325536
[45,] 588.8352 2.930471
[46,] 721.1601 3.433302
[47,] 596.1101 3.081293
[48,] 608.5385 2.937124
[49,] 603.1820 2.979156
[50,] 675.9403 3.409648
[51,] 614.2306 2.650267
[52,] 638.5672 2.773007
[53,] 592.3396 3.462479
[54,] 628.6103 3.205863
[55,] 552.0545 2.932475
[56,] 638.6227 3.227742
[57,] 560.8100 2.905392
[58,] 664.2603 3.412355
[59,] 615.0241 3.167676
[60,] 688.6542 3.115735
[61,] 666.1076 3.185091
[62,] 598.9831 2.599394
[63,] 583.9789 3.238387
[64,] 579.3098 3.142979
[65,] 596.2332 3.038764
[66,] 483.5941 2.718846
[67,] 654.0886 3.676847
[68,] 624.4851 3.245581
[69,] 679.4630 3.262824
[70,] 656.1955 3.486404
[71,] 566.8992 3.138236
[72,] 618.7697 3.110879
[73,] 642.4135 3.473992
[74,] 605.7555 3.290926
[75,] 726.7200 3.585181
[76,] 530.5038 2.735701
[77,] 667.1405 3.166774
[78,] 600.8473 3.379290
[79,] 628.4199 3.332088
[80,] 617.7677 3.601652
[81,] 544.9750 2.825649
[82,] 672.3970 3.754191
[83,] 620.9697 3.491787
[84,] 537.8783 2.949983
[85,] 640.7022 3.481188
[86,] 640.3957 3.565183
[87,] 555.1654 3.361977
[88,] 562.6931 3.143992
[89,] 651.4917 3.291306
[90,] 697.0714 3.483857
[91,] 630.0907 3.218434
[92,] 599.2414 3.173832
[93,] 576.8517 3.009944
[94,] 573.8887 3.060079
[95,] 680.6861 3.439345
[96,] 610.2190 2.800072
[97,] 535.0178 2.725211
[98,] 585.1464 3.369427
[99,] 662.1583 3.105069
[100,] 578.1933 3.114355

> plot(X, xlab='LSAT', ylab='GPA')

> circ<-cbind(cos(seq(0,1,.01)*2*pi),sin(seq(0,1,.01)*2*pi))

> lines(t(mu+A%*%t(circ)),col='red')

> mu.hat<-apply(X,2,mean)

> mu.hat

[1] 619.240600   3.200973

> lines(t(mu.hat+t(chol(var(X)))%*%t(circ)),col='blue')

Question 2

Use the data in the previous question and generate an additional variable, FYEAR, representing the law school grade at the end of the first year. Generate FYEAR so that it is equal to 0.5 + 0.006 x LSAT + 0.8 x GPA + e, where e is normal with mean 0 and standard deviation equal to 1.

e <- matrix(rnorm(100))
FYEAR <- 0.5 + 0.006*X[1:100] + 0.8*X[101:200] + e
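For reference, since X is the 100 x 2 matrix from Question 1, X[1:100] is just its first column (LSAT) and X[101:200] its second (GPA), so the same computation can be written with column names (a hedged restatement, assuming X from Question 1 is in the workspace):

```r
colnames(X) <- c("LSAT", "GPA")    # label the simulated columns from Question 1
e <- rnorm(100)                    # errors: normal, mean 0, sd 1
FYEAR <- 0.5 + 0.006 * X[, "LSAT"] + 0.8 * X[, "GPA"] + e
```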

> FYEAR

          [,1]
 [1,] 7.727774
 [2,] 6.039211
 [3,] 7.150404
 [4,] 7.379479
 [5,] 8.039377
 [6,] 7.357757
 [7,] 7.036672
 [8,] 6.941223
 [9,] 6.234592
[10,] 6.474241
[11,] 6.370596
[12,] 6.046036
[13,] 8.158288
[14,] 7.877964
[15,] 7.970057
[16,] 8.276436
[17,] 6.076318
[18,] 9.055112
[19,] 7.246131
[20,] 7.234020
[21,] 3.365101
[22,] 7.465667
[23,] 6.294320
[24,] 6.495836
[25,] 6.605842
[26,] 7.409746
[27,] 5.928211
[28,] 6.831105
[29,] 7.884018
[30,] 6.436381
[31,] 5.733149
[32,] 7.205636
[33,] 6.208008
[34,] 5.266571
[35,] 8.180885
[36,] 7.257076
[37,] 7.725890
[38,] 6.203037
[39,] 5.257378
[40,] 7.115313
[41,] 6.571569
[42,] 9.175829
[43,] 8.340407
[44,] 6.960631
[45,] 6.195779
[46,] 7.235731
[47,] 6.716497
[48,] 5.936205
[49,] 7.757981
[50,] 6.338099
[51,] 6.356820
[52,] 6.156209
[53,] 7.694775
[54,] 6.759865
[55,] 5.899646
[56,] 6.693532
[57,] 6.277205
[58,] 6.827875
[59,] 5.622915
[60,] 5.078693
[61,] 5.615790
[62,] 4.971827
[63,] 6.189602
[64,] 6.963284
[65,] 8.740590
[66,] 7.621541
[67,] 7.571532
[68,] 6.458716
[69,] 7.618320
[70,] 6.961536
[71,] 6.453244
[72,] 5.711423
[73,] 8.004069
[74,] 6.270645
[75,] 6.838580
[76,] 6.515963
[77,] 7.569872
[78,] 6.259013
[79,] 7.044438
[80,] 7.426439
[81,] 5.404371
[82,] 6.471851
[83,] 7.605649
[84,] 5.011069
[85,] 6.980477
[86,] 5.665185
[87,] 7.689709
[88,] 5.954907
[89,] 8.147716
[90,] 8.368440
[91,] 7.754423
[92,] 8.251370
[93,] 4.780489
[94,] 6.918141
[95,] 6.554069
[96,] 6.803886
[97,] 4.930911
[98,] 7.024204
[99,] 7.343491
[100,] 6.535745

Question 3

State the true values of β0, βLSAT, βGPA and σ in this model.

β0 = 0.5

βLSAT = 0.006

βGPA = 0.8

σ = 1


Question 4

(Note for Longena & Nicole - this is his answer to our query about this question:

Q: We don't understand how there is a single numeric answer for the marginal variance of Y in question #4 (of 1-9) when we are getting a 3x3 matrix.

A: Y is a linear combination of the Xs and the error, which can be expressed as c'Z where c is a vector of length 3 and Z is a random vector of length 3. Now Var(Z) is a 3x3 matrix, but Var(Y) = c'Var(Z)c is just 1x1.)


What are the values of 'true' standardized βs in this model? [Note: This is not trivial because you need to find the marginal variance of Y]

First we compute the marginal variance of Y (which is also known as FYEAR).

Y = β0 + β1LSAT + β2GPA + ε, where ε is normal with mean 0 and standard deviation σ

var(Y) = var(\beta_1 LSAT + \beta_2 GPA + \varepsilon) = \begin{bmatrix} \beta_1 & \beta_2 & 1\end{bmatrix} \times var\begin{bmatrix} LSAT \\ GPA \\ \varepsilon\end{bmatrix} \times \begin{bmatrix} \beta_1 \\ \beta_2 \\ 1\end{bmatrix} = \begin{bmatrix}0.006 & 0.8 & 1\end{bmatrix} \times \begin{bmatrix}3600 & 11.7 & 0 \\ 11.7 & 0.09 & 0 \\ 0 & 0 & 1\end{bmatrix} \times \begin{bmatrix}0.006 \\ 0.8 \\ 1\end{bmatrix} = 1.29952

Therefore, std(Y) = 1.139965

#in question 1 we solved the covariance matrix (V)

#make var(Y) matrix
varLGE=cbind(V,0)
varLGE=rbind(varLGE,0)
varLGE[3,3]=1
# make Beta matrix
Betas=matrix(c(0.006,0.8,1),nrow=3)
#perform computation
varY=t(Betas)%*%varLGE%*%Betas
stdY=sqrt(varY)
stdY
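Expanding the quadratic form term by term gives the same value:

var(Y) = \beta_1^2\sigma^2_{Lsat} + 2\beta_1\beta_2\,\sigma_{LsatGpa} + \beta_2^2\sigma^2_{Gpa} + \sigma^2 = (0.006)^2(3600) + 2(0.006)(0.8)(11.7) + (0.8)^2(0.09) + 1 = 0.1296 + 0.11232 + 0.0576 + 1 = 1.29952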

Now we can compute the true standardized betas of the model.

\beta_{LsatStandard} = \beta_{Lsat}(\frac{\sigma_{Lsat}}{\sigma_{Fyear}}) = 0.006 \times (\frac{60}{1.139965})= 0.3157992

0.006*60/stdY


\beta_{GpaStandard} = \beta_{Gpa}(\frac{\sigma_{Gpa}}{\sigma_{Fyear}}) = 0.8 \times (\frac{0.3}{1.139965})= 0.2105328

0.8*0.3/stdY

Question 5

Fit a regression of FYEAR on LSAT and GPA producing suitable summary tables and plots.

Figure: Model of FYEAR against GPA and LSAT


colnames(X) = c("LSAT", "GPA")
LSAT=X[1:100];
GPA=X[101:200];
fit<-lm(FYEAR~LSAT+GPA)
summary(fit)
Call:
lm(formula = FYEAR ~ LSAT + GPA)
Residuals:
    Min       1Q   Median       3Q      Max 
-2.60572 -0.71476  0.08329  0.59773  2.04167 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.710096   1.060998   0.669    0.505    
LSAT        0.009211   0.002239   4.113 8.17e-05 ***
GPA         0.140429   0.449528   0.312    0.755    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Residual standard error: 0.9968 on 97 degrees of freedom
Multiple R-Squared: 0.2932,     Adjusted R-squared: 0.2786 
F-statistic: 20.12 on 2 and 97 DF,  p-value: 4.903e-08 

#before using scatter3d we need to load course functions
download.file("http://www.math.yorku.ca/~georges/R/coursefun.R", "coursefun.R")
source("coursefun.R")
library(rgl)
scatter3d(LSAT, FYEAR, GPA)

Question 6

Test each of the following hypotheses using a GLH approach or otherwise. In each case state whether you have committed a type I error or a type II error. [Why is it impossible to do this in your usual analyses?]

  1. βGPA = 0 and βLSAT = 0
  2. βGPA = 0.8 and βLSAT = 0.006
  3. βGPA = 1
  4. βGPA = 0.81
  5. βGPA = 0.8
  6. \beta_{LSAT} \times 60 = \beta_{GPA} \times 0.3

The model with population parameters is FYEAR = 0.5 + 0.006 x LSAT + 0.8 x GPA + e.
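Each of the hypotheses below can be written in general linear hypothesis form. As a reference point (the standard GLH result, with q the number of rows of L and s² the residual mean square on n − 3 degrees of freedom):

H_0: L\beta = c, \qquad F = \frac{(L\hat{\beta} - c)'\,[L(X'X)^{-1}L']^{-1}\,(L\hat{\beta} - c)/q}{s^2} \sim F_{q,\,n-3} \text{ under } H_0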

(1) βGPA = 0 and βLSAT = 0

We expect the regression coefficients to differ from zero, since the population parameters are 0.006 for LSAT and 0.8 for GPA. As seen below, we reject the hypothesis that they are zero; the null is false and was rejected, so no Type I or Type II error was committed.

#set up L and c matrices
L1<-matrix(c(0,0,1,0,0,1),ncol=3)
c1<-rbind(0,0)
lht(fit,L1,c1)

Linear hypothesis test 

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1     97  96.376                                  
2     99 136.358 -2   -39.982 20.120 4.903e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 


(2) βGPA = 0.8 and βLSAT = 0.006

For this hypothesis we would expect to "Fail to reject" the null hypothesis, since the population parameters for LSAT and GPA are 0.006 and 0.8, respectively, which are the values specified by the null hypothesis. The F ratio below confirms our expectation and thus there is no Type I or Type II error.

#set up L and c matrices
L2<-matrix(c(0,0,1,0,0,1),ncol=3)
c2<-rbind(0.006,0.8)
lht(fit,L2,c2)

Linear hypothesis test

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 96.376                           
2     99 98.798 -2    -2.421 1.2185 0.3002

(3) βGPA = 1

Here we would expect to reject the null, since the population parameter for GPA is 0.8; but as seen in the R analysis below, we fail to reject it. This is a Type II error: the null hypothesis is false but was not rejected (it cannot be a Type I error). However, the value of 1 specified by the null is relatively close to the true population parameter of 0.8, so we might have rejected the null with a larger sample size; the Type II error is not surprising.

#set up L and c matrices
L3<-matrix(c(0,0,1),ncol=3)
c3<-rbind(1)
lht(fit,L3,c3)

Linear hypothesis test

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
1     97  96.376                              
2     98 100.009 -1    -3.633 3.6564 0.05881 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(4) βGPA = 0.81

Compared to (3), 0.81 is even closer to the true population parameter for GPA, so we would expect to fail to reject the null. This is exactly what the R analysis below shows. This is again a Type II error, for the same reason as in (3).

#set up L and c matrices
L4<-matrix(c(0,0,1),ncol=3)
c4<-rbind(0.81)
lht(fit,L4,c4)
Linear hypothesis test

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 96.376                           
2     98 98.581 -1    -2.204 2.2186 0.1396

(5) βGPA = 0.8

Compared to (3) and (4), the null here specifies exactly the population parameter. The analysis below fails to reject the null; the null is true and was not rejected, so there is no Type I or Type II error.

#set up L and c matrices
L5<-matrix(c(0,0,1),ncol=3)
c5<-rbind(0.8)
lht(fit,L5,c5)

Linear hypothesis test

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 96.376                           
2     98 98.515 -1    -2.139 2.1528 0.1455
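As a sanity check on what lht() computes, the F statistic for hypothesis (5) can be obtained by hand from the Wald form (a sketch; it assumes the fit object from Question 5 is still in the workspace):

```r
# F statistic for H0: L beta = c0, using the fitted model from Question 5
b  <- coef(fit)                     # estimated coefficients
Vb <- vcov(fit)                     # estimated covariance matrix of b
L  <- matrix(c(0, 0, 1), nrow = 1)  # picks out beta_GPA
c0 <- 0.8                           # hypothesized value
q  <- nrow(L)                       # number of restrictions
Fstat <- t(L %*% b - c0) %*% solve(L %*% Vb %*% t(L)) %*% (L %*% b - c0) / q
Fstat                                                # should match lht's F
pf(Fstat, q, df.residual(fit), lower.tail = FALSE)   # and its p-value
```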

(6) \beta_{LSAT} \times 60 = \beta_{GPA} \times 0.3

This question is related to Bion_9.12.

In this question we are multiplying each regression coefficient by the standard deviation of its predictor; that is, we are comparing the standardized coefficients (see Fox page 207 and question 4 above). Even when standardized, these coefficients are different, so the null is false; the test rejects it, in agreement with the standardized betas computed in question 4, and therefore no Type I or Type II error was committed.

#set up L and c matrices
L6<-matrix(c(0,60,-0.3),ncol=3)
c6<-rbind(0)
lht(fit,L6,c6)

Linear hypothesis test

Model 1: FYEAR ~ LSAT + GPA
Model 2: restricted model
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
1     97  96.376                              
2     98 100.511 -1    -4.135 4.1614 0.04407 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Why is it impossible to do this in your usual analyses? If the usual analysis is a multiple regression using type I, II, or III sums of squares, then it lets us check whether betas are equal to each other or different from zero, but not whether betas from a multiple regression equal particular nonzero values, as in (3), (4), and (5), or whether the standardized coefficients are equal to each other, as in (6). The GLH approach allows us to test a much wider range of hypotheses than the usual approach.

Question 7

Draw a joint 95% confidence ellipse and a 95% 'confidence-interval generating ellipse' for βLSAT and βGPA in the problem above.

> fit<-lm(FYEAR~LSAT+GPA)

> summary(fit)

Call:
 
lm(formula = FYEAR ~ LSAT + GPA)
Residuals: 
       Min         1Q         Median        3Q          Max 
     -2.753179 -0.691224 -0.004176  0.789858  2.799660 

Coefficients:

Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.378811   1.222931   0.310   0.7574  
LSAT        0.004623   0.002402   1.925   0.0572 .
GPA         1.090214   0.476665   2.287   0.0244 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 1.046 on 97 degrees of freedom
Multiple R-Squared: 0.2227,	Adjusted R-squared: 0.2067 
F-statistic:  13.9 on 2 and 97 DF,  p-value: 4.935e-06 


Note: The population equation for FYEAR is given in Question 3 without an interaction term. I don't think we need fit2, because there shouldn't be an interaction.

> fit2<-lm(FYEAR~LSAT*GPA)

> summary(fit2)

Call:
lm(formula = FYEAR ~ LSAT * GPA)

Residuals:
Min       1Q   Median       3Q      Max 
-2.64425 -0.70212 -0.03127  0.72430  2.78678 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -17.949486  11.313800  -1.587   0.1159  
LSAT          0.034064   0.018225   1.869   0.0647 .
GPA           6.617987   3.425326   1.932   0.0563 .
LSAT:GPA     -0.008830   0.005419  -1.629   0.1065  
 ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 1.038 on 96 degrees of freedom
Multiple R-Squared: 0.2436,	Adjusted R-squared:  0.22 
F-statistic: 10.31 on 3 and 96 DF,  p-value: 6.057e-06 

> source("http://www.math.yorku.ca/~georges/R/coursefun.R")

> plot(rbind(cell(fit),c(0,0)),type='n')

> lines(cell(fit),col='blue')

> lines(cell(fit, dfn=1),col='green')

> plot(rbind(cell(fit2),c(0,0)),type='n')

> abline(h=0,v=0)

> lines(cell(fit2),col='red')

> lines(cell(fit2, dfn=1),col='green')


 Note: is anybody else having problems loading these images?

Image:Confidence_Ellipse_1_Bion.pdf


Image:Confidence_Ellipse_II_Bion_.pdf

Question 8

Describe how each hypothesis tested above relates to the confidence ellipses.

Note:  Tough to comment because I can't see the plot

1. βGPA = 0 and βLSAT = 0

Reject - βLSAT = 0 and βGPA = 0 is not within the joint 95% confidence ellipse (the region that covers the true values of βLSAT and βGPA jointly with probability 0.95; blue ellipse of graph 1).


2. βGPA = 0.8 and βLSAT = 0.006

Fail to reject - LSAT = 0.006 and GPA = 0.8 is within the joint 95% confidence ellipse.


3. βGPA = 1

Fail to reject - GPA = 1 is within the 95% confidence interval shadow (the shadows of the confidence-interval generating ellipse onto the axes give the individual 95% confidence intervals for βLSAT and βGPA; green ellipse of graph 1).


4. βGPA = 0.81

Fail to reject - GPA = 0.81 is within the 95% confidence interval shadow.


5. βGPA = 0.8

Fail to reject - GPA = 0.8 is within the 95% confidence interval shadow.


6. \beta_{LSAT} \times 60 = \beta_{GPA} \times 0.3

I think we're supposed to use the first graph here: hypothesis (6) defines the line 60βLSAT − 0.3βGPA = 0 in the (βLSAT, βGPA) plane, and for a one-degree-of-freedom hypothesis we reject at the 5% level exactly when this line does not intersect the confidence-interval generating ellipse.

Note: I don't think this has to do with the second graph.

Question 9

Suppose you were asked which variable is more important in determining FYEAR? Discuss possible approaches to answering this question.

One approach is to regress FYEAR on GPA alone and on LSAT alone and compare the R² values for the two regressions. The predictor giving the higher R² would be the better single predictor of FYEAR. Another approach is to compare the standardized coefficients, as in question 4.
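These approaches can be sketched in R (assuming LSAT, GPA, and FYEAR from the earlier questions are in the workspace):

```r
# Approach 1: compare R^2 from the two single-predictor regressions
summary(lm(FYEAR ~ LSAT))$r.squared
summary(lm(FYEAR ~ GPA))$r.squared

# Approach 2: standardize all variables so the coefficients are comparable
fit.std <- lm(scale(FYEAR) ~ scale(LSAT) + scale(GPA))
coef(fit.std)
```

The standardized-coefficient approach is essentially what question 4 did with the true population parameters.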