# MATH 6630 2006-07

Welcome to the wiki site for MATH 6630 in 2006-2007.

To suggest changes or report problems with this page, please add a comment to the discussion by clicking on the tab 'discussion' above.

"There are lies, damned lies, and statistics" -- Benjamin Disraeli

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." -- H. G. Wells

"It is easy to lie with statistics. It is hard to tell the truth without it." -- Andrejs Dunkels

Math 6630 Applied Statistics I

## News

• News on the final exam: the final exam will have two parts:
• Take home: Saturday, December 9 to Monday, December 11. The exam will be available on Saturday at noon and will be due on Monday at noon.
• HERE IT IS: Take home final
• Tutorial: Monday afternoon from 2 pm to 5 pm in N638 Ross.
• In-class exam: Tuesday, December 12 from 2 pm to 5 pm in N638 Ross.
• Both Monday, October 2, and Monday, October 9, are holidays, which leaves us with too few hours for the course. We will need to hold some additional classes on Wednesdays when the Practicum is not being held. Specific days will be discussed in class.

## General Information

The aim of the course is to help students begin their own discovery of the wide and constantly expanding range of statistical methods and techniques and to equip students with the programming skills to use and implement new methods.

The course begins with a quick overview of statistical computing environments, with a deeper exploration of R and its programming and graphical capabilities. We then do an intensive review of linear models, followed by a number of newer statistical methods and techniques such as bootstrapping, mixed models, non-linear models, classification, visualization methods, Fourier analysis and non-parametric regression.

### Instructor

• Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)
• Office hours: TBA

### Text

• Fox, J. (2007) Applied Regression Analysis, Linear Models, and Related Methods (2nd ed.), Sage. A copy of the current manuscript will be provided at a nominal charge to cover photocopying costs.

### Course Work

[20] There will be a number of assignments, most done on the course wiki server, by assigned teams or by individuals.
[5] Individual contributions:
Errata page for the Fox manuscript
Wiki: statistical topics or topics in R
Annotated references
General wiki contributions
[20] Individual projects presenting an intermediate or advanced statistical method or technique, its implementation in suitable software (preferably R) and a realistic example of its application preferably to real data. Prepare and submit a plan by early November.
[15] One-hour mid-term exam on Wednesday, October 25.
[15] There will be a weekend-long take-home exam followed by a
[25] 3-hour in-class final exam.

Note that most of the course work except the exams will be posted on a wiki server and available for future reference.

### Class list and teams

Class photo, names, e-mail addresses and assignment to teams can be found at http://www.math.yorku.ca/~georges/Courses/6630. Note that a userid and password are needed to access this page.

## Week 1

### Topics

• Organization of the course
• Collaborative work and using a wiki
• Introduction to R: R: Getting started
• Review of linear models
• Making tools for linear models in R
• General Linear Hypothesis
• Statistical ellipses:
• Data ellipse
• Confidence ellipses
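To make the data ellipse listed above concrete, here is a minimal sketch in Python (the course itself uses R; Python is used here only as a standalone illustration). It assumes the usual definition of the data ellipse of a given radius as the set of points whose Mahalanobis distance from the sample mean, computed with the sample variance matrix, equals that radius. The function names and the small data set are made up for illustration.

```python
import math


def mean_and_cov(xs, ys):
    """Sample means and the 2x2 sample variance matrix (n - 1 denominator)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def data_ellipse(xs, ys, radius=1.0, n_points=100):
    """Boundary points of the data ellipse: mean + radius * L * (cos t, sin t),
    where L is the lower-triangular Cholesky factor of the variance matrix."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(xs, ys)
    a = math.sqrt(sxx)            # L = [[a, 0], [b, c]]
    b = sxy / a
    c = math.sqrt(syy - b * b)    # requires X and Y not perfectly collinear
    points = []
    for i in range(n_points):
        t = 2 * math.pi * i / n_points
        u, v = radius * math.cos(t), radius * math.sin(t)
        points.append((mx + a * u, my + b * u + c * v))
    return points


def mahalanobis(point, center, cov):
    """Mahalanobis distance of a point from the center under a 2x2 variance matrix."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    det = sxx * syy - sxy ** 2
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Illustrative (made-up) data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 2.5, 4.1, 3.9, 6.2, 6.0]
center, cov = mean_and_cov(xs, ys)
boundary = data_ellipse(xs, ys, radius=2.0)
distances = [mahalanobis(p, center, cov) for p in boundary]
```

Every boundary point should sit at Mahalanobis distance 2 from the mean, which is one way to check the construction.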

### Assignment 1:

Part 1: Cumulatively given in class. Individual work due on Wednesday, September 20. You may use your team as a resource but do and post your own work on the wiki. Make sure to prepend your user name to your files. Include

[[Team X]]

## Week 2

• Topics
• Least-squares, Gauss-Markov, BLUE.
• Normal model: MLE, UMVUE.
• Matrices of linear models
• Projections, linear spaces, rank
• Distributions for the normal model
• Using R for matrix manipulations
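The least-squares and projection topics above can be illustrated with a small sketch. It is written in Python rather than R purely as a self-contained illustration, and it covers only simple regression with an intercept. The function name and the data are hypothetical. The key property shown is that the residual vector is orthogonal to every column of the model matrix (here the column of ones and the predictor), which is what it means for the fitted values to be the orthogonal projection of the response onto the column space.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Illustrative (made-up) data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Orthogonality of residuals to the columns of the model matrix [1, x]
resid_dot_ones = sum(resid)
resid_dot_x = sum(r * xi for r, xi in zip(resid, x))
```

Both inner products should be zero up to floating-point rounding.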

## Week 3

• Variance matrices:
• Spectral decomposition theorem
• Non-negative definite and positive-definite matrices
• Factorization criterion
• Fitting linear models in R
• Syntax of the model formula
• Methods for linear models: print, summary, coef, vcov, predict, resid, plot, etc.
• General linear hypothesis
• Writing functions in R: Wald test for GLH
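For reference, one standard form of the Wald test for the general linear hypothesis is sketched here. The notation ($L$, $c$, $h$, $p$) is an assumption and may differ from the notation used in class. For the normal linear model with an $n \times p$ model matrix $X$ of full column rank, and an $h \times p$ hypothesis matrix $L$ of full row rank:

$$H_0 : L\beta = c$$

$$W = (L\hat{\beta} - c)^\top \left[ L \hat{V} L^\top \right]^{-1} (L\hat{\beta} - c), \qquad \hat{V} = s_E^2 \, (X^\top X)^{-1}$$

$$F = \frac{W}{h} \sim F_{h,\; n-p} \quad \text{under } H_0$$

In R, `coef(fit)` and `vcov(fit)` return $\hat{\beta}$ and $\hat{V}$ for a fitted linear model, so a function implementing the test needs only the fit, $L$ and $c$.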

### Assignment 2

Same groups as assignment 1. Problems given in class.

## Week 4

• Monday, October 2 is a holiday.

Assignments:

• Continuation of Assignment 2 given in class.
• Read Chapters 1 to 4 of Fox. Errata (http://wiki.math.yorku.ca/index.php/Fox_2006:_Applied_Regression_Analysis_and_Related_Methods:_Errata)

Topics:

• Multivariate normal: partitioned variance, conditional distributions.
• Basic tools:
• mean of the conditional mean = marginal mean
• mean of the conditional variance + variance of the conditional mean = marginal variance
• Concentration ellipse: marginal SD, conditional SD and SD of conditional mean, regression lines, regression paradox.
• Confidence ellipses, data ellipses.
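The two basic tools above can be written out as follows, with a check on the bivariate normal case. The symbols $\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho$ are assumed notation for the marginal means, marginal standard deviations and correlation.

$$E(Y) = E\left[ E(Y \mid X) \right], \qquad \mathrm{Var}(Y) = E\left[ \mathrm{Var}(Y \mid X) \right] + \mathrm{Var}\left[ E(Y \mid X) \right]$$

For $(X, Y)$ bivariate normal:

$$E(Y \mid X = x) = \mu_Y + \rho \, \frac{\sigma_Y}{\sigma_X} (x - \mu_X), \qquad \mathrm{Var}(Y \mid X = x) = \sigma_Y^2 (1 - \rho^2)$$

so that

$$E\left[ \mathrm{Var}(Y \mid X) \right] + \mathrm{Var}\left[ E(Y \mid X) \right] = \sigma_Y^2 (1 - \rho^2) + \rho^2 \sigma_Y^2 = \sigma_Y^2 .$$

The three standard deviations that can be read off the concentration ellipse are therefore $\sigma_Y$ (marginal), $\sigma_Y \sqrt{1 - \rho^2}$ (conditional) and $|\rho| \, \sigma_Y$ (SD of the conditional mean). In standard units the regression line has slope $\rho$, with $|\rho| < 1$ unless the relationship is perfectly linear, which is the source of the regression paradox.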

## Week 5

Monday, October 9 is a holiday.

• Simple linear regression and the data ellipse for Y and X.
• Multiple linear regression on 2 variables: The relationship between data ellipse for X1 and X2 and the confidence ellipse for β1 and β2.
• Notes on ellipses (http://www.math.yorku.ca/~georges/Slides/Ellipses.pdf)
• Lessons from data and confidence ellipses (http://www.math.yorku.ca/~georges/Slides/HDCoffe-HW.ppt)

## Week 6

• Read Chapters 5 to 8 of Fox. Errata (http://wiki.math.yorku.ca/index.php/Fox_2006:_Applied_Regression_Analysis_and_Related_Methods:_Errata)
• Multiple regression: the relationship between the data ellipse of predictors and the confidence ellipse for slopes.
• The added-variable plot (partial regression leverage plot).
• Using R for Chapters 1 to 8.
• Confidence interval for simple regression and for multiple regression.
• $\hat{\beta}_1^S \pm u_n \; t \; \frac{s_E^S}{\sqrt{n}\; s_{X_1}}$
• $\hat{\beta}_1^M \pm u'_n \; t \; \frac{s_E^M}{\sqrt{n}\; s_{X_1 \bullet X_2 \cdots X_k}}$
where $\lim_{n \to \infty} u_n = \lim_{n \to \infty} u'_n = 1$
• Coffee consumption and Heart Disease: a 'synthetic' example. (http://www.math.yorku.ca/~georges/Slides/HDCoffe-HW.ppt)
• Sample mid-term test
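The confidence interval formula for simple regression above can be checked numerically. The sketch below is in Python (the course uses R) and rests on two assumptions that are not stated in the list: $s_{X_1}$ is computed with an $n - 1$ denominator, and $s_E$ is the residual standard error with $n - 2$ degrees of freedom. Under those conventions the correction factor is $u_n = \sqrt{n/(n-1)}$, which tends to 1. The function name and data are made up for illustration, and the $t$ quantile is left out because only the standard error is being compared.

```python
import math


def slope_standard_errors(x, y):
    """Return the slope, its usual standard error, the standard error written
    in the form u_n * s_E / (sqrt(n) * s_X), and the factor u_n."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(rss / (n - 2))   # residual standard error (assumed df: n - 2)
    s_x = math.sqrt(sxx / (n - 1))   # SD of the predictor (assumed denominator: n - 1)

    se_usual = s_e / math.sqrt(sxx)
    u_n = math.sqrt(n / (n - 1))
    se_formula = u_n * s_e / (math.sqrt(n) * s_x)
    return b1, se_usual, se_formula, u_n


# Illustrative (made-up) data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
b1, se_usual, se_formula, u_n = slope_standard_errors(x, y)
```

The two standard errors agree, so the interval $\hat{\beta}_1 \pm t \cdot \mathrm{SE}$ matches the displayed form. The multiple regression version works the same way, with $s_{X_1}$ replaced by the residual SD of $X_1$ regressed on the other predictors.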

## Week 7

• Anova: Type I, II and III SS. Properties and relationships.
• Diagnostics: handout and 3-d visualization.
• Leverage residual plot
• Assignment 3
Fox: p. 82, ex. 4.3 (as you do this, consider whether there is a problem with equation 4.3 on p. 80, and correct it if necessary)
p. 128, ex. 6.6
p. 129, ex. 6.11
p. 153, ex. 7.1
p. 153, ex. 7.4
p. 226, ex. 9.11
p. 227, ex. 9.15 (http://wiki.math.yorku.ca/images/d/d3/Math6630q915.pdf)
p. 278, ex. 11.3
p. 319, ex. 12.2
p. 320, ex. 12.6
p. 321, ex. 12.10
p. 350, ex. 13.3
p. 351, ex. 13.8
Fox: p. 108, ex. 5.7
p. 128, ex. 6.7
p. 129, ex. 6.12
p. 153, ex. 7.2
p. 226, ex. 9.9
p. 226, ex. 9.13
p. 278, ex. 11.1
p. 319, ex. 12.3
p. 319, ex. 12.8
p. 321, ex. 12.11
p. 350, ex. 13.6
p. 350, ex. 13.9

Team Lotka

Fox: p. 109, ex. 5.8
p. 129, ex. 6.10
p. 129, ex. 6.13
p. 153, ex. 7.3
p. 226, ex. 9.10
p. 226, ex. 9.14 (estimation intervals vs. prediction intervals)
p. 278, ex. 11.7
p. 319, ex. 12.4
p. 319, ex. 12.9
p. 350, ex. 13.1
p. 351, ex. 13.7

## Week 10

• Non-linear models: Exponential growth curves (http://www.math.yorku.ca/~georges/Slides/TalkOnComasAndMigraines.pdf)

## Week 11

• R script for Chapter 17: Fox-Chap17.R
• Appendices to An R and S-PLUS Companion to Applied Regression:
• Nonlinear Regression and Nonlinear Least Squares (http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-nonlinear-regression.pdf)
• Nonparametric Regression (http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-nonparametric-regression.pdf)
• Bootstrapping Regression Models (http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf)
• R script on bootstrapping (http://cran.r-project.org/doc/contrib/Fox-Companion/bootstrapping.txt)
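As a rough illustration of the bootstrapping topic above, the following sketch resamples cases (rows) with replacement and recomputes the slope of a simple regression each time. It is not taken from the linked appendix or script. It is written in Python with only the standard library, the function names are hypothetical, and the data are made up. Case resampling is only one of several bootstrap schemes for regression; resampling residuals is another.

```python
import random
import statistics


def slope(pairs):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    return sxy / sxx


def bootstrap_slopes(pairs, n_boot=1000, seed=0):
    """Case-resampling bootstrap: draw n rows with replacement, refit, repeat."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    while len(replicates) < n_boot:
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        # Skip the (very rare) degenerate resample with a single distinct x value
        if len({x for x, _ in sample}) < 2:
            continue
        replicates.append(slope(sample))
    return replicates


# Illustrative (made-up) data
data = list(zip(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    [2.3, 3.9, 6.2, 8.1, 9.8, 12.4, 13.9, 16.1, 18.2, 19.7],
))

replicates = bootstrap_slopes(data, n_boot=1000, seed=0)
boot_se = statistics.stdev(replicates)            # bootstrap standard error of the slope

ordered = sorted(replicates)
ci_lower = ordered[int(0.025 * len(ordered))]      # simple 95% percentile interval
ci_upper = ordered[int(0.975 * len(ordered)) - 1]
```

The percentile interval here is the simplest choice, used to keep the sketch short. Other bootstrap intervals exist and may behave better in small samples.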

## Week 12

R Scripts:

Chapter 17 Non-linear models
Chapter 18 Nonparametric smoothing and cross-validation
Chapter 21 Bootstrapping
Chapter 22 Cross validation