
Latex for MATH 6635

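a_1 = \lambda_1 \gamma_1 and a_2 = \lambda_2 \gamma_2 should have been a_1 = \lambda_1^{1/2} \gamma_1 and a_2 = \lambda_2^{1/2} \gamma_2 respectively. With \Lambda = \mathrm{diag}(\lambda_1, \lambda_2), this yields \Sigma = \Gamma \Lambda^{1/2} \Lambda^{1/2} \Gamma' = \Gamma \Lambda \Gamma'.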
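
The correction can be checked numerically. The sketch below assumes that the vectors a_1 and a_2 are the columns of a matrix A = \Gamma \Lambda^{1/2}, so that \Sigma = A A'; the 2 x 2 matrix Sigma is made up for illustration and is not taken from the course notes.

```r
# Made-up positive definite covariance matrix (illustration only)
Sigma <- matrix(c(4, 1.5, 1.5, 1), nrow = 2)

eig    <- eigen(Sigma, symmetric = TRUE)
Gamma  <- eig$vectors          # columns are gamma_1, gamma_2
lambda <- eig$values           # lambda_1 >= lambda_2

# Corrected scaling: a_i = lambda_i^{1/2} gamma_i
A.good <- Gamma %*% diag(sqrt(lambda))
# Original (incorrect) scaling: a_i = lambda_i gamma_i
A.bad  <- Gamma %*% diag(lambda)

all.equal(A.good %*% t(A.good), Sigma)          # recovers Sigma
isTRUE(all.equal(A.bad %*% t(A.bad), Sigma))    # does not recover Sigma
```

With the uncorrected scaling the product is \Gamma \Lambda^2 \Gamma', which equals \Sigma only when every eigenvalue is 0 or 1.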

Current projects

Quick Links

MATH 4939

MATH 2565

NATS 1500

What 2014W 2013W 2012W 2009-10
Wiki NATS 1500 2014W NATS_1500_2013W NATS_1500_2012W Old_Courses:_NATS_1500_2009-10
Files Files Files
York Class list list
Assn 0 Assignment 0 NATS_1500_2012W#Assignment_0

Things to add for NATS 1500 2013-2014W

*1. NATS LibGuide and STS LibGuide*
Please be sure to let students know of these resources on course Moodle
pages, in class, and maybe even as a resource in your syllabus. They will
be especially important for first year students. You can access the
guides from the library homepage by going to the Research Guides drop
down menu and selecting Natural Sciences. The websites' URLs are:

*2. Information Literacy Workshops*
Library information literacy classes can be held either in class or at
the library in Steacie's computer lab. Library instruction is tailored
towards assignments and course subject matter. They are usually 1 hour
classes though I can be flexible with timing. Additionally, library
sessions are meant to assist students with learning how to develop a
research question/topic, search strategies, advanced research,
identifying primary sources, Refworks, citation, and academic integrity.

Please let me know if you have any questions and/or you would like to
book a session.




Departmental Wiki
Stats Wiki
SCS Wiki
SORA Wiki
Home Page
Department room bookings
Department private page
Chair's page
Statistics Wiki
Actuarial Planning
York Finance
Per diem travel allowances:

Georges Monette

Office: N626 Ross
Phone: 416 736 2100 ext 77164

Notes for MATH 1532


Test including a graphics file



Summer 2009 Projects

  1. Implement Satterthwaite for 'wald'
  2. Implement influence diagnostics for 'lme'
    1. Consider local influence: Verbeke and Molenberghs (2000) pp. 167ff.
  3. Improve documentation for spida and p3d
  4. Add hccm to wald

Notes on model building and diagnostics with MMs

  1. Explore science, formulate formal hypotheses
  2. Explore data at various levels of aggregation and with various dimensions: 1d, 2d, and higher.
  3. Explore science ... iterate with the previous step a few times, formulate exploratory hypotheses (one study's exploratory hypothesis is the next study's formal hypothesis)
  4. Building a model
    1. Start with the science: what variables are needed as controls, what variables should be omitted as possible mediators.
    2. With many single level models, e.g. homoscedastic GLMs, the estimation of the mean model and of the variance model are largely orthogonal so one does not depend on the other. In contrast, with mixed models the variance model affects the mean model and vice versa. Thus it's necessary to iterate to some extent.
    3. There are two components: the FE model for beta and the RE model with two parts: the 'G' matrix for random effects between clusters and the 'R' matrix for the variance of error within clusters.
    4. Start with an FE model that is large but not beyond the capacity of the data to produce a valid model. In OLS, for validity, some authors (e.g. Harrell) propose that there should be at least 10 to 15 observations per parameter. In a mixed model, consider how many observations are available for each kind of parameter. Level 1 effects have somewhat fewer than the total number of observations less the number of clusters. Level 2 effects have somewhat fewer than the number of clusters. The random effects, of which there can be as many as Level 1 effects plus 1 for the intercept, can be thought of as multivariate data with one observation (not a very good one) per cluster. Estimating the variance of this p-dimensional multivariate data takes p(p+1)/2 variance and covariance parameters. Bear these factors in mind as you decide on the complexity of the preliminary FE and RE (just G for now) models.
    5. If the OLS model can be estimated in each cluster use lmList to fit it.
      1. Get the residuals and fitted coefficients ( coef( fit.list) ). Plot the residuals to find possible outliers, lack of fit (e.g. curvilinearity) and heteroskedasticity: plot the square root of the absolute residual against the fitted value and various predictors to look for changes in variance and outliers. If there is evidence of heteroskedasticity, you may consider variance-stabilizing transformations of the response (if they don't introduce non-linearity) or fitting a mixed model that incorporates heteroskedasticity with the 'weights = var...' argument to the fitting function.
      2. Plot the fitted coefficients to look for outliers, singularity of their distribution, relationships with Level 2 variables.
    6. Formulate and fit the initial model. If the initial model does not converge because it reached the iteration limit, increase the number of iterations and use verbose mode.
      control = list( msMaxIter = 200, msVerbose = TRUE, msMaxEval = 500, returnObject = TRUE)
      If the model now fails to converge with singular non-convergence:
      Rule out near singularity of the FE model.
      Then look at the structure of the random effects. Their variability is likely to be rank deficient.
      Identify the nature of the deficiency to simplify the G model.
      Try to refit the RE model with the same variables but centered within group (using 'dvar' in spida).
      If this does not work, try to refit with a smaller G model guided by the variability of the BLUPs shown by the variance of ranef(fit). If ranef(.) has two or three columns, visualize it with an appropriate plot, e.g. for 3 columns:
      > library(p3d)
      > Plot3d( ranef(fit) )
      where rotating the point cloud is likely to reveal that it is almost in a 2- or 1-dimensional subspace.
    7. Iterate between the FE model and the G model performing influence diagnostics.
    8. Test whether a more complex R model is needed. There are two main possibilities:
      1. Heteroskedasticity revealed by plotting Level 1 residuals ( or square roots of absolute residuals ) against fitted values or other relevant variables. Use: weights = var...( form = ...) to specify a model in which the variance changes.
      2. Correlated residuals over time or space: plot the semi-variogram. Use: correlation = cor...( form = ...) to specify a model for within cluster correlation.
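
The steps above can be sketched with the 'nlme' package. This is a minimal illustration, not the workflow used in the course: it assumes the Orthodont data that ships with nlme stands in for a real data set, and the choice of varIdent by Sex for the R model is only an example of a 'weights = var...' specification.

```r
library(nlme)  # recommended package distributed with R

# Step 5: a separate OLS fit within each cluster (here: Subject)
fit.list <- lmList(distance ~ age | Subject, data = Orthodont)
cf  <- coef(fit.list)     # one row of fitted coefficients per cluster
res <- resid(fit.list)    # Level 1 residuals from the per-cluster fits

# Step 4: variance and covariance parameters in an unstructured G
p <- ncol(cf)             # intercept + slope for age
n.G.par <- p * (p + 1) / 2

# Diagnostics (uncomment to draw):
# plot(fitted(fit.list), sqrt(abs(res)))  # spread-level plot
# pairs(cf)                               # outliers, near-singularity

# Step 6: initial mixed model with a raised iteration limit
ctrl <- lmeControl(msMaxIter = 200, msMaxEval = 500, returnObject = TRUE)
fit <- lme(distance ~ age, data = Orthodont,
           random = ~ 1 + age | Subject, control = ctrl)
re <- ranef(fit)          # BLUPs, one row per cluster
var(re)                   # a near-singular matrix suggests a simpler G

# Step 8: let the Level 1 variance differ by Sex
fit.het <- lme(distance ~ age, data = Orthodont,
               random = ~ 1 + age | Subject, control = ctrl,
               weights = varIdent(form = ~ 1 | Sex))
anova(fit, fit.het)       # compare the two R models
```

A within-cluster correlation model follows the same pattern, e.g. by adding correlation = corAR1(form = ~ 1 | Subject) to the call.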

Notes on building packages in R

  1. Installing the toolset:
    1. Read:
  1. Used the Google HTML cache of John's, which wasn't available
  2. Installed Rtools 2.9 from
  3. Installed latest version of R

Notes on HLMs

  • "An Approach to Estimate Between- and Within-Group Correlation Coefficients in Multicenter Studies: Plasma Carotenoids as Biomarkers of Intake of Fruits and Vegetables," by Pietro Ferrari, et al. American Journal of Epidemiology, Advance Access originally published online on August 10, 2005.

American Journal of Epidemiology 2005 162(6):591-598; doi:10.1093/aje/kwi242

Uses correlation formulas in Snijders and Bosker.
How Do Academic Departments Impact Student Satisfaction? Understanding the Contextual Effects of Departments
Journal: Research in Higher Education
Paul D. Umbach and Stephen R. Porter

Evaluation Review, Vol. 30, No. 1, 66-85 (2006) DOI: 10.1177/0193841X05275649

Centering or Not Centering in Multilevel Models? The Role of the Group Mean and the Assessment of Group Effects
Omar Paccagnella
University of Padua, Italy
In multilevel regression, centering the model variables produces effects that are different and sometimes unexpected compared with those in traditional regression analysis. In this article, the main contributions in terms of meaning, assumptions, and effects underlying a multilevel centering solution are reviewed, emphasizing advantages and critiques of this approach. In addition, in the spirit of Manski, contextual and correlated effects in a multilevel framework are defined to detect group effects. It is shown that the decision of centering in a multilevel analysis depends on the way the variables are centered, on whether the model has been specified with or without cross-level terms and group means, and on the purposes of the specific analysis.
Key Words: multilevel model • group mean centering • contextual and correlated effects • collinearity • school effectiveness


  • explore scagnostics in R
  • explore mapply, Vectorize
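
As a starting point for the mapply and Vectorize item, here is a small sketch. The function seq.sum is hypothetical and chosen only because it is not vectorized over its arguments.

```r
# Hypothetical function: works for one (n, by) pair at a time
seq.sum <- function(n, by) sum(seq(0, n, by = by))

# mapply() calls the function once per pair of elements
m.out <- mapply(seq.sum, n = c(4, 6), by = c(1, 2))

# Vectorize() wraps the same idea in a reusable function
v.seq.sum <- Vectorize(seq.sum)
v.out <- v.seq.sum(c(4, 6), c(1, 2))

m.out   # 10 12
v.out   # 10 12
```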

Quick links

Department links

2006 Committee assignments
2006-2007 Course assignments
User:Georges Tenure and Promotion Procedure

York links

Fall 2006 Exam Schedule
Phone and e-mail Directory
High School math requirements for admission to Ontario Universities



Please click on the 'discussion' tab above

Statistics Hiring 2006-07
Test whether latest version has been installed yet:
\begin{align}
f(x) & = (a+b)^2 \\
     & = a^2+2ab+b^2
\end{align}


Spatial statistics

Vignette for spBayes:
R Spatial project:
'Splancs' index with some functions for space-time modeling:
pastecs: Package for Analysis of Space-Time Ecological Series
Ontario lotteries
Susan Nelles
Statistical case histories: Susan Nelles
Graphics as such
Graphics to visualize fitted models
Exercise: contribute to a catalog of basic types, excellent for a wiki with sample output in thumbnail, then an advanced catalog
Show the latest: e.g. Gore, show World Health Presentation
Write a function with two arguments that tests for equality where both values being 'NA' counts as equal. Be sure to treat factors appropriately.
Prepare a tutorial on graphics in "Hmisc"
Prepare a tutorial on 3-d graphics and develop applications to diagnostics.
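
One possible answer to the NA-aware equality exercise above, as a sketch. The name na.equal is hypothetical, and comparing factors by label rather than by integer code is an assumption about what "appropriately" means.

```r
# Hypothetical helper: elementwise equality in which NA matches NA
na.equal <- function(x, y) {
  # Compare factors by their labels, not their integer codes;
  # '==' on two factors with different level sets is an error
  if (is.factor(x)) x <- as.character(x)
  if (is.factor(y)) y <- as.character(y)
  both.na <- is.na(x) & is.na(y)
  eq <- x == y               # NA wherever either side is NA
  eq[is.na(eq)] <- FALSE     # NA against a non-NA value counts as unequal
  eq | both.na
}

na.equal(c(1, NA, 3, NA), c(1, NA, 4, 2))
# TRUE TRUE FALSE FALSE

f1 <- factor(c("a", "b", NA))
f2 <- factor(c("a", "c", NA), levels = c("c", "a"))
na.equal(f1, f2)
# TRUE FALSE TRUE
```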
> require(RODBC)
> channel <- odbcConnectExcel("f:/teste.xls")
> data <- sqlFetch(channel, "Sheet1")
>   summary(data)
>         qw              ee
>   Min.   :1.000   Min.   :1.000
>   1st Qu.:1.000   1st Qu.:1.500
>   Median :1.000   Median :2.000
>   Mean   :1.333   Mean   :2.429
>   3rd Qu.:1.750   3rd Qu.:3.500
>   Max.   :2.000   Max.   :4.000
>   NA's   :1.000

One idea from the R mailing list: '''But it doesn't work with more than 256 variables.'''
> I save my data(frames) in csv format, which can be opened by any
> spreadsheet application:
> R> write.table( myData, "myFile.csv", col.names = NA, sep = "," )

Or you can write it as

write.table(, "excel.file.xls", sep="\t", na="", row.names=F)

which I can usually open in Excel just by clicking on it. 
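
A self-contained version of the CSV route, as a sketch. The data frame is made up, and write.csv() is used in place of write.table() because it fixes the separator and quoting that spreadsheet applications expect.

```r
# Made-up data frame with a missing value
myData <- data.frame(qw = c(1, 2, NA), ee = c(1.5, 2, 4))
csv.file <- tempfile(fileext = ".csv")

# na = "" writes missing values as blank cells rather than the text "NA"
write.csv(myData, csv.file, row.names = FALSE, na = "")

# Read the file back to confirm the round trip;
# blank cells in numeric columns come back as NA
back <- read.csv(csv.file)
str(back)
```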

Faraway chap. 7 ques 3 has an interesting example that requires manipulating the data frame so it's right for regression.
Faraway Chap. 7 # 5: reprogramming R to fit GLM model with different variance
Chapter 6, no 5: Conway-Maxwell-Poisson distribution: implement in R
Topics in 6630: multiple comparisons using multcomp
Go through book and prepare detailed syllabus ahead of time, prepare all topics, references and assignments ahead of time. Use 6140 assignments for linear stuff.
6630: Should I add EM, MCMC, perhaps from "All of Statistics"?
Include, at right place in Fox, a discussion of paradoxes of regression with elliptical explication
Start with intensive R tutorial, perhaps with presentations
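
For the Conway-Maxwell-Poisson item above, a minimal sketch of the probability mass function P(Y = y) = \lambda^y / ((y!)^\nu Z(\lambda, \nu)), where Z is the sum of \lambda^j / (j!)^\nu over j = 0, 1, 2, .... The name dcmp and the truncation point max.j are assumptions made for this illustration; the infinite sum is simply cut off, which is adequate only when the terms beyond max.j are negligible.

```r
# Hypothetical density function for the Conway-Maxwell-Poisson distribution
dcmp <- function(y, lambda, nu, max.j = 200) {
  j <- 0:max.j
  # log of each term of the normalizing constant Z(lambda, nu)
  log.terms <- j * log(lambda) - nu * lgamma(j + 1)
  # log-sum-exp for numerical stability
  m <- max(log.terms)
  log.Z <- m + log(sum(exp(log.terms - m)))
  exp(y * log(lambda) - nu * lgamma(y + 1) - log.Z)
}

# With nu = 1 the distribution reduces to the Poisson
all.equal(dcmp(0:10, lambda = 2.5, nu = 1), dpois(0:10, 2.5))
```

Values of nu above 1 give underdispersion relative to the Poisson and values below 1 give overdispersion.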

Mixed models

6627 Syllabus?

Gordon Crowe
Categorical DV with multiple classes
Longitudinal: maybe
Setting up a server with a small data base
Participation in consultations
See Design package for pairwise comparisons?
Derr tape
Discuss and plan re SCS involvement
Working environment: OpenOffice (in-line math lousy, can it be improved?), R
Stress reports and (timed) presentations: lots of early group work, perhaps using Statistics for Lawyers. Idea that they will present dry run to statisticians, then review for final report.
Each student should develop their own R fun.R to present and discuss periodically.
Arrange for visitors; get students to present and write a brief overview and tutorial
Prepare a talk and demo on reshape
Prepare a talk and demo on ggplot
Causality: Naive, Mediator variables, SEMs, Rubin's causal model, DAGs. (Wasserman for overview?)
Select problems from: Statistics for Lawyers, Harrell on model selection and validity for prediction models.

See about π

6630 Objectives

Some objectives for 6630:



MATH_6630: Questions


  • marginality
  • simultaneity
  • observational/experimental and causality
  • validation
  • Fallacies
    • testing many effects in one table


Statistics links





Selected pages:


Team names

Some of these have already been used:

Mahalanobis Rao Robbins Savage Shewhart Sagarin Snedecor Spearman Taguchi Thiele Tukey Wilks

Here is where we want the graph:


> attach(dd)
> x
 [1] 1 2 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[39] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[77] 4 4 4 4 4

Causal session proposal

Title: Beyond "Correlation is not Causation"

Organizer: Georges Monette, Mathematics and Statistics, York University

Chair/Discussant: TBD


Most introductory textbooks in statistics do not discuss causal ideas beyond asserting that correlation is not causation.

This cursory treatment of causation leaves students unequipped to make sense of the barrage of public controversies revolving around causal claims based on non-randomized data. Some years ago, many statisticians would have rigidly held that the only responsible position is to firmly insist that causal inference is impossible without randomization.

Recently our discipline has made great strides in offering constructive insights about causal inference with observational, non-experimental data. These ideas are important not only in applications to research but, perhaps even more importantly, in informing the public understanding of the myriad controversies that center on questions of determining causal relationships. Donald Rubin in a recent article observes "... decisions about interventions must be made, even if based on limited empirical evidence, and we should help decision makers make sensible choices ..." (Rubin, 2015)

There are many current developments showing a new direction in our discipline: a new journal, "Observational Studies", had its first issue in 2015, and the 2013 Joint Statistical Meetings in Montreal focused on the theme of causal inference.

This proposed session would invite three speakers who are integrating an understanding of causality in introductory and non-technical settings.


Option 2: Three speakers each present for 25 minutes, followed by a short panel discussion facilitated by the Chair.

Speakers and tentative titles:

Tina Grotzer, Harvard School of Education, has worked on a number of projects on introducing causal understanding in the curriculum.

Maya Petersen or Laura Balzer, UC Berkeley, winners of the 2014 ASA prize in Causality in Statistics Education. The citation says that they have "prepared a new generation of scientists, who have acquired the tools of modern causal analysis and are equipped to tackle each step of the causal roadmap."

Erica Moodie, McGill University, who has organized a number of conferences on applications of causal inference.