User:Georges
LaTeX for MATH 6635
a_{1} = λ_{1}γ_{1} and a_{2} = λ_{2}γ_{2} should have been a_{1} = λ_{1}^{1 / 2}γ_{1} and a_{2} = λ_{2}^{1 / 2}γ_{2} respectively. With Λ = diag(λ_{1},λ_{2}), this yields Σ = ΓΛ^{1 / 2}Λ^{1 / 2}Γ' = ΓΛΓ'
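A quick numerical check of the identity Σ = ΓΛ^{1 / 2}Λ^{1 / 2}Γ' = ΓΛΓ' can be sketched in R. The matrix Sigma below is a made-up example, and the scaled axes a_{i} = λ_{i}^{1 / 2}γ_{i} are an assumed reading of the correction, so treat this as an illustration rather than the course's own code.

```r
# Hypothetical 2 x 2 covariance matrix, chosen only for illustration
Sigma <- matrix(c(4.0, 1.5,
                  1.5, 1.0), nrow = 2, byrow = TRUE)

e <- eigen(Sigma, symmetric = TRUE)
Gamma  <- e$vectors           # columns are gamma_1 and gamma_2
lambda <- e$values            # lambda_1 >= lambda_2

# Assumed definition: a_i = sqrt(lambda_i) * gamma_i, i.e. A = Gamma Lambda^(1/2)
A <- Gamma %*% diag(sqrt(lambda))

# Both products should reproduce Sigma up to rounding error
all.equal(A %*% t(A), Sigma)
all.equal(Gamma %*% diag(lambda) %*% t(Gamma), Sigma)
```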
Current projects
Quick Links
- NATS 1500 blackwell (http://blackwell.math.yorku.ca/NATS1500/2016/)
- /Statistics course descriptions
MATH 4939
- Registrar Course Page (https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/18/wo/AVbEOG9UFrDo51QtQd5hKM/4.1.10.8.3.101.0.5)
- MATH4939 Talk:MATH4939
MATH 2565
NATS 1500
What: 2014W, 2013W, 2012W, 2009-10
Wiki: NATS 1500 2014W, NATS_1500_2013W, NATS_1500_2012W, Old_Courses:_NATS_1500_2009-10
Files: Files (http://blackwell.math.yorku.ca/Files/NATS1500/), Files (http://www.math.yorku.ca/people/georges/Files/NATS1500/)
York Class list: list (https://w4prod.sis.yorku.ca/Apps/WebObjects/GAM.woa/4/wo/4tRdrDaTWAN1WvvZowBttw/2.3.21.3.5.1.1.0.21.1)
Assn 0: Assignment 0 (https://docs.google.com/spreadsheet/embeddedform?formkey=dFFZQnd3OGFNOEdLWU1pbWZ2Ty12VXc6MQ), NATS_1500_2012W#Assignment_0
Things to add for NATS 1500 2013-2014W
*1. NATS LibGuide <http://researchguides.library.yorku.ca/nats> and STS LibGuide <http://researchguides.library.yorku.ca/sts%20Print%20Guide>*
Please be sure to let students know of these resources on course Moodle pages, in class, and maybe even as a resource in your syllabus. It will be especially important for first year students. You can access the guide from the library homepage by going to the Research Guides drop down menu and selecting Natural Sciences. The website URLs are: http://researchguides.library.yorku.ca/nats http://researchguides.library.yorku.ca/sts
*2. Information Literacy Workshops*
Library information literacy classes can be held either in class or at the library in Steacie's computer lab. Library instruction is tailored towards assignments and course subject matter. They are usually 1 hour classes though I can be flexible with timing. Additionally, library sessions are meant to assist students with learning how to develop a research question/topic, search strategies, advanced research, identifying primary sources, Refworks, citation, and academic integrity.
Please let me know if you have any questions and/or you would like to book a session.
Cheers, Sarah
Links
- Wikis:
- Departmental Wiki (http://wiki.math.yorku.ca)
- Stats Wiki (http://statswiki.math.yorku.ca)
- SCS Wiki (http://scs.math.yorku.ca)
- SORA Wiki (http://sora.math.yorku.ca)
- Home Page (http://www.math.yorku.ca/~georges)
- Math
- Department room bookings (http://rbs.math.yorku.ca)
- Department private page (http://www.math.yorku.ca/private_page.html)
- Chair's page (http://www.math.yorku.ca/Chair/menu.html)
- Statistics Wiki (http://statswiki.math.yorku.ca/)
- Actuarial Planning
- York Finance
- Per diem travel allowances: http://www.yorku.ca/finance/documents/PerDiemMay2007.pdf
Georges Monette
- Office: N626 Ross
- Phone: 416 736 2100 ext 77164
- E-mail: mailto:georges@yorku.ca
Notes for MATH 1532
- Introduction to Statistics using Calc (http://www.comfsm.fm/~dleeling/statistics/notes000.html)
Excel
- Histograms (http://www.treeplan.com/BetterHistogram_20041117_1555.htm)
- Correlation with Excel (http://cameron.econ.ucdavis.edu/excel/ex51correlation.html)
Test including a graphics file
Causality
http://en.wikipedia.org/w/index.php?title=Experiment&oldid=346357605
Summer 2009 Projects
- Implement Satterthwaite for 'wald'
- Implement influence diagnostics for 'lme'
- Consider local influence: Verbeke and Molenberghs (2000) pp167ff.
- Improve documentation for spida and p3d
- Add hccm to wald
Notes on model building and diagnostics with MMs
- Explore science, formulate formal hypotheses
- Explore data at various levels of aggregation and with various dimensions: 1d, 2d, and higher.
- Explore science ... iterate with the previous step a few times, formulate exploratory hypotheses (one study's exploratory hypothesis is the next study's formal hypothesis)
- Building a model
- Start with the science: what variables are needed as controls, what variables should be omitted as possible mediators.
- With many single level models, e.g. homoscedastic GLMs, the estimation of the mean model and of the variance model are largely orthogonal so one does not depend on the other. In contrast, with mixed models the variance model affects the mean model and vice versa. Thus it's necessary to iterate to some extent.
- There are two components: the FE model for beta and the RE model with two parts: the 'G' matrix for random effects between clusters and the 'R' matrix for the variance of error within clusters.
- Start with an FE model that is large but not beyond the capacity of the data to produce a valid model. In OLS, for validity, some authors (e.g. Harrell) propose that there should be at least 10 to 15 observations per parameter. Consider the relationship between the number of observations and the number of parameters: for Level 1 effects, the effective number of observations is somewhat less than the total number of observations less the number of clusters; for Level 2 effects, it is somewhat less than the number of clusters. The random effects, of which there can be as many as there are Level 1 effects plus 1 for the intercept, can be thought of as multivariate data with one observation (not a very good one) per cluster. With p random effects, estimating the variance of this multivariate data takes p(p+1)/2 variance and covariance parameters. Bear these factors in mind as you decide on the complexity of the preliminary FE and RE (just G for now) models.
- If the OLS model can be estimated in each cluster use lmList to fit it.
- Get the residuals and fitted coefficients ( coef( fit.list) ). Plot the residuals to find possible outliers, lack of fit (e.g. curvilinearity) and heteroskedasticity: plot the square root of the absolute residual against the fitted value and various predictors to look for changes in variance and outliers. If there is evidence of heteroskedasticity, you may consider variance-stabilizing transformations of the response (if they don't introduce non-linearity) or fitting a mixed model that incorporates heteroskedasticity with the 'weights = var...' argument to the fitting function.
- Plot the fitted coefficients to look for outliers, singularity of their distribution, relationships with Level 2 variables.
- Formulate and fit the initial model. If the initial model does not converge because it reached the iteration limit, increase the number of iterations and use verbose mode.
- control = list( msMaxIter = 200, msVerbose = TRUE, msMaxEval = 500, returnObject = TRUE)
- If the model now fails to converge with singular non-convergence:
- Rule out near singularity of the FE model.
- Then look at the structure of the random effects. Their variability is likely to be rank deficient.
- Identify the nature of the deficiency to simplify the G model.
- Try to refit the RE model with the same variables but centered within group (using 'dvar' in spida).
- If this does not work, try to refit with a smaller G model guided by the variability of the BLUPs shown by the variance of ranef(fit). If ranef(.) has two or three columns, visualize it with an appropriate plot, e.g. for 3 columns:
- > library(p3d)
- > Plot3d( ranef(fit) )
- where rotating the point cloud is likely to reveal that it is almost in a 2- or 1-dimensional subspace.
- Iterate between the FE model and the G model performing influence diagnostics.
- Test whether a more complex R model is needed. There are two main possibilities:
- Heteroskedasticity revealed by plotting Level 1 residuals ( or square roots of absolute residuals ) against fitted values or other relevant variables. Use: weights = var...( form = ...) to specify a model in which the variance changes.
- Correlated residuals over time or space: plot the semi-variogram. Use: correlation = cor...( form = ...) to specify a model for within cluster correlation.
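The steps above can be sketched with nlme. This is only an illustration under stated assumptions: the Orthodont data set shipped with nlme stands in for real data, the centred variable age_c and the model formulas are made up for the example, and varIdent and corAR1 are just two of the possible 'var...' and 'cor...' choices. The msVerbose option is left out to keep the output short.

```r
library(nlme)   # lmList(), lme(), varIdent(), corAR1(), Variogram()

# Illustrative data only: dental growth in 27 children (Orthodont, from nlme)
dd <- as.data.frame(Orthodont)
dd$age_c <- dd$age - 11                   # age centred at 11 years

# Separate OLS fit in each cluster, then inspect coefficients and residuals
fit.list <- lmList(distance ~ age_c | Subject, data = dd)
cf <- as.matrix(coef(fit.list))           # one row of coefficients per cluster
plot(cf)                                  # outliers? nearly singular scatter?
plot(fitted(fit.list), sqrt(abs(resid(fit.list))),
     xlab = "Fitted value", ylab = "sqrt(|residual|)")

# Initial mixed model with a generous iteration limit
ctrl <- lmeControl(msMaxIter = 200, msMaxEval = 500, returnObject = TRUE)
fit <- lme(distance ~ age_c * Sex, data = dd,
           random = ~ 1 + age_c | Subject, control = ctrl)

# Variability of the BLUPs: a very small eigenvalue suggests a rank-deficient G
G.blup <- var(ranef(fit))
eigen(G.blup)$values

# R model, possibility 1: within-cluster variance differing by Sex
fit.het <- update(fit, weights = varIdent(form = ~ 1 | Sex))
anova(fit, fit.het)

# R model, possibility 2: serial correlation within clusters
plot(Variogram(fit, form = ~ age | Subject))
# With few observations per cluster this fit may fail, hence try()
fit.cor <- try(update(fit, correlation = corAR1(form = ~ 1 | Subject)),
               silent = TRUE)
if (!inherits(fit.cor, "try-error")) print(anova(fit, fit.cor))
```

The comparisons with anova() are valid here because the models share the same fixed effects and are all fitted by REML.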
Notes on building packages in R
- Installing the toolset:
- Used Google's HTML cache of John's http://socserv.mcmaster.ca/jfox/Courses/R-course/Topic-7-notes.pdf, which wasn't available
- Installed Rtools 2.9 from http://www.murdoch-sutherland.com/Rtools/
- Installed latest version of R
Notes on HLMs
- http://aje.oxfordjournals.org/cgi/content/full/162/6/591 "An Approach to Estimate Between- and Within-Group Correlation Coefficients in Multicenter Studies: Plasma Carotenoids as Biomarkers of Intake of Fruits and Vegetables," by Pietro Ferrari et al. American Journal of Epidemiology Advance Access, originally published online on August 10, 2005
American Journal of Epidemiology 2005 162(6):591-598; doi:10.1093/aje/kwi242
- uses correlation formulas in Snijders and Bosker
- How Do Academic Departments Impact Student Satisfaction? Understanding the Contextual Effects of Departments
- Journal: Research in Higher Education
- Paul D. Umbach and Stephen R. Porter
Evaluation Review, Vol. 30, No. 1, 66-85 (2006) DOI: 10.1177/0193841X05275649
- Centering or Not Centering in Multilevel Models? The Role of the Group Mean and the Assessment of Group Effects
- Omar Paccagnella
- University of Padua, Italy
- In multilevel regression, centering the model variables produces effects that are different and sometimes unexpected compared with those in traditional regression analysis. In this article, the main contributions in terms of meaning, assumptions, and effects underlying a multilevel centering solution are reviewed, emphasizing advantages and critiques of this approach. In addition, in the spirit of Manski, contextual and correlated effects in a multilevel framework are defined to detect group effects. It is shown that the decision of centering in a multilevel analysis depends on the way the variables are centered, on whether the model has been specified with or without cross-level terms and group means, and on the purposes of the specific analysis.
- Key Words: multilevel model • group mean centering • contextual and correlated effects • collinearity • school effectiveness
Plans
- explore scagnostics in R
- explore mapply, Vectorize
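As a starting point for this exploration, here is a minimal sketch of what the two functions do. The helper functions power and area are hypothetical examples made up for the illustration.

```r
# mapply() applies a function to parallel elements of several vectors
power <- function(base, exponent) base^exponent
p <- mapply(power, 2:4, 1:3)        # 2^1, 3^2, 4^3
p

# integrate() accepts a single upper limit, so area() is not vectorized
area <- function(upper) integrate(dnorm, -Inf, upper)$value

# Vectorize() wraps area() in an mapply() call over its argument
varea <- Vectorize(area)
a <- varea(c(-1.96, 0, 1.96))
a
```

Vectorize is a convenience wrapper rather than a speed-up: the wrapped function is still called once per element.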
Quick links
- MATH_2565_W_2007_Section_M
- World Health Organization Core Health Indicators (http://www3.who.int/whosis/core/core_select.cfm)
- Department constitution
Department links
- 2006 Committee assignments (http://www.math.yorku.ca/new/people/committee.htm)
- 2006-2007 Course assignments (http://www.math.yorku.ca/Chair/FW06_web.htm)
- User:Georges Tenure and Promotion Procedure
York links
- Fall 2006 Exam Schedule (https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm)
- Phone and e-mail Directory (http://starcraft.ccs.yorku.ca/atlas/servlet/atlas)
- [1] (http://www.yorku.ca)
- High School math requirements for admission to Ontario Universities (http://www.electronicinfo.ca/html/english/2008/index.html)
ρ_{xy}
Notes
Please click on the 'discussion' tab above
- Statistics Hiring 2006-07
- Test whether latest version has been installed yet:
\begin{align}
f(x) & = (a+b)^2 \\
     & = a^2+2ab+b^2
\end{align}
Spatial statistics
- Vignette for spBayes: http://blue.fr.umn.edu/spatialBayes/spBayes-vignette.pdf
- R Spatial project: http://sal.uiuc.edu/csiss/Rgeo//
- spatstat: http://www.jstatsoft.org/v12/i06/v12i06.pdf
- 'Splancs' index with some functions for space-time modeling: http://cran.r-project.org/src/contrib/Descriptions/splancs.INDEX
- pastecs: Package for Analysis of Space-Time Ecological Series
- Lotteries
- Susan Nelles
- Graphics
- Graphics as such
- Graphics to visualize fitted models
- Exercise: contribute to a catalog of basic types, excellent for a wiki with sample output in thumbnail, then an advanced catalog
- Show the latest: e.g. Gore, show World Health Presentation
- Write a function with two arguments that tests for equality where both values being 'NA' counts as equal. Be sure to treat factors appropriately.
- Prepare a tutorial on graphics in "Hmisc"
- Prepare a tutorial on 3-d graphics and develop applications to diagnostics.
- Use RODBC
> require(RODBC)
> channel <- odbcConnectExcel("f:/teste.xls")
> data <- sqlFetch(channel, "Sheet1")
> summary(data)
>        qw              ee
>  Min.   :1.000   Min.   :1.000
>  1st Qu.:1.000   1st Qu.:1.500
>  Median :1.000   Median :2.000
>  Mean   :1.333   Mean   :2.429
>  3rd Qu.:1.750   3rd Qu.:3.500
>  Max.   :2.000   Max.   :4.000
>  NA's   :1.000
One idea from the R mailing list: But it doesn't work with more than 256 variables.
> I save my data(frames) in csv format, which can be opened by any
> spreadsheet application:
>
> R> write.table( myData, "myFile.csv", col.names = NA, sep = "," )
Or you can write it as
write.table(r.data.frame, "excel.file.xls", sep = "\t", na = "", row.names = FALSE)
which I can usually open in Excel just by clicking on it.
- Faraway chap. 7 ques 3 has an interesting example that requires manipulating the data frame so it's right for regression.
- Faraway Chap. 7 # 5: reprogramming R to fit GLM model with different variance
- Chapter 6, no 5: Conway-Maxwell-Poisson distribution: implement in R
- Topics in 6630: multiple comparisons using multcomp
- Go through book and prepare detailed syllabus ahead of time, prepare all topics, references and assignments ahead of time. Use 6140 assignments for linear stuff.
- 6630: Should I add EM, MCMC, perhaps from "All of Statistics"?
- Include, at right place in Fox, a discussion of paradoxes of regression with elliptical explication
- Start with intensive R tutorial, perhaps with presentations
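For the exercise above on testing equality with 'NA' counting as equal to 'NA', one possible solution is sketched here. The name eq_na is made up, and comparing factors by their labels rather than their integer codes is one reading of "treat factors appropriately".

```r
# Elementwise equality in which two NAs count as equal
eq_na <- function(x, y) {
  # Compare factors by label so differing level orders do not matter
  if (is.factor(x)) x <- as.character(x)
  if (is.factor(y)) y <- as.character(y)
  both_na <- is.na(x) & is.na(y)
  # FALSE & NA is FALSE, so this is never NA
  same <- !is.na(x) & !is.na(y) & x == y
  both_na | same
}

eq_na(c(1, NA, 3, NA), c(1, NA, 4, 2))

f1 <- factor(c("a", "b", NA))
f2 <- factor(c("a", "c", NA), levels = c("c", "a"))
eq_na(f1, f2)
```

Unlike `x == y`, the result never contains NA, so it can be used directly in `all()` or as a logical index.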
Mixed models
- David Garson's NCSU course in intermediate statistical methods for public administration with good overview of methods (http://www2.chass.ncsu.edu/garson/pa765/pa765syl.htm)
- See glmmADMB: http://otter-rsch.com/admbre/examples/glmmadmb/glmmADMB.html
- PROC GLIMMIX: http://www.ats.ucla.edu/STAT/sas/glimmix.pdf
- mgcv package with 'gamm' (gam models with random effects based on glmmPQL) http://cran.r-project.org/doc/packages/mgcv.pdf
- nlme http://cran.r-project.org/doc/packages/nlme.pdf
6627 Syllabus?
- Plans
- Gordon Crowe
- Categorical DV with multiple classes
- Longitudinal: maybe
- Setting up a server with a small data base
- Participation in consultations
- Other
- See Design package for pairwise comparisons?
- Derr tape
- Discuss and plan re SCS involvement
- Working environment: OpenOffice (in-line math lousy, can it be improved?), R
- Stress reports and (timed) presentations: lots of early group work, perhaps using Statistics for Lawyers. Idea that they will present dry run to statisticians, then review for final report.
- Each student should develop their own R fun.R to present and discuss periodically.
- Arrange for visitors
- Gapminder.com: get students to present and write a brief overview and tutorial
- Prepare a talk and demo on reshape : http://had.co.nz/reshape/
- Prepare a talk and demo on ggplot
- Imputation
- Causality: Naive, Mediator variables, SEMs, Rubin's causal model, DAGs. (Wasserman for overview?)
- Select problems from: Statistics for Lawyers, Harrell on model selection and validity for prediction models.
See
- http://cran.r-project.org/doc/manuals/R-data.html
- http://pi.ytmnd.com/ about π
- http://ocw.mit.edu/OcwWeb/Mathematics/18-465Spring-2004/CourseHome/index.htm
6630 Objectives
Some objectives for 6630:
Questions:
Concepts:
- marginality
- simultaneity
- observational/experimental and causality
- validation
- Fallacies
- testing many effects in one table
R
- http://cran.r-project.org/doc/manuals/R-intro.pdf
- Contributed documentation: http://www.maths.bris.ac.uk/R/other-docs.html
- The R Guide (version 2.2) by W. J. Owen: http://www.mathcs.richmond.edu/~wowen/TheRGuide.pdf
- MATH 6630: Potential topics
Statistics links
- Free online surveys: http://freeonlinesurveys.com/
Research
SCS
- CBC Interviews at 2010 SSC meetings in Quebec City (http://www.radio-canada.ca/audio-video/pop.shtml#urlMedia=http://www.radio-canada.ca/Medianet/2010/CBF/LesAnneeslumiere201005301215_1.asx)
Admin
Links
Selected pages:
- User:georges Notes for 6630
- World cigarette consumption (http://www.who.int/tobacco/en/atlas8.pdf)
Team names
Some of these have already been used:
Mahalanobis Rao Robbins Savage Shewhart Sagarin Snedecor Spearman Taguchi Thiele Tukey Wilks
Here is where we want the graph:
> attach(dd)
> x
 [1] 1 2 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[39] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[77] 4 4 4 4 4
>
Causal session proposal
Title: Beyond "Correlation is not Causation"
Organizer: Georges Monette, Mathematics and Statistics, York University
Chair/Discussant: TBD
Description:
Most introductory textbooks in statistics do not discuss causal concepts beyond asserting that correlation is not causation.
This cursory treatment of causation leaves students unequipped to make sense of the barrage of public controversies revolving around causal claims based on non-randomized data. Some years ago, many statisticians would have rigidly held that the only responsible position is to firmly insist that causal inference is impossible without randomization.
Recently our discipline has made great strides in offering constructive insights about causal inference with observational, non-experimental data. These ideas are important not only in applications to research but, perhaps even more importantly, in informing the public understanding of the myriad controversies that center on questions of determining causal relationships. Donald Rubin in a recent article observes "... decisions about interventions must be made, even if based on limited empirical evidence, and we should help decision makers make sensible choices ..." (Rubin, 2015).
There are many current developments showing a new direction in our discipline: a new journal, "Observational Studies", had its first issue in 2015, and the 2013 Joint Statistical Meetings in Montreal focused on the theme of causal inference.
This proposed session would invite three speakers who are integrating an understanding of causality in introductory and non-technical settings.
Format:
Option 2: Three speakers each present for 25 minutes, followed by a short panel discussion facilitated by the Chair.
Speakers and tentative titles:
Tina Grotzer, Harvard School of Education, has worked on a number of projects on introducing causal understanding in the curriculum.
Maya Petersen or Laura Balzer, UC Berkeley, winners of the 2014 ASA prize in Causality in Statistical Education. The citation says that they have "prepared a new generation of scientists, who have acquired the tools of modern causal analysis and are equipped to tackle each step of the causal roadmap."
Erica Moodie, McGill University, who has organized a number of conferences on applications of causal inference.