MATH 2565 W 2014
From MathWiki
News and Links
Breaking News
Assignment 3
 The due date for Assignment 4 is deferred to April 4
Classroom (SLH B) available on Tuesdays 11:30 am to 12:30 pm
I have reserved our classroom for the hour following the Tuesday lecture. This will make it easier for groups to meet if they wish, and when I'm otherwise free I can stay in class to answer questions.
Tutors who can help with MATH 2565
There is a 'StatsLab' in South 525 Ross open Monday to Friday 10:30am to 3:30pm
The content of our course is unusual for a statistics course and some tutors might have difficulty with your questions. The following tutors are familiar with MATH 2565 and you should consider speaking to them if other tutors can't help you:
 Nanwei Wang: Monday 1:30 to 2:30, Wednesday 12:30 to 3:30
 Dongwei Wei: Wednesday 10:30 to 1:30
 Hangjing Wang: Thursday 12:30 to 3:30
Textbook
The textbook for this section is different from the one used for other recent sections of MATH 2565. We will be using
 Daniel Kaplan (2011) Statistical Modeling: A Fresh Approach, 2nd ed. (http://www.mosaicweb.org/go/StatisticalModeling/) available at the bookstore and through Amazon, along with exercises published as a pdf file available online: Daniel Kaplan (2012) Exercises for Statistical Modeling: A Fresh Approach (userid: stats, password: stats) (http://blackwell.math.yorku.ca/Files/MATH2565/Kaplan/Kaplan%20ExerciseCollection.pdf)
Moodle
 MOODLE SITE for MATH 2565 (https://moodle.yorku.ca/moodle/course/view.php?id=16142)
 We will use Moodle for four purposes:
 A forum in which you can post questions, comments and answers related to the course and to Statistics in general.
 A forum entitled Statistics in the News in which you post interesting things you have found and comment on the posts of others.
 A way for you to submit assignments.
 A way for you to find out your grades.
Log in so you can start contributing posts and comments.
Calendar
If you are prompted for a userid and password to access a file, use 'stats' for both.
Week  Date (2014)  Links and files 

1  Jan 7 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_07.swf) 
Topic 1: Introduction to Statistical Ideas 
Jan 9 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_09.swf)  Install R and RStudio (http://scs.math.yorku.ca/index.php/R:_Getting_started_with_R) Assignment 0 due Jan 10  
2  Jan 14 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_14.html)  We start with: Hans Rosling and global health in preparation for Assignment 1 and then resume where we left off last Thursday. 
Jan 16 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_16.swf)  
3  Jan 21 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_21.swf)  Preparation for this class:
Special topics covered:

Jan 23 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_23.swf)  Preparation for this class:
Files:
 
4  Jan 28 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_28.swf) 

Jan 30 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_30.swf) 
 
5  Feb 4 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_04.swf)  Assignment 2 due Feb 4 
Feb 6 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_06.swf)  Topics:
 
6  Feb 11 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_11.swf) 

Feb 13  Midterm test
 
Feb 18/20  Reading Week  
7  Feb 25 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_25.swf) 

Feb 27  Cancelled  
8  Mar 4 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_04.swf)  Assignment 3 due Mar

Mar 6 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_06.swf)  
9  Mar 11 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_11.swf) 

Mar 13 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_13.swf) 
 
10  Mar 18 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_18.swf)  Assignment 3 due Mar 18 
Mar 20 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_20.swf)  Visualizing Correlation with notes (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=443274)  
11  Mar 25 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_25.swf)  Summary of formulas on regression 
Mar 27 video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_27.swf)  
12  April 1 

Apr 3  Inadvertently, this class was not recorded. The following link has the annotations shown on the screen.
 
Final exam  Apr 8 to Apr 24  Exact date and time should be announced by the Registrar before early March 
General Information
How do you know what you know? Why do you feel very confident that some things are true but less sure about others? Do you feel very sure about some things that, perhaps, you shouldn't be so sure about? And are you unsure about things that you should, in fact, be confident of?
Statistical reasoning is crucial for a critical understanding of the flood of data and information we face daily in modern society. Understanding the principles of statistical reasoning and being aware of a number of widespread errors in statistical thinking is often the key for distinguishing arguments that are sound from those that are fallacious.
This course stresses the logic and reasoning behind statistics. We avoid complex mathematical formulas. Statistical reasoning is applied to a critical analysis of current events reported in the media and current scientific, medical and social controversies.
By the end of the course, you will have developed an understanding of why scientific evidence can appear to lead to contradictory conclusions. You will have a better understanding of the assumptions that lead to these different conclusions, and you will be in a better position to make informed judgments about the quality of scientific claims.
Instructor
 Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)
 North 626 Ross
 mailto:georges+math2565@yorku.ca (Note: the "+math2565" portion is designed to help avoid spam filters)
 Phone: (416) 736-2100 ext. 77164
 http://georges.blog.yorku.ca
 Office hours: Fridays 8:30 am to 10:30 am during the academic term: January to April, 2014, other times by appointment.
Assignments and Tests
 Dates for MATH 2565: Assignments are due at 11:55pm on the date shown unless otherwise indicated
 Assignment 0 (individual): assigned Jan. 7; due Friday, Jan. 10, noon; weight 1%
 Assignment 1 (team): assigned Jan. 9; due Jan. 24 (originally Jan. 21); weight 7% (originally 6%)
 Assignment 2 (team): assigned Jan. 23; due Feb. 4; weight 7% (originally 6%)
 Midterm test: Feb. 13, in SLH B (A-J) and in CLH K (K-Z); weight 30%
 Assignment 3 (team): assigned Feb. 13; due March 18 (originally March 4); weight 8% (originally 6%)
 Assignment 4 (team): due April 4; weight 8% (originally 6%)
 Assignment 5 (team): cancelled; weight 0% (originally 6%)
 Final exam: April 8 to 24; weight 35%
 Participation: Jan. 7 to April 7; class participation and contributions to Moodle open forums; weight 4%
 Important academic dates from the York website (http://www.registrar.yorku.ca/enrol/dates/fw13.htm)
Textbook
Daniel Kaplan (2011) Statistical Modeling: A Fresh Approach, 2nd ed. (available at the York bookstore)
Daniel Kaplan (2012) Exercises for Statistical Modeling: A Fresh Approach (available online: userid: stats, password: stats) (http://blackwell.math.yorku.ca/Files/MATH2565/Kaplan/Kaplan%20ExerciseCollection.pdf)
This is a textbook that takes a radically different approach to introductory statistics. Most textbooks build the subject, mathematically, from the ground up. This book focuses on the larger concepts. It avoids formulas but it stresses using a free software package, R, to give you the experience of actually using statistics to address real questions.
In contrast with most textbooks, this book is not slick and has few graphs and pictures. However, it is much less expensive. Don't worry about the lack of pictures (they really drive up the cost of a textbook): you will get to produce them yourself using R, a leading statistical language for data analysis and graphics.
Lectures and Tutorials
 Class: Tuesdays and Thursdays, 10 am to 11:30 am in Steadman Lecture Halls (SLH) B. The first class takes place on Tuesday, January 7, 2014.
 Optional tutorials: Occasional tutorials will be held as needed at times to be determined. The purpose of the tutorials will be to help you with problems using computers and to discuss questions regarding the material of the course. The days on which tutorials will be held will be announced in advance.
Course Policies
 Late assignments
 Late assignments or projects are penalized 10% of the value of the assignment for each day (or portion of a day) it is late. Unless a time is specified, assignments and projects are due before 11:55 pm.
 Missed term test
 If you miss the term test with a suitably documented medical or compassionate reason, your mark for the term test will be imputed from your mark on the final exam. Otherwise you will receive a grade of zero for the midterm.
 Use of computers in class
 You are encouraged to bring your laptop to class to use it for purposes directly related to the class, such as taking notes, annotating slides posted on the web or trying out commands in R. Some students think that it does not affect anyone else if they are doing their own thing in class on a laptop or other electronic device. This is wrong: people seated around you cannot help but be distracted. Therefore, you may not use your laptop to view unrelated materials such as videos, because this creates a visual distraction for students seated near you. Failure to observe this policy may result in warnings, may have an impact on your class participation mark and may result in being asked to leave the classroom.
 Demeanor in class
 It is okay to turn momentarily to your neighbour in class to quietly ask a brief clarifying question related to the material in the class and it is okay to give a quiet and brief answer. You may not, however, have any conversation beyond a very brief and quiet exchange. The instructor may be so absorbed in what he is saying that he won't notice you. Other students, who may be struggling to remain absorbed, do notice and are very distracted by conversations. They will be annoyed and many will come and complain to me for my failure to enforce adequate discipline. Please don't put me in this awkward position.
 Academic honesty
 Familiarize yourself with the York University Senate Policy on Academic Honesty (http://www.yorku.ca/univsec/policies/document.php?document=69). Violations of academic honesty are treated very seriously in university.
Resources
 Datasets and lecture notes will be posted at http://blackwell.math.yorku.ca/Files/MATH2565. Since some of the material may be copyrighted, access to the files is protected: use the userid 'stats' and the password 'stats'.
 When you find interesting links on the web you should post them on the appropriate Moodle forum.
Team Assignments
Assignments are done by semi-randomly assigned teams. Why random teams? One reason is that in almost all job interviews, you are asked about your experience working with teams. Working with a diverse team that you didn't select yourself gives you the opportunity to have experiences that will give you great anecdotes to use in your future job interviews. When you land the job, you will be much more likely to show the kind of leadership in team work that is invaluable in the modern workplace.
 General comments and details
 I will email the list of members in your team some time during the weekend of January 11. The members of your team can communicate by email, meet in person, and use the special Team Forum on the wiki which is visible only to members of your team and to the instructor and TAs but typically only on invitation from a member of the team.
 All team assignments are due at 11:55 pm on Tuesdays. Meet with your team after the class on Tuesday to finalize your submission so that you only need to do some proofreading and merging before the 11:55 pm deadline.
 Include the names of all active participants on the first page of the assignment. Everyone who participated actively gets the same grade; those who did not participate get zero. Note that some team members might not respond because they have dropped, or intend to drop, the course. If your team shrinks to 3 or fewer, let me know and I can merge your team with another small team.
 The more work you do on an assignment, the better prepared you are to do well on the midterm and on the final. But you shouldn't hog the work: let others do their part too. Everyone should make sure that they understand the whole assignment. Discuss the assignment with your team members to make sure everyone understands the key points and difficulties of each question.
Class representatives
In approximately 2 weeks, we will select 3 class representatives. The class representatives can:
 help organize group study sessions,
 meet occasionally with the instructor to provide feedback on the course.
Using computers for the course
Using the software package R is a critical component of the course. The best place to use R is on your own laptop or desktop. The skills you learn on your own computer you will be able to continue using after the course. If you don't have access to a computer you can use a computer in the Gauss Lab.
Some assignments and the individual project will require you to use computing software to view and analyze data. The test and exam will require you to interpret output from the same software. You can learn the computing aspects of the course in a number of ways:
 If you have access to a computer, you can download the software for the course. We use free, open-source software that runs on Windows, Mac OS X or Linux. If you have a laptop, you are encouraged to bring it to class and to tutorials and office hours.
 If you don't have access to a computer, you can get an account to use computers in the Gauss Lab where the software will be available. You also need to get a card to get access to the Gauss Lab.
 The course will show detailed examples of simple statistical analyses using R.
Topic 1: Introduction to Statistical Ideas
Material covered
 Introduction to the course
 How do we know what we know: Links to slides and exercises will be posted soon
What is 'Statistics'?
 Many textbooks give a definition like the following:
Definition: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
 This is certainly an important aspect of statistics but I think it only tells a small part of the story. Statistics is the science (and art) of working with uncertainty, whether you plan to make decisions or not. We tend to think of statements as true or false. But in practice, the truth or falsity of most important statements is not known with certainty. There are all shades of uncertainty between being certain that a statement is true and being certain that it is false. Many of the most important decisions and choices we make in life are made despite the fact that we don't have all the information we would like to have to determine which route is best. Sometimes we simply act as if something is true or false although we don't really know. Statistics is not just about how to make these difficult decisions. It is also about remembering and being aware of our uncertainty so we know where to look for better information and how to revise our hypotheses as relevant information becomes available. Statistics is not just about making decisions; it's about where to look for information that could lead us to change our decisions. It's about knowing when to keep an open mind and knowing when and how to change your mind.
 Statistics is about the fascinating journey from ignorance to increasingly certain knowledge to wisdom. This is a journey we all follow individually. It is also a journey undertaken by disciplines, by political and social organisms and by mankind as a whole.
Possible test question
A traditional definition of statistics is that it is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty. Give a brief critique of this definition (100 to 300 words).
Experimental vs Observational data sets
 If X and Y are correlated, what can it mean?
 1) X causes Y?
 2) Y causes X?
 3) Another variable(s) Z(s) causes both X and Y?
 a) Some Zs might be known and measurable. For these Zs we might be able to adjust using sophisticated statistical methods.
 b) Some Zs might be known but hard or impossible to measure. This is more difficult to deal with.
 c) Some Zs might not be discovered until the year 3000. We can't adjust statistically for these.
 4) Selection: maybe there's no relationship but some data got thrown out or ignored and the data left created the impression of a relationship.
 5) Chance: This is the one statisticians are really good at dealing with, as you will learn in this course.
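To make reason 3 concrete, here is a small simulation, written in Python although the course itself uses R. It assumes a purely hypothetical model in which a variable Z causes both X and Y while X has no effect on Y at all. When X is "chosen" in a way that depends on Z, X and Y are clearly correlated; when X is assigned by a coin toss, as in an experiment, the correlation essentially vanishes.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation from sample means and standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sx * sy)

def simulate(n, randomize, seed=1):
    """Hypothetical model: Z causes both X and Y; X has NO effect on Y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                          # the confounder
        if randomize:
            x = rng.choice([0, 1])                   # X assigned by a coin toss
        else:
            x = 1 if z + rng.gauss(0, 1) > 0 else 0  # X "chosen", influenced by Z
        y = 2 * z + rng.gauss(0, 1)                  # Y depends on Z only
        xs.append(x)
        ys.append(y)
    return corr(xs, ys)

print("observational:", round(simulate(5000, randomize=False), 2))
print("randomized:   ", round(simulate(5000, randomize=True), 2))
```

Under this model the observational correlation comes out around 0.5 even though X does nothing to Y, which is exactly why a correlation in observational data cannot by itself establish that X causes Y.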
 What if we have an 'experiment' with 'random allocation of X' to experimental units?
 1) X causes Y? possible
 2) Y causes X? No! We know what caused X. It was the coin toss or the random number generator that caused X.
 3) Another variable(s) Z(s) causes both X and Y? Maybe. But it could only be by chance that differences in levels of any combination of Zs, known or not, measurable or not, would have a large impact on Y.
 4) Selection? We can exclude this by checking how the data were obtained.
 5) Chance again.
 So, if we can exclude selection, we are left with two options:
 1) X causes Y, or
 2) Chance.
 We can use statistical analysis to measure chance. If the chance is very small then we may be left with X causes Y as the plausible explanation.
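One common way to "measure chance" in a randomized experiment is a permutation (randomization) test: re-do the random allocation many times by shuffling the group labels and see how often a difference at least as large as the observed one arises by chance alone. The Python sketch below assumes two groups compared by their difference in means; the outcome values are made up for illustration.

```python
import random
import statistics

def permutation_p_value(treated, control, n_perm=10_000, seed=2014):
    """Two-sided permutation test for a difference in group means.

    Returns the proportion of random re-allocations whose absolute
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-do the "coin tosses" that allocated units to groups
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical outcomes for 8 treated and 8 control units.
treated = [12.1, 14.3, 13.8, 15.0, 12.9, 14.7, 13.5, 15.2]
control = [11.0, 12.2, 10.8, 12.5, 11.9, 11.4, 12.8, 11.1]
print(permutation_p_value(treated, control))
```

A very small value means that chance alone rarely produces a difference this large, which, once selection has been excluded, leaves "X causes Y" as the plausible explanation.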
 How should you react to causal claims based on data analyses?
 1) The key question you should ask: Is the data set experimental or observational? You might have to ask questions to answer this. Generally, it isn't obvious from the appearance of the data. The critical issue is how were the levels of X assigned to the units: strictly randomly or by choice or judgement of the subjects or of the experimenters?
 2) If experimental: double-check that the allocation was really random and not done by judgment or haphazardly. Was the study double-blind? Are there possible biases in measurements? Psychological factors that influence the outcome? Does the claim match the nature of the experiment, or does it stretch to something that does not correspond exactly to what was done in the experiment?
 3) If observational:
 a) Can you poke an obvious hole in the claim? E.g. is there a plausible alternative explanation that was not taken into account in the analysis? In this case, you've countered the claim.
 b) What has the analysis adjusted for? Are these factors that can be measured with precision? What kinds of factors are not accounted for?
 c) Has the analysis overcorrected by controlling for possible mediating factors that should not be controlled?
 d) If a causal connection seems paradoxical, can you think of plausible mediating factors that might explain causality?
 With observational data, you can't be 100% sure that the relationship is causal but you can check whether important alternative possibilities have been adequately addressed.
 Some examples in the news: Toronto Star: Pulse (http://www.math.yorku.ca/people/georges/Files/NATS1500/Week02/StatisticsInTheNews030926.html)
 Which examples are experimental and which are observational?
 Which conclusions are reasonable and which are not? Why?
Things to do 1
'Things to do' are tasks that are not graded but are important to keep up with the course
 Download and install R and Rcmdr on your computer.
 Download and install RStudio (http://www.rstudio.com/ide/download/)
Assignment 0
Due: 12 noon, January 10, 2014
 I would like to know something about you and I also want to form random teams of 4 or 5 students to work on the assignments. I will use your emailed responses to this Assignment 0 to form the teams. You will receive the names of your team members on January 13 so you can meet face to face at the break during the class on January 14.
 To complete Assignment 0, fill out this survey (https://docs.google.com/forms/d/1GILvwy3V4KWb8L7_CnyHbHJp9k6XssCID4SrBTNlB8/viewform). It should take approximately 5 minutes.
Topic 2: Global Health
Hans Rosling and global health
 Hans Rosling's TED talk on changes in global health (http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)
 The Gapminder site (http://www.gapminder.org) with links to the Gapminder software, data and more.
Summary of introductory material
Synopsis
Types of data
 Purposes for analyzing data:
 Descriptive
 Inference: causal
 Inference: predictive
 Types of data
 Experimental: X under control of experimenter: random assignment of levels of X
 Observational: X determined by other factors and just observed, not manipulated
 How purposes and types of data match
 Descriptive statistics can be done with any kind of data since there is no intention to generalize
 Causal inference is best done with experimental data
 Caution: experiments are often conducted with volunteers who may not be similar to the target population for causal inference. Often, the only true experiments may be on animals who may or may not mimic the corresponding processes in humans.
 Predictive inference is best done with observational data sampled so it is representative of the target population.
 Just as random allocation is crucial for experiments, random selection is ideal for observational data for predictive inference.
 Causal inference with observational data is highly problematic
 Often, important questions are causal in nature and all that's available is observational data.
 We can never be certain of causal conclusions based on observational data
 Intelligent evaluation of causal claims based on observational data is challenging but may be the only way to shed light on crucial questions.
 Assessing causal claims from observational data, where the relationship between X and Y is too strong to be attributed to chance:
 Look for plausible alternative explanations:
 Perhaps Y can cause X?
 Are there obvious plausible confounding factors: factors that could cause both X and Y. Note that factors that are caused by X and, in turn, cause Y are mediating factors that explain and do not contradict the possibility that X causes Y.
 Have some of these possible confounding factors been controlled for in the study? How effectively?
 Do important factors remain that have not been controlled for?
 Consider the possibility of a selection effect.
 Consider possible mediating factors that could explain how X could cause Y, even when the suggestion that X causes Y seems surprising.
 When there are different sources of data, consider which seem more reliable, and why.
 What kind of data could determine whether X causes Y? Why does it not yet exist? Is it likely to be available in the future? What obstacles exist to obtaining such data?
 Can you come to a practical conclusion and how much confidence do you have in it?
 Good experiments:
 Control vs treatment groups: experiments involve a comparison between two or more conditions or treatments
 Placebos: blinding of subject
 Blinding of assessor
 If both subject and assessor are blind, we have double blind
 Randomization is crucial so we can be sure that no possible confounding factor, known or unknown, is responsible for the outcome except possibly by chance. Randomization can be applied in many ways:
 completely randomized design: take all subjects and randomly allocate to each treatment
 paired designs: for two treatments: split subjects into pairs that are similar with respect to relevant variables, then randomly assign the two treatments within each pair.
 blocked designs: for more than two treatments: split subjects into blocks of similar subjects, with as many subjects per block as there are treatments, then randomly assign the treatments within each block.
 longitudinal designs: give all or some of the treatments to each subject. Randomize order.
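The randomization schemes above can be sketched in a few lines of Python. The subject labels and treatment names are hypothetical, and forming the pairs or blocks (grouping similar subjects) is assumed to have been done beforehand; the code only performs the random assignment step.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Completely randomized design: shuffle all subjects, then deal them
    out to the treatments in turn so group sizes stay balanced."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def blocked(blocks, treatments, rng):
    """Blocked design: each block holds as many similar subjects as there are
    treatments, and the treatments are randomly permuted within the block.
    A paired design is the special case with two treatments."""
    assignment = {}
    for block in blocks:
        if len(block) != len(treatments):
            raise ValueError("each block needs one subject per treatment")
        order = list(treatments)
        rng.shuffle(order)
        assignment.update(zip(block, order))
    return assignment

def longitudinal_orders(subjects, treatments, rng):
    """Longitudinal design: each subject receives every treatment, in a random order."""
    orders = {}
    for s in subjects:
        order = list(treatments)
        rng.shuffle(order)
        orders[s] = order
    return orders

rng = random.Random(7)  # fixed seed so the allocation can be reproduced
print(completely_randomized(["A", "B", "C", "D"], ["treatment", "control"], rng))
print(blocked([["A", "B"], ["C", "D"]], ["treatment", "control"], rng))
print(longitudinal_orders(["A", "B"], ["X", "Y", "Z"], rng))
```

Blocking does not replace randomization; it restricts it so that the treatments are compared among similar subjects, while the within-block coin tosses still protect against confounders nobody thought to measure.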
 Special types of observational studies for causal inference:
 Retrospective: (measure Y in the present or past and X in the past)
 Prospective: measure X now, Y later.
 Case-control: If Y is disease vs. no disease: choose a group of subjects with the disease (the cases) and then, for each case, find a non-diseased subject (a control) who is similar with respect to selected Zs. Measure X on everyone and see if X is related to Y.
 Longitudinal without randomization: Subjects get all levels of X either in same order or in an order not controlled by experimenter.
Some important ideas not in the text are:
 the explicit list of 5 possible reasons for an association between X and Y in observational data
 the connection between these reasons and the possible reasons with experimental data
 the distinction between confounding factors and mediating factors
 using an Agresti diagram to make the connection between conditional and unconditional (marginal) association between two variables
These concepts make explicit and clarify the reasoning underlying many statements in the text.
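The gap between conditional and marginal association can be seen with a small set of made-up counts (purely hypothetical, not from the text). Within each level of a third variable Z (severity), the treated group has the higher success rate, yet pooled over Z it has the lower one, because the treatment was mostly given to severe cases. This reversal is the kind of pattern an Agresti diagram is meant to expose; the Python sketch only computes the rates and does not draw the diagram.

```python
# Hypothetical counts: (successes, total) by severity Z and treatment X.
counts = {
    "mild":   {"treated": (90, 100),   "untreated": (800, 1000)},
    "severe": {"treated": (300, 1000), "untreated": (20, 100)},
}

def rate(successes, total):
    return successes / total

# Conditional association: compare treated vs untreated within each level of Z.
for z, groups in counts.items():
    t = rate(*groups["treated"])
    u = rate(*groups["untreated"])
    print(f"{z:>6}: treated {t:.2f} vs untreated {u:.2f}")

# Marginal association: pool over Z, ignoring severity.
def pooled(group):
    s = sum(counts[z][group][0] for z in counts)
    n = sum(counts[z][group][1] for z in counts)
    return s / n

print(f"pooled: treated {pooled('treated'):.2f} vs untreated {pooled('untreated'):.2f}")
```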
Topics 10, 11 & 12: Visualizing Regression
Important ideas going beyond the textbook:
 Regression to the mean, the regression paradox and the regression fallacy
 Kahneman's insight on the consequences of regression to the mean and different perceptions of the effects of praise and criticism
 Visualizing regression and correlation
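Regression to the mean shows up in a short Python simulation, under the hypothetical model that each test score is a stable "true ability" plus independent luck. Students selected because they scored highest on a first test score, on average, noticeably closer to the overall mean on a second test even though nothing was done to them in between. This is the pattern behind Kahneman's point that criticism after a bad performance seems to "work" and praise after a good one seems to "backfire".

```python
import random
import statistics

rng = random.Random(42)
n = 10_000
ability = [rng.gauss(70, 8) for _ in range(n)]   # stable component (hypothetical)
test1 = [a + rng.gauss(0, 8) for a in ability]    # ability + luck on test 1
test2 = [a + rng.gauss(0, 8) for a in ability]    # ability + fresh luck on test 2

overall_mean = statistics.mean(test1)
# Select roughly the top 10% on test 1.
cutoff = sorted(test1)[int(0.9 * n)]
top = [i for i in range(n) if test1[i] >= cutoff]

mean1_top = statistics.mean(test1[i] for i in top)
mean2_top = statistics.mean(test2[i] for i in top)
print(f"overall mean on test 1: {overall_mean:.1f}")
print(f"top group, test 1:      {mean1_top:.1f}")
print(f"top group, test 2:      {mean2_top:.1f}  (closer to the mean)")
```

With equal ability and luck variances the correlation between the two tests is about 0.5, so the top group's advantage on test 2 is roughly half of what it was on test 1.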
Summary of a few formulas
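Estimating a proportion
Let $p$ be a proportion in a sample of $n$ from a population where the true proportion is $\pi$.
 Expected value: $E(p) = \pi$
 Variance: $\operatorname{Var}(p) = \frac{\pi(1-\pi)}{n} \approx \frac{p(1-p)}{n}$; this approximation is valid if $\pi$ is not too close to 0 or 1.
 Standard Error: $SE(p) = \sqrt{\frac{p(1-p)}{n}}$
 Margin of Error (95% CI): $ME \approx 2\,SE(p) = 2\sqrt{\frac{p(1-p)}{n}}$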
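As a worked example of estimating a proportion, the Python sketch below computes the estimate, its plug-in standard error $\sqrt{p(1-p)/n}$ and an approximate 95% margin of error. It uses 2 as the multiplier, the usual rounding of 1.96; the poll counts are hypothetical.

```python
import math

def proportion_summary(successes, n):
    """Estimate a population proportion with its SE and approx. 95% margin of error."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # plug-in estimate of sqrt(pi (1 - pi) / n)
    me = 2 * se                      # approx. 95% margin of error
    return p, se, me

# Hypothetical poll: 540 of 1,200 respondents agree.
p, se, me = proportion_summary(540, 1200)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI approx {p - me:.3f} to {p + me:.3f}")
```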
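Estimating a mean
Let $\mu$ be the mean of a population with standard deviation $\sigma$.
Then $\bar{Y}$, the mean of a sample of size $n$, has:
 Expected value: $E(\bar{Y}) = \mu$; the sample mean is an unbiased estimator of the population mean
 Standard Error: $SE(\bar{Y}) = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation
 Margin of Error: $ME \approx 2\,\frac{s}{\sqrt{n}}$ for an approximate 95% CI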
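For estimating a mean, a minimal Python sketch estimates $\sigma$ by the sample standard deviation $s$, so the standard error is $s/\sqrt{n}$, and uses 2 as the approximate 95% multiplier. The ten measurements are hypothetical; with a sample this small, a t multiplier (about 2.26 for 9 degrees of freedom) would give a slightly wider and more accurate interval.

```python
import math
import statistics

def mean_summary(sample):
    """Sample mean, its estimated SE (s / sqrt(n)) and approx. 95% margin of error."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 denominator) estimates sigma
    se = s / math.sqrt(n)
    return ybar, se, 2 * se

# Hypothetical sample of 10 measurements.
sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.0, 5.2, 4.7, 4.6]
ybar, se, me = mean_summary(sample)
print(f"mean = {ybar:.2f}, SE = {se:.3f}, approx 95% CI {ybar - me:.2f} to {ybar + me:.2f}")
```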
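Estimating a slope in linear regression
Let the true linear relationship between X and Y in a population be
 $Y = \beta_0 + \beta_1 X + \varepsilon$
where $\varepsilon$ is random with mean 0 and variance $\sigma^2$. Let
 $\hat{Y} = b_0 + b_1 X$
be the fitted least-squares equation. Then
 Expected Value: $E(b_1) = \beta_1$, i.e. $b_1$ estimates the true slope in the population
 Standard Error: $SE(b_1) = \frac{s_e}{s_X\sqrt{n-1}}$, where $s_e$ is the residual standard deviation and $s_X$ is the standard deviation of X
 Margin of Error: $ME \approx 2\,SE(b_1)$ for an approximate 95% confidence interval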
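A from-scratch least-squares fit shows where the slope and its standard error come from. The Python sketch uses the standard formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and SE(b1) = s_e / √Σ(x − x̄)², where s_e is the residual standard deviation with n − 2 degrees of freedom; since Σ(x − x̄)² = (n − 1)s_X², this is the same as s_e / (s_X √(n − 1)). The data are invented.

```python
import math
import statistics

def fit_line(xs, ys):
    """Least-squares intercept b0, slope b1, and the standard error of b1."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_e = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual SD
    se_b1 = s_e / math.sqrt(sxx)
    return b0, b1, se_b1

# Hypothetical data roughly following y = 1 + 2x plus noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.8, 7.2, 9.1, 10.7, 13.2, 15.1, 16.8]
b0, b1, se_b1 = fit_line(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}")
print(f"approx 95% CI for the slope: {b1 - 2 * se_b1:.3f} to {b1 + 2 * se_b1:.3f}")
```

In R the same numbers would come from `summary(lm(y ~ x))`; the hand computation is only meant to make the formula visible.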
Estimating a correlation
Review the visual way of estimating a correlation using a data ellipse.
Let $r$ be the estimated (sample) correlation from a sample of size $n$. Then:
 Expected Value: $E(r) \approx \rho$; the sample correlation estimates the true correlation
 Approximate SE: $SE(r) \approx \frac{1 - r^2}{\sqrt{n}}$
 Approximate ME: $ME \approx 2\,\frac{1 - r^2}{\sqrt{n}}$
Note: $r$ is the slope of the regression of the z-score of Y on the z-score of X.
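The note that r is the slope of the regression of the z-score of Y on the z-score of X can be checked numerically. The Python sketch computes r as the average product of z-scores, recomputes it as a least-squares slope on the standardized data, and reports the rough large-sample standard error (1 − r²)/√n. The height and weight values are invented.

```python
import math
import statistics

def zscores(values):
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    """Pearson r as the sum of products of z-scores divided by n - 1."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical heights (cm) and weights (kg).
xs = [160, 165, 170, 172, 175, 180, 185, 190]
ys = [55, 62, 63, 70, 68, 77, 80, 88]
r = correlation(xs, ys)
r_as_slope = slope(zscores(xs), zscores(ys))
se_r = (1 - r ** 2) / math.sqrt(len(xs))  # rough large-sample SE
print(f"r = {r:.3f}, slope of z-scores = {r_as_slope:.3f}, approx SE = {se_r:.3f}")
```

The two numbers agree because z-scores have mean 0 and sum of squares n − 1, so the least-squares slope reduces to the same sum of products divided by n − 1.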
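In multiple regression:
 $R$ = correlation between $Y$ and the fitted values $\hat{Y}$
 $R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$, the proportion of the variation in Y accounted for by the regression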
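For multiple regression, a Python sketch can confirm that, for an ordinary least-squares fit with an intercept, the squared correlation between Y and the fitted values equals 1 − SSE/SST. Solving the normal equations by hand with a small Gauss-Jordan routine is for illustration only (in practice you would use R's `lm`); the helper names and the data are hypothetical.

```python
import math
import statistics

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(x_rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y, with an intercept."""
    X = [[1.0] + list(row) for row in x_rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    return b, fitted

def corr(u, v):
    mu, mv = statistics.mean(u), statistics.mean(v)
    num = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v))
    return num / den

# Hypothetical data: two predictors and a response.
x_rows = [(1, 4), (2, 3), (3, 6), (4, 5), (5, 8), (6, 7), (7, 9), (8, 12)]
ys = [5.0, 6.1, 9.8, 10.2, 14.1, 14.0, 17.2, 21.5]
b, fitted = fit_multiple(x_rows, ys)

ybar = statistics.mean(ys)
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
sst = sum((y - ybar) ** 2 for y in ys)
R = corr(ys, fitted)
print(f"R = {R:.4f}, R^2 = {R ** 2:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```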
Team Assignment 1
 Due: January 24, 2014 (originally January 21)
 Explore Gapminder World (http://www.gapminder.org/world/)
 Learn how to select different variables for the Y axis, the X axis, the size and the colour of points.
 Learn how to select different subsets of countries for highlighting and how to turn 'trails' on or off.
 Learn how to control the time animation and its speed.
 Find a selection of variables that seem to tell an interesting story about a trend or a historical event that you find interesting.
 Do a bit of research on this trend or event.
 Copy the URL for the animation you selected by clicking on the 'Share graph' button and copying the URL that is shown.
 Write an interesting short essay (300 to 2,000 words) describing what the animation shows.
 Post the URL and the essay on the Moodle forum for Assignment 1 (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=398658).
 Note that each team prepares only one essay and URL. Of course, if you want to post other essays as posts to the blog, that is more than welcome!
 You should include links to online materials you used, e.g. Wikipedia articles are considered acceptable for this assignment, and you should include references to other materials that you used but there is no need for an exhaustive list of references as would be required for a more formal scholarly essay.
Possible test/exam question
Think about these questions for discussion in class
 Questions on interpreting relationships (http://blackwell.math.yorku.ca/Files/MATH%202565%20Exercises%20Interpreting%20Relationships.pdf)
 Why does Hans Rosling say that students at the Karolinska Institute know statistically significantly less about the world than do chimpanzees, and professors at the Karolinska Institute are roughly on a par with chimpanzees?
 A study showed (this is true) that students who view the recorded videos of the lectures many times perform less well on the final exam than students who view the videos fewer times. Upon discovering this, your professor announces that he/she will discontinue recording the lectures because, the professor says, the videos have been shown to cause students to perform more poorly in the course. Explain your point of view to the professor in clear and simple language that even a professor might be able to understand.
 Explain why the number of lectures attended could be a potential confounding factor in considering the relationship between the frequency of viewing class videos and performance on the course.
 Can you think of potential mediating factors?
 In the 1964 U.S. Public Health Service study it was found that, for men and for women in each age group, current smokers were on average much healthier than former smokers, that is, people who had quit smoking.
 Suppose that an important factor that explains this surprising fact is that people who quit smoking tend to experience increased stress and weight gain as a result of quitting, and that these factors adversely affect health. Are increased stress and weight gain potential mediating factors or confounding factors? Explain briefly.
 Suppose that an important factor that explains this fact is that a substantial proportion of people who quit smoking do so because they are in poor health and have been strongly advised to quit smoking by their physicians. Would this be a mediating factor or a confounding factor in considering the relationship between quitting smoking and health? Explain briefly.
 In this study, are age and sex potential confounding factors or mediating factors?
 A study shows that heavy users of sunscreen lotion have a higher chance of developing skin cancer.
 Does this imply that you should avoid using sunscreen lotion in order to reduce your chances of developing skin cancer?
 Can you think of potential confounding factors and potential mediating factors?
Assignment 2
Assigned Jan. 21. Due Feb 4.
If we do this well, we will end up writing our own solution manual for the textbook. Each column of the table shows the problems assigned to the teams whose names appear at the top of the column. Each team does all the problems in its column. This way, we will have 3 or 4 versions of each question.
 First, develop and discuss solutions within your 'private' team forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=404634) or using some other method to collaborate.
 When the solution to a problem is ready to be posted, post it to the appropriate 'chapter forum': Exercises - Introduction (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=418963), Exercises - Chapter 1 (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=418966), etc., using the following format for the title of the post:
 "Chapter N" or "Introduction"
 "Problem" or "Reading Question" N
 Short title for the question
 Some examples:
 Chapter 1 Problem 1.02: Syntax Errors
 Introduction Question 1: Smoking - Starting and Quitting
 Chapter 3 Reading Question 2: 100% Confidence Interval
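The title convention above is just a fixed template: section, then item, then a colon and a short title. The sketch below writes it out in Python purely for illustration. The helper name `post_title` is hypothetical and is not part of Moodle or the course site, and the check on the section simply assumes the two forms listed above ("Introduction" or "Chapter N").

```python
import re


# Hypothetical helper (not part of Moodle or the course site): it only
# encodes the post-title convention described above as a string template.
def post_title(section: str, item: str, short_title: str) -> str:
    """Return a title of the form '<section> <item>: <short title>'."""
    # Assumption: the section is either "Introduction" or "Chapter N".
    if section != "Introduction" and not re.fullmatch(r"Chapter \d+", section):
        raise ValueError(f"unexpected section: {section!r}")
    return f"{section} {item}: {short_title}"


print(post_title("Chapter 1", "Problem 1.02", "Syntax Errors"))
# Chapter 1 Problem 1.02: Syntax Errors
print(post_title("Chapter 3", "Reading Question 2", "100% Confidence Interval"))
# Chapter 3 Reading Question 2: 100% Confidence Interval
```

Rejecting an unexpected section early helps keep posts sorted consistently in the chapter forums.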
Bayes Birnbaum  Box Cai  Hill GrayIhaka  Mahalanobis Fisher 

Introductory Exercises (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=416707)  
Question 1  Question 2  Question 3  Question 4 
Question 5  Question 6  Question 7  Question 8 
Chapter 1 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter1Problems/SM2Chapter1ProblemsA.html?access=ISMf12)  
Reading Question 1  Reading Question 2  Reading Question 3  Reading Question 4 
Reading Question 5  Reading Question 6  Reading Question 7  Prob. 1.01 
Prob. 1.02  Prob. 1.04  Prob. 1.05  Prob. 1.10 
Prob. 1.11  Prob. 1.12  
Chapter 2 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter2Problems/SM2Chapter2ProblemsA.html?access=ISMf12)  
Reading Question 1  Reading Question 2  
Reading Question 3  Reading Question 4  Reading Question 5  Prob. 2.02 
Prob. 2.04  Prob. 2.09  Prob. 2.14  Prob. 2.22 
Chapter 3 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter3Problems/SM2Chapter3ProblemsA.html?access=ISMf12)  
Reading Question 1  Reading Question 2  Reading Question 3  Reading Question 4 
Reading Question 5  Reading Question 6  Reading Question 7  Prob. 3.01 
Prob. 3.02  Prob. 3.03  Prob. 3.04  Prob. 3.05 
Prob. 3.06  Prob. 3.08  Prob. 3.09  Prob. 3.10a 
Prob. 3.10b  Prob. 3.11  Prob. 3.12  Prob. 3.13 
Prob. 3.14  Prob. 3.15  Prob. 3.16  Prob. 3.17 
Prob. 3.18  Prob. 3.19  Prob. 3.20  Prob. 3.23 
Prob. 3.24  Prob. 3.25  Prob. 3.28  Prob. 3.29 
Prob. 3.30  Prob. 3.31  Prob. 3.36  Prob. 3.50 
Prob. 3.53  Prob. 3.54 
Until the due date, you can see only your own team's work. Soon after the due date, the submission forums will become visible to everyone, so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:
 0 stars: not yet graded
 1 star: you gave it a try
 2 stars: better try
 3 stars: good overall but errors in the details
 4 stars: very good, essentially correct
 5 stars: excellent, goes beyond expectations
Assignment 3
Assigned Feb. 13. Due March 18 (deferred from March 4).
Bayes Birnbaum  Box Cai  Hill GrayIhaka  Mahalanobis Fisher 

Chapter 4 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter4Problems/SM2Chapter4ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 3  Prob. 4.03 
Prob. 4.04  Prob. 4.05  Prob. 4.06  Prob. 4.07 
Prob. 4.08  
Chapter 5 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter5Problems/SM2Chapter5ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1 (1st bullet)  Reading Question 2 (2nd bullet)  Reading Question 3 (3rd bullet)  
Prob. 5.01  Prob. 5.02  Prob. 5.03  Prob. 5.09 
Prob. 5.12  Prob. 5.13  Prob. 5.17  Prob. 5.20 
Prob. 5.23  Prob. 5.30  Prob. 5.31  Prob. 5.40 
Chapter 6 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter6Problems/SM2Chapter6ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1 (1st bullet)  Reading Question 2 etc.  Reading Question 3  Reading Question 4 
Reading Question 5  Reading Question 6  Prob. 6.01  Prob. 6.04 
Prob. 6.05  Prob. 6.10  Prob. 6.11  Prob. 6.12 
Prob. 6.13  Prob. 6.20  Prob. 6.21 
Until the due date, you can see only your own team's work. Soon after the due date, the submission forums will become visible to everyone, so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:
 0 stars: not yet graded
 1 star: you gave it a try
 2 stars: better try
 3 stars: good overall but errors in the details
 4 stars: very good, essentially correct
 5 stars: excellent, goes beyond expectations
Assignment 4
Due: April 4
Bayes Birnbaum  Box Cai  Hill GrayIhaka  Mahalanobis Fisher 

Chapter 7 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter7Problems/SM2Chapter7ProblemsA.html?access=ISMf12) Answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 3  Reading Question 4 
Reading Question 5  Prob. 7.01  Prob. 7.02  Prob. 7.04 
Prob. 7.04  Prob. 7.04  Prob. 7.04  Prob. 7.05 
Prob. 7.10  Prob. 7.11  Prob. 7.11  Prob. 7.11 
Prob. 7.11  Prob. 7.12  Prob. 7.13  Prob. 7.14 note 
Prob. 7.15  Prob. 7.20  
Chapter 8 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter8Problems/SM2Chapter8ProblemsA.html?access=ISMf12) Answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 3  Reading Question 4 
Prob. 8.01  Prob. 8.02  Prob. 8.01  Prob. 8.02 
Prob. 8.05  Prob. 8.05  Prob. 8.05  Prob. 8.05 
Chapter 9 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter9Problems/SM2Chapter9ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 3  Reading Question 1 
Prob. 9.01  Prob. 9.02  Prob. 9.01  Prob. 9.02 
Prob. 9.04  Prob. 9.04  Prob. 9.04  Prob. 9.04 
Prob. 9.10  Prob. 9.10  Prob. 9.21  Prob. 9.21 
Chapter 10 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter10Problems/SM2Chapter10ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 5  Prob. 10.05 
Chapter 12 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter12Problems/SM2Chapter12ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1  Reading Question 2  Reading Question 5  Reading Question 6 
Prob. 12.01  Prob. 12.02  Prob. 12.01  Prob. 12.02 
Chapter 13 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2Chapter13Problems/SM2Chapter13ProblemsA.html?access=ISMf12) As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.  
Reading Question 1 & 2  Reading Question 3  Reading Question 4  Reading Question 5 
Prob. 13.01  Prob. 13.01  Prob. 13.01  Prob. 13.01 
Until the due date, you can see only your own team's work. Soon after the due date, the submission forums will become visible to everyone, so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:
 0 stars: not yet graded
 1 star: you gave it a try
 2 stars: better try
 3 stars: good overall but errors in the details
 4 stars: very good, essentially correct
 5 stars: excellent, goes beyond expectations
Assignment 5 (cancelled)
Some links
 The story of Sally Clark, wrongfully convicted by a bad p-value (http://en.wikipedia.org/wiki/Sally_Clark)
 How some Ontario nurses came close to suffering the same fate (http://en.wikipedia.org/wiki/Toronto_Hospital_Murders)
 A relevant XKCD cartoon (http://xkcd.com/1132/)