# MATH 2565 W 2014


MATH 2565 Introduction to Applied Statistics 2014 W

### Breaking News

#### Classroom (SLH B) available on Tuesdays 11:30 am to 12:30

I have reserved our classroom for the hour following the Tuesday lecture. This will make it easier for groups to meet if they wish, and I can stay in class to answer questions when I'm otherwise free.

#### Tutors who can help with MATH 2565

There is a 'StatsLab' in South 525 Ross open Monday to Friday 10:30am to 3:30pm

The content of our course is unusual for a statistics course and some tutors might have difficulty with your questions. The following tutors are familiar with MATH 2565 and you should consider speaking to them if other tutors can't help you:

• Nanwei Wang: Monday 1:30 - 2:30, Wednesday 12:30 - 3:30
• Dongwei Wei: Wednesday 10:30 - 1:30
• Hangjing Wang: Thursday 12:30 - 3:30

### Textbook

The textbook for this section is different from the one used for other recent sections of MATH 2565. We will be using

Daniel Kaplan (2011) Statistical Modeling: A Fresh Approach, 2nd ed. (http://www.mosaic-web.org/go/StatisticalModeling/) available at the bookstore and through Amazon, along with exercises published as a pdf file available online: Daniel Kaplan (2012) Exercises for Statistical Modeling: A Fresh Approach (userid: stats, password: stats) (http://blackwell.math.yorku.ca/Files/MATH2565/Kaplan/Kaplan%20Exercise-Collection.pdf)

### Moodle

• MOODLE SITE for MATH 2565 (https://moodle.yorku.ca/moodle/course/view.php?id=16142)
• We will use Moodle for the following purposes:
• A forum in which you can post questions, comments and answers related to the course and to Statistics in general.
• A forum entitled Statistics in the News in which you post interesting things you have found and comment on the posts of others.
• A way for you to submit assignments.

### Calendar

If you are prompted for a userid and password to access a file, use 'stats' for both.

Week, Date (2014)
1 Jan 7
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_07.swf)

Topic 1: Introduction to Statistical Ideas
Introduction to Statistical Ideas - Slides - originals (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=410516)

Jan 9
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_09.swf)
Install R and RStudio (http://scs.math.yorku.ca/index.php/R:_Getting_started_with_R)
Assignment 0 due Jan 10
2 Jan 14
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_14.html)
in preparation for Assignment 1
and then resume where we left off last Thursday.
Jan 16
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_16.swf)
3 Jan 21
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_21.swf)
Preparation for this class:
• Discuss: Exercises (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=416707)
• Read Chapter 1 of the textbook.
• Install the mosaic package (https://moodle.yorku.ca/moodle/mod/forum/discuss.php?d=197047) and work through the R examples in Chapter 1

Special topics covered:

• Agresti diagrams
Jan 23
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_23.swf)
Preparation for this class:
• Read Chapters 2 and 3 of the textbook.

Files:

4 Jan 28
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_28.swf)
Jan 30
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_01_30.swf)
• Chapter 3 Slides -- annotated (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=425130):
• finding the standard deviation by hand
• the Central Limit Theorem (CLT) and probabilities for a normal distribution
• Two-way contingency tables: frequencies, cell percentages, row percentages, column percentages
5 Feb 4
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_04.swf)
Assignment 2 due Feb 4
Feb 6
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_06.swf)
Topics:
• Five-number summary and the simple boxplot
• Confounding vs mediating variables
• Models:
• 'Explaining' variance: overall variance = model variance + residual variance
• Probabilities with z-scores
• Some possible exam questions:
• Write a letter to the editor in response to an article that claims that students who work more hours get higher grades so we should cut OSAP to encourage students to work more hours.
• Use an Agresti diagram to explain how conditional association between two variables can have a different sign than their unconditional association
• Draw a boxplot
• Work out a probability using a z-score from an approximately normal distribution: You got a z-score of 1 in the midterm, approximately how many students got higher grades?
• More sample midterm questions
6 Feb 11
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_11.swf)
• Questions from Chapter 4 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-4-Problems/SM2-Chapter-4-Problems-A.html?access=ISMf12)
• NATS 1500 midterm test from last year (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=429909) with annotations on Feb. 11 and additional notes on 2-way tables, etc.
Feb 13 Mid-term test
In SLH B: A to J, and in CLH K: K to Z
7 Feb 25
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_02_25.swf)
Feb 27 Cancelled
8 Mar 4
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_04.swf)
Assignment 3 originally due Mar 4; due date moved to Mar 18
Mar 6
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_06.swf)
9 Mar 11
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_11.swf)
• Class notes (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=440485)
Mar 13
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_13.swf)
10 Mar 18
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_18.swf)
Assignment 3 due Mar 18
Mar 20
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_20.swf)
Visualizing Correlation with notes (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=443274)
11 Mar 25
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_25.swf)
Summary of formulas on regression
Mar 27
video (http://blackwell.math.yorku.ca/Files/MATH_2565_2014_03_27.swf)
12 April 1
Apr 3 Inadvertently, this class was not recorded. The following link has the annotations shown on the screen.
Final exam: Apr 8 - Apr 24
Exact date and time should be announced by the Registrar before early March

## General Information

How do you know what you know? Why do you feel very confident that some things are true but less sure about others? Do you feel very sure about some things that, perhaps, you shouldn’t be so sure about? And unsure about things that you should, in fact, be confident of?

Statistical reasoning is crucial for a critical understanding of the flood of data and information we face daily in modern society. Understanding the principles of statistical reasoning and being aware of a number of widespread errors in statistical thinking is often the key for distinguishing arguments that are sound from those that are fallacious.

This course stresses the logic and reasoning behind statistics. We avoid complex mathematical formulas. Statistical reasoning is applied to a critical analysis of current events reported in the media and current scientific, medical and social controversies.

By the end of the course, you will have developed an understanding of the reasons why scientific evidence can appear to lead to contradictory conclusions. You will have a better understanding of the assumptions that lead to these different conclusions and you will be in a better position to make informed judgments on the quality of scientific claims.

### Instructor

• Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)
• North 626 Ross
• mailto:georges+math2565@yorku.ca (Note: the "+math2565" portion is designed to help avoid spam filters)
• Phone: (416) 736-2100 ext. 77164
• http://georges.blog.yorku.ca
• Office hours: Fridays 8:30 am to 10:30 am during the academic term: January to April, 2014, other times by appointment.

### Assignments and Tests

Dates for MATH 2565: Assignments are due at 11:55pm on the date shown unless otherwise indicated

| Item | Assigned | Due / date | Details | Weight |
|---|---|---|---|---|
| Assignment 0 (individual) | Jan. 7 | Friday Jan. 10 noon | Assignment 0 | 1% |
| Assignment 1 (team) | Jan. 9 | Jan. 24 (revised from Jan. 21) | Assignment 1 | 7% (revised from 6%) |
| Assignment 2 (team) | Jan. 23 | Feb. 4 | Assignment 2 | 7% (revised from 6%) |
| Mid-term test | | Feb. 13 | In SLH B: A-J and in CLH K: K-Z | 30% |
| Assignment 3 (team) | Feb. 13 | March 18 (revised from March 4) | Assignment 3 | 8% (revised from 6%) |
| Assignment 4 (team) | ---- | April 4 | Assignment 4 | 8% (revised from 6%) |
| Assignment 5 (team) cancelled | ----- | ----- | | 0% (revised from 6%) |
| Final exam | | April 8 - 24 | | 35% |
| Participation | | Jan. 7 - April 7 | Class participation and contributions to Moodle open forums | 4% |

Important academic dates from the York website (http://www.registrar.yorku.ca/enrol/dates/fw13.htm)

### Textbook

Daniel Kaplan (2011) Statistical Modeling: A Fresh Approach, 2nd. ed. (available at the York bookstore)
Daniel Kaplan (2012) Exercises for Statistical Modeling: A Fresh Approach (available online: userid: stats, password: stats) (http://blackwell.math.yorku.ca/Files/MATH2565/Kaplan/Kaplan%20Exercise-Collection.pdf)

This is a textbook that takes a radically different approach to introductory statistics. Most textbooks build the subject, mathematically, from the ground up. This book focuses on the larger concepts. It avoids formulas but it stresses using a free software package, R, to give you the experience of actually using statistics to address real questions.

In contrast with most textbooks, this book is not slick and has few graphs and pictures. However, it is much less expensive. Don't worry about the lack of pictures -- they really drive up the cost of a textbook -- you will get to produce them yourself using R, a leading statistical language for data analysis and graphics.

### Lectures and Tutorials

• Class: Tuesdays and Thursdays, 10 am to 11:30 am in Steadman Lecture Halls (SLH) B. The first class takes place on Tuesday, January 7, 2014.
• Optional tutorials: Occasional tutorials will be held as needed at times to be determined. The purpose of the tutorials will be to help you with problems using computers and to discuss questions regarding the material of the course. The days on which tutorials will be held will be announced in advance.

### Course Policies

Late assignments
Late assignments or projects are penalized 10% of the value of the assignment for each day (or portion of a day) it is late. Unless a time is specified, assignments and projects are due before 11:55 pm.
Missed term test
If you miss the term test with a suitably documented medical or compassionate reason, your mark for the term test will be imputed from your mark on the final exam. Otherwise you receive a grade of zero for the mid-term test.
Use of computers in class
You are encouraged to bring your laptop to class to use it for purposes directly related to the class such as taking notes, annotating slides posted on the web or trying out commands in R. Some students think that it does not affect anyone else if you are doing your own thing in class on your laptop or other electronic device. This is wrong. People seated around you cannot help but be distracted. Therefore, you may not use your laptop to view unrelated materials such as videos because this creates a visual distraction for students seated near you. Failure to observe this policy may result in warnings, may have an impact on your class participation mark and may result in being asked to leave the classroom.
Demeanor in class
It is okay to turn momentarily to your neighbour in class to quietly ask a brief clarifying question related to the material in the class and it is okay to give a quiet and brief answer. You may not, however, have any conversation beyond a very brief and quiet exchange. The instructor may be so absorbed in what he is saying that he won't notice you. Other students, who may be struggling to remain absorbed, do notice and are very distracted by conversations. They will be annoyed and many will come and complain to me for my failure to enforce adequate discipline. Please don't put me in this awkward position.
Familiarize yourself with the York University Senate Policy on Academic Honesty (http://www.yorku.ca/univsec/policies/document.php?document=69). Violations of academic honesty are treated very seriously in university.

### Resources

• Datasets and lecture notes will be posted at http://blackwell.math.yorku.ca/Files/MATH2565. Since some of the material may be copyrighted, access to the files is protected: use 'stats' as both the userid and the password.
• When you find interesting links on the web you should post them on the appropriate Moodle forum.

### Team Assignments

Assignments are done by semi-randomly assigned teams. Why random teams? One reason is that in almost all job interviews, you are asked about your experience working with teams. Working with a diverse team that you didn't select yourself gives you the opportunity to have experiences that will give you great anecdotes to use in your future job interviews. When you land the job, you will be much more likely to show the kind of leadership in team work that is invaluable in the modern workplace.

• I will email the list of members in your team some time during the weekend of January 11. The members of your team can communicate by email, meet in person, and use the special Team Forum on the wiki. The forum is visible only to members of your team and to the instructor and TAs, who will typically look at it only on invitation from a member of the team.
• All team assignments are due at 11:55pm on Tuesdays. Meet with your team after the class on Tuesday to finalize your submission so you only need to do some proofreading and merging before the deadline at midnight.
• Include the names of all active participants on the first page of the assignment. Everyone who participated actively gets the same grade. Those who didn't, get zero. Note that some team members might not respond because they have dropped -- or intend to drop -- the course. If your team shrinks to 3 or fewer, let me know and I can merge your team with another smaller team.
• The more work you do on an assignment the better prepared you are to do well on the mid-term and on the final. But you shouldn't hog the work -- let others do their part too. Everyone should make sure that they understand the whole assignment. Discuss the assignment with your team members to make sure everyone understands the key points and difficulties of each question.

### Class representatives

In approximately 2 weeks, we will select 3 class representatives. The class representatives can:

• help organize group study sessions,
• meet occasionally with the instructor to provide feedback on the course.

### Using computers for the course

Using the software package R is a critical component of the course. The best place to use R is on your own laptop or desktop. The skills you learn on your own computer you will be able to continue using after the course. If you don't have access to a computer you can use a computer in the Gauss Lab.

Some assignments and the individual project will require you to use computing software to view and analyze data. The test and exam will require you to interpret output from the same software. You can learn the computing aspects of the course in a number of ways:

• If you have access to a computer, you can download the software for the course. We use public domain software that runs on Windows, MacOS X or Linux. If you have a laptop, you are encouraged to bring it to class and to tutorials and office hours.
• If you don't have access to a computer, you can get an account to use computers in the Gauss Lab where the software will be available. You also need to get a card to get access to the Gauss Lab.
• The course will show detailed examples of simple statistical analyses using R.

## Topic 1: Introduction to Statistical Ideas

### Material covered

• Introduction to the course
• How do we know what we know: Links to slides and exercises will be posted soon

#### What is 'Statistics'?

Many textbooks give a definition like the following:
Definition: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
This is certainly an important aspect of statistics but I think it only tells a small part of the story. Statistics is the science (and art) of working with uncertainty --- whether you plan to make decisions or not. We tend to think of statements as true or false. But in practice, the truth or falsity of most important statements is not known with certainty. There are all shades of uncertainty between being certain that a statement is true and being certain that it is false. Many of the most important decisions and choices we make in life are made despite the fact that we don't have all the information we would like to have to determine which route is best. Sometimes we simply act as if something is true or false although we don't really know. Statistics is not just about how to make these difficult decisions. It is also about remembering and being aware of our uncertainty so we know where to look for better information and how to revise our hypotheses as relevant information becomes available. Statistics is not just about making decisions; it's about where to look for information that could lead us to change our decisions. It's about knowing when to keep an open mind and knowing when and how to change your mind.
Statistics is about the fascinating journey from ignorance to increasingly certain knowledge to wisdom. This is a journey we all follow individually. It is also a journey undertaken by disciplines, by political and social organisms and by mankind as a whole.
##### Possible test question

A traditional definition of statistics is that it is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty. Give a brief critique of this definition (100 to 300 words).

#### Experimental vs Observational data sets

If X and Y are correlated, what can it mean?
1) X causes Y?
2) Y causes X?
3) Another variable(s) Z(s) causes both X and Y?
a) Some Zs might be known and measurable. For these Zs we might be able to adjust using sophisticated statistical methods.
b) Some Zs might be known but hard or impossible to measure. This is more difficult to deal with.
c) Some Zs might not be discovered until the year 3000. We can't adjust statistically for these.
4) Selection: maybe there's no relationship but some data got thrown out or ignored and the data left created the impression of a relationship.
5) Chance: This is the one statisticians are really good at dealing with -- as you will learn in this course.
What if we have an 'experiment' with 'random allocation of X' to experimental units?
1) X causes Y? possible
2) Y causes X? No! We know what caused X. It was the coin toss or the random number generator that caused X.
3) Another variable(s) Z(s) causes both X and Y? Maybe. But it could only be by chance that differences in levels of any combination of Zs, known or not, measurable or not, would have a large impact on Y.
4) Selection? We can exclude this by checking how the data were obtained.
5) Chance again.
So, if we can exclude selection, we are left with two options:
1) X causes Y, or
2) Chance.
We can use statistical analysis to measure chance. If the chance is very small then we may be left with X causes Y as the plausible explanation.
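The argument above can be checked with a small simulation. The following is only a sketch under stated assumptions: the hidden factor Z, the way subjects "choose" X, the seed and the sample size are all hypothetical, and Python is used purely for illustration (the course itself uses R). Y is built to depend on Z alone, so X never causes Y. When subjects with high Z tend to pick X = 1, X and Y are clearly correlated; when X is set by a coin toss, the correlation is close to zero.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2565)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical hidden factor Z that we never get to measure.
z = [rng.gauss(0, 1) for _ in range(n)]

# Y depends on Z and on noise, but NOT on X.
y = [zi + rng.gauss(0, 1) for zi in z]

# Observational data: subjects with high Z tend to end up with X = 1.
x_obs = [1 if zi + rng.gauss(0, 1) > 0 else 0 for zi in z]

# Experimental data: X is decided by a coin toss, independently of Z.
x_exp = [rng.randint(0, 1) for _ in range(n)]

r_obs = correlation(x_obs, y)
r_exp = correlation(x_exp, y)
print(f"observational: r = {r_obs:.3f}")
print(f"experimental:  r = {r_exp:.3f}")
```

In the observational version the association is produced entirely by Z (explanation 3); with random allocation only chance is left as an alternative to causation.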
How should you react to causal claims based on data analyses?
1) The key question you should ask: Is the data set experimental or observational? You might have to ask questions to answer this. Generally, it isn't obvious from the appearance of the data. The critical issue is how were the levels of X assigned to the units: strictly randomly or by choice or judgement of the subjects or of the experimenters?
2) If experimental: double check to make sure allocation was really random and not done by judgment or haphazardly. Was the study double-blind? Are there possible biases in measurements? Psychological factors that influence outcome? Does the claim match the nature of the experiment or is the claim stretching to something that does not correspond exactly to what was done in the experiment?
3) If observational:
a) Can you poke an obvious hole in the claim? E.g. is there a plausible alternative explanation that was not taken into account in the analysis? In this case, you've countered the claim.
b) What has the analysis adjusted for? Are these factors that can be measured with precision? What kinds of factors are not accounted for?
c) Has the analysis over-corrected by controlling for possible mediating factors that should not be controlled?
d) If a causal connection seems paradoxical, can you think of plausible mediating factors that might explain causality?
With observational data, you can't be 100% sure that the relationship is causal but you can check whether important alternative possibilities have been adequately addressed.
Some examples in the news: Toronto Star: Pulse (http://www.math.yorku.ca/people/georges/Files/NATS1500/Week02/StatisticsInTheNews030926.html)
Which examples are experimental and which are observational?
Which conclusions are reasonable and which are not? Why?

### Things to do 1

'Things to do' are tasks that are not graded but are important to keep up with the course

### Assignment 0

Due: 12 noon, January 10, 2014

I would like to know something about you and I also want to form random teams of 4 or 5 students to work on the assignments. I will use your emailed responses to this Assignment 0 to form the teams. You will receive the names of your team members on January 13 so you can meet face to face at the break during the class on January 14.
To complete Assignment 0, fill out this survey (https://docs.google.com/forms/d/1GILvwy3V4KWb8L7_CnyH-bHJp9k6XssCID4SrBTNlB8/viewform). It should take approximately 5 minutes.

## Summary of introductory material

### Synopsis

#### Types of data

• Purposes for analyzing data:
• Descriptive
• Inference: causal
• Inference: predictive
• Types of data
• Experimental: X under control of experimenter: random assignment of levels of X
• Observational: X determined by other factors and just observed, not manipulated
• How purposes and types of data match
• Descriptive statistics can be done with any kind of data since there is no intention to generalize
• Causal inference is best done with experimental data
• Caution: experiments are often conducted with volunteers who may not be similar to the target population for causal inference. Often, the only true experiments may be on animals who may or may not mimic the corresponding processes in humans.
• Predictive inference is best done with observational data sampled so it is representative of the target population.
• Just as random allocation is crucial for experiments, random selection is ideal for observational data for predictive inference.
• Causal inference with observational data is highly problematic
• Often, important questions are causal in nature and all that's available is observational data.
• We can never be certain of causal conclusions based on observational data
• Intelligent evaluation of causal claims based on observational data is challenging but may be the only way to shed light on crucial questions.
• Assessing causal claims from observational data, where the relationship between X and Y is too strong to be attributed to chance:
• Look for plausible alternative explanations:
• Perhaps Y can cause X?
• Are there obvious plausible confounding factors: factors that could cause both X and Y. Note that factors that are caused by X and, in turn, cause Y are mediating factors that explain and do not contradict the possibility that X causes Y.
• Have some of these possible confounding factors been controlled for in the study? How effectively?
• Do important factors remain that have not been controlled for?
• Consider the possibility of a selection effect.
• Consider possible mediating factors that could explain how X could cause Y, even when the suggestion that X causes Y seems surprising.
• When there are different sources of data, consider which seem more reliable and why?
• What kind of data could determine whether X causes Y? Why does it not yet exist? Is it likely to be available in the future? What obstacles exist to obtaining such data?
• Can you come to a practical conclusion and how much confidence do you have in it?
• Good experiments:
• Control vs treatment groups: experiments involve a comparison between two or more conditions or treatments
• Placebos -- blinding of subject
• Blinding of assessor
• If both subject and assessor are blind, we have double blind
• Randomization is crucial so we can be sure that all possible confounding factors, known or unknown, are not responsible for the outcome except possibly by chance. Randomization can be applied in many ways:
• completely randomized design: take all subjects and randomly allocate to each treatment
• paired designs: for two treatments: split subjects into pairs that are similar with respect to relevant variables, then randomly select within each pair.
• blocked designs: for more than two: split subjects into similar blocks with as many subjects as treatments, then randomly assign within each block.
• longitudinal designs: give all or some of the treatments to each subject. Randomize order.
• Special types of observational studies for causal inference:
• Retrospective: measure Y in the present or past and X in the past.
• Prospective: measure X now, Y later.
• Case-control: If Y is disease vs. no disease: choose a group of subjects with the disease (the cases) and then, for each case, find a non-diseased subject who is similar with respect to selected Zs. Measure X on everyone and see if X is related to Y.
• Longitudinal without randomization: Subjects get all levels of X either in same order or in an order not controlled by experimenter.
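As an illustration of the randomization schemes listed under "Good experiments", here is a minimal sketch of a completely randomized design and a paired design. The subject labels, the pairing and the function names are hypothetical, and Python is used only for illustration (the course software is R).

```python
import random


def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}


def paired_design(pairs, treatments, rng):
    """Within each matched pair, chance decides who gets which treatment."""
    assignment = {}
    for a, b in pairs:
        first, second = rng.sample(treatments, 2)  # random order of the two
        assignment[a] = first
        assignment[b] = second
    return assignment


rng = random.Random(1)  # fixed seed so the allocation can be reproduced
treatments = ["control", "treatment"]

# Hypothetical subjects S1..S8.
subjects = [f"S{i}" for i in range(1, 9)]
crd = completely_randomized(subjects, treatments, rng)

# Hypothetical pairs, assumed to be matched on relevant variables beforehand.
pairs = [("S1", "S2"), ("S3", "S4"), ("S5", "S6"), ("S7", "S8")]
paired = paired_design(pairs, treatments, rng)

print(crd)
print(paired)
```

In both cases the allocation comes from the random number generator, not from the judgment of the subjects or the experimenter. A blocked design works the same way with blocks larger than two.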

Some important ideas not in the text are:

1. the explicit list of 5 possible reasons for an association between X and Y in observational data
2. the connection between these reasons and the possible reasons with experimental data
3. the distinction between confounding factors and mediating factors
4. using an Agresti diagram to make the connection between conditional and unconditional (marginal) association between two variables

These concepts make explicit and clarify the reasoning underlying many statements in the text.
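Point 4 can be made concrete with numbers. The sketch below uses entirely hypothetical counts (they come from no study) for a binary X, a binary outcome Y and a two-level Z. Within each level of Z the success rate is higher when X = 1, yet the rates reverse when Z is ignored. This is the kind of sign change between conditional and marginal association that an Agresti diagram depicts. Python is used only for illustration.

```python
# Hypothetical counts: (successes, total) for each combination of Z and X.
counts = {
    ("low Z", 1): (90, 100),
    ("low Z", 0): (800, 1000),
    ("high Z", 1): (300, 1000),
    ("high Z", 0): (20, 100),
}


def rate(successes, total):
    return successes / total


# Conditional association: compare X = 1 with X = 0 within each level of Z.
conditional_diff = {}
for z_level in ("low Z", "high Z"):
    r1 = rate(*counts[(z_level, 1)])
    r0 = rate(*counts[(z_level, 0)])
    conditional_diff[z_level] = r1 - r0
    print(f"{z_level}: X=1 rate {r1:.2f}, X=0 rate {r0:.2f}")


# Marginal (unconditional) association: pool the counts over Z.
def pooled_rate(x):
    successes = sum(s for (z, xv), (s, t) in counts.items() if xv == x)
    total = sum(t for (z, xv), (s, t) in counts.items() if xv == x)
    return successes / total


marginal_diff = pooled_rate(1) - pooled_rate(0)
print(f"pooled: X=1 rate {pooled_rate(1):.2f}, X=0 rate {pooled_rate(0):.2f}")
```

The reversal happens because most X = 1 subjects are in the high-Z group, where success is rare for everyone. In this made-up example Z is a confounding factor.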

## Topics 10, 11 & 12: Visualizing Regression

Important ideas going beyond the textbook:

1. Regression to the mean, the regression paradox and the regression fallacy
2. Kahneman's insight on the consequences of regression to the mean and different perceptions of the effects of praise and criticism
3. Visualizing regression and correlation

### Summary of a few formulas

#### Estimating a proportion

Let p be a proportion in a sample of n from a population where the true proportion is π.

Expected value: E(p) = π
Variance: $Var(p) \approx \frac{1}{4n}$ -- this is valid if π is not too close to 0 or 1.
Standard Error: $SE(p) \approx \frac{1}{2 \sqrt{n}}$
Margin of Error (95% CI): $ME(p) \approx 2 \times SE(p) \approx \frac{1}{\sqrt{n}}$
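A quick numerical check of these approximations, as a sketch rather than part of the original notes. The function names are hypothetical, and Python is used only for illustration. The comparison with $\sqrt{\pi(1-\pi)/n}$ uses the standard binomial formula, which is not stated above. It shows why the shortcut is safe: it equals the exact standard error at π = 0.5 and overstates it elsewhere.

```python
import math


def approx_se_proportion(n):
    """Shortcut from the notes: SE(p) is roughly 1 / (2 * sqrt(n))."""
    return 1 / (2 * math.sqrt(n))


def approx_me_proportion(n):
    """Approximate 95% margin of error: 2 * SE, i.e. roughly 1 / sqrt(n)."""
    return 2 * approx_se_proportion(n)


def binomial_se_proportion(pi, n):
    """Standard formula sqrt(pi * (1 - pi) / n), shown for comparison only."""
    return math.sqrt(pi * (1 - pi) / n)


n = 400  # hypothetical sample size
print(approx_se_proportion(n))          # 0.025
print(approx_me_proportion(n))          # 0.05
print(binomial_se_proportion(0.5, n))   # same as the shortcut at pi = 0.5
print(binomial_se_proportion(0.3, n))   # a little smaller than the shortcut
```

With a sample of 400, the approximate margin of error is 5 percentage points. Quadrupling the sample size only halves it.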

#### Estimating a mean

Let μ be the mean of a population with standard deviation σ.
Then, for $\bar{y}$, the mean of a sample of n:

Expected value: $E(\bar{y}) = \mu$ the sample mean is an unbiased estimator of the population mean
Standard Error: $SE(\bar{y}) = \frac{s_Y}{\sqrt{n}}$
Margin of Error: $ME = 2 \times SE = 2 \times \frac{s_Y}{\sqrt{n}}$ for an approximate 95% CI
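A worked example of these formulas, as a minimal sketch. The sample values are hypothetical and Python is used only for illustration. Here $s_Y$ is taken to be the usual sample standard deviation (divisor n - 1), which is an assumption about the notation.

```python
import math
import statistics

# Hypothetical sample of n = 9 measurements.
sample = [62, 71, 58, 80, 67, 75, 69, 73, 64]

n = len(sample)
y_bar = statistics.mean(sample)   # estimate of the population mean
s_y = statistics.stdev(sample)    # sample standard deviation (n - 1 divisor)

se = s_y / math.sqrt(n)           # standard error of the sample mean
me = 2 * se                       # margin of error for an approximate 95% CI
ci = (y_bar - me, y_bar + me)

print(f"mean = {y_bar:.2f}, SE = {se:.2f}, ME = {me:.2f}")
print(f"approximate 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```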

#### Estimating a slope in linear regression

Let the true linear relationship between X and Y in a population be

Y = β0 + β1 X + ε

where ε is random with mean 0 and a given variance. Let

$Y = \hat{Y} + r = b_0 + b_1 X + r$
$\hat{Y} = b_0 + b_1 X$

be the fitted least-squares equation. Then:

Expected Value: E(b1) = β1 i.e. b1 estimates the true slope in the population
Standard Error: $SE(b_1) = \frac{s_r}{\sqrt{n}} \times \frac{1}{s_X}$
Margin of Error: $ME = 2 \times SE$ for an approximate 95% confidence interval
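The sketch below fits a least-squares line to a small hypothetical data set and applies the standard-error formula given above. The data and function names are made up for illustration, Python stands in for R, and $s_r$ and $s_X$ are assumed to be sample standard deviations with divisor n - 1 (the notes do not say which divisor is meant).

```python
import math
import statistics


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx = statistics.mean(xs)
    my = statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def slope_se(xs, ys):
    """SE(b1) = (s_r / sqrt(n)) * (1 / s_X), the formula from the notes."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_r = statistics.stdev(residuals)  # assumed: n - 1 divisor
    s_x = statistics.stdev(xs)         # assumed: n - 1 divisor
    return (s_r / math.sqrt(len(xs))) / s_x


# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_line(xs, ys)
se_b1 = slope_se(xs, ys)
me_b1 = 2 * se_b1

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SE(b1) = {se_b1:.4f}, approximate 95% CI for the slope: "
      f"({b1 - me_b1:.3f}, {b1 + me_b1:.3f})")
```

The formula shows the two ways to pin down a slope more precisely: collect more data (larger n) or spread X more widely (larger $s_X$).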

#### Estimating a correlation

Review the visual way of estimating a correlation using a data ellipse.

Let r be estimated correlation. Then:

Expected Value: E(r) = ρ -- the sample correlation estimates the true correlation
Approximate SE: $SE(r) = \frac{1}{\sqrt{n}}$
Approximate ME: $ME = \frac{2}{\sqrt{n}}$
$r^2 = \frac{Var(\hat{y})}{Var(Y)}$

Note: r is the slope of the regression of the z-score of Y on the z-score of X.

In multiple regression:

R = correlation between Y and $\hat{Y}$
$R^2 = \frac{Var(\hat{Y})}{Var(Y)}$
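Two of the facts above can be verified numerically for a simple regression: r is the slope of the regression of the z-scores of Y on the z-scores of X, and $r^2$ is the ratio of the variance of the fitted values to the variance of Y. This is only a sketch with hypothetical data, written in Python for illustration.

```python
import math
import statistics


def z_scores(values):
    """Standardize: subtract the mean, divide by the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def ls_slope(xs, ys):
    """Least-squares slope of the regression of ys on xs."""
    mx = statistics.mean(xs)
    my = statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data.
xs = [2, 4, 5, 7, 8, 10, 11, 13]
ys = [3, 6, 4, 9, 7, 12, 10, 15]
n = len(xs)

zx, zy = z_scores(xs), z_scores(ys)

# Sample correlation: average product of z-scores (n - 1 divisor).
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Slope of the regression of z-scores of Y on z-scores of X.
slope_z = ls_slope(zx, zy)

# Fitted values on the original scale, then the variance ratio.
b1 = ls_slope(xs, ys)
b0 = statistics.mean(ys) - b1 * statistics.mean(xs)
fitted = [b0 + b1 * x for x in xs]
r_squared = statistics.variance(fitted) / statistics.variance(ys)

approx_me = 2 / math.sqrt(n)  # approximate margin of error for r

print(f"r = {r:.4f}, slope on z-scores = {slope_z:.4f}")
print(f"r^2 = {r * r:.4f}, Var(fitted)/Var(Y) = {r_squared:.4f}")
print(f"approximate ME for r with n = {n}: {approx_me:.2f}")
```

With only 8 points the approximate margin of error is large, so a correlation estimated from so few observations is very imprecise.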

### Team Assignment 1

Due: January 24, 2014 (revised from January 21)
• Explore Gapminder World (http://www.gapminder.org/world/)
• Learn how to select different variables for the Y axis, the X axis, the size and the colour of points.
• Learn how to select different subsets of countries for highlighting and how to turn 'trails' on or off.
• Learn how to control the time animation and its speed.
• Find a selection of variables that seem to tell an interesting story about a trend or a historical event that you find interesting.
• Do a bit of research on this trend or event.
• Copy the URL for the animation you selected by clicking on the 'Share graph' button and copying the URL that is shown.
• Write an interesting short essay (300 to 2,000 words) describing what the animation shows.
• Post the URL and the essay on the Moodle forum for Assignment 1 (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=398658).
• Note that each team prepares only one essay and URL. Of course, if you want to post other essays as posts to the blog, that is more than welcome!
• You should include links to online materials you used, e.g. Wikipedia articles are considered acceptable for this assignment, and you should include references to other materials that you used but there is no need for an exhaustive list of references as would be required for a more formal scholarly essay.

### Possible test/exam question

Think about these questions for discussion in class

1. Questions on interpreting relationships (http://blackwell.math.yorku.ca/Files/MATH%202565%20Exercises%20Interpreting%20Relationships.pdf)
2. Why does Hans Rosling say that students at the Karolinska Institute know statistically significantly less about the world than do chimpanzees -- and professors at the Karolinska Institute are roughly on a par with chimpanzees?
3. A study showed (this is true) that students who view the recorded videos of the lectures many times perform less well on the final exam than students who view the videos fewer times. Upon discovering this, your professor announces that he/she will discontinue recording the lectures because, the professor says, the videos have been shown to cause students to perform more poorly on the course. Explain your point of view to the professor -- in clear and simple language even a professor might be able to understand.
• Explain why the number of lectures attended could be a potential confounding factor in considering the relationship between the frequency of viewing class videos and performance on the course.
• Can you think of potential mediating factors?
4. In the 1964 U.S. Public Health Service study it was found that, for men and for women in each age group, current smokers were on average much healthier than people who had quit smoking.
• Suppose that an important factor that explains this surprising fact is that people who quit smoking tend to experience increased stress and weight gain as a result of quitting, and that these factors adversely affect health. Are increased stress and weight gain potential mediating factors or confounding factors? Explain briefly.
• Suppose that an important factor that explains this fact is that an important proportion of people who quit smoking do so because they are in poor health and have been strongly advised to quit smoking by their physicians. Would this be a mediating factor or a confounding factor in considering the relationship between quitting smoking and health? Explain briefly.
• In this study, are age and sex potential confounding factors or mediating factors?
5. A study shows that heavy users of sunscreen lotion have a higher chance of developing skin cancer.
• Does this imply that you should avoid using sunscreen lotion in order to reduce your chances of developing skin cancer?
• Can you think of potential confounding factors and potential mediating factors?
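
To make the idea of a confounding factor in question 3 concrete, here is a minimal sketch in Python (the course itself uses R). All numbers are invented for illustration and are not taken from the study: the assumed model, the two groups of students, and the helper names `slope`, `score` and `views_and_scores` are hypothetical. In this made-up world, watching a video always helps, but students who rarely attend lectures watch more videos and also score lower.

```python
# Hypothetical, made-up data: NOT from the study mentioned in question 3.
# Assumed model: score = 50 + 3 * lectures_attended + 1 * video_views,
# so within this toy model each extra video view raises the score by 1.

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def score(lectures, views):
    """Final exam score under the assumed model."""
    return 50 + 3 * lectures + 1 * views

# Each student is (lectures attended, video views).
frequent_attenders = [(10, v) for v in (0, 1, 2, 3)]  # attend a lot, watch little
rare_attenders = [(2, v) for v in (5, 6, 7, 8)]       # attend little, watch a lot

def views_and_scores(students):
    """Split a list of students into parallel lists of views and scores."""
    views = [v for _, v in students]
    scores = [score(lec, v) for lec, v in students]
    return views, scores

slope_frequent = slope(*views_and_scores(frequent_attenders))
slope_rare = slope(*views_and_scores(rare_attenders))
slope_pooled = slope(*views_and_scores(frequent_attenders + rare_attenders))

print(slope_frequent, slope_rare, slope_pooled)
```

Within each attendance group the slope of score on views is +1, yet the pooled slope is -3. The sign reverses only because attendance is related to both viewing and performance, which is what makes it a potential confounding factor. The sketch says nothing about what actually happened in the real study.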

## Assignment 2

Assigned Jan. 21. Due Feb. 4.

If we do this well, we will end up writing our own solution manual for the textbook. Each column of the table shows the problems assigned to the teams whose names appear at the top of the column. Each team does all the problems in its column. This way, we will have 3 or 4 versions of each question.

1. First, develop and discuss solutions within your 'private' team forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=404634) or using some other method to collaborate.
2. When the solution to a problem is ready to be posted, post it to the appropriate 'chapter forum': Exercises -- Introduction (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=418963), Exercises -- Chapter 1 (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=418966), etc. using the following format for the title of the post:
1. "Chapter N" or "Introduction"
2. "Problem" or "Reading Question" N
3. Short title for the question
Some examples:
Chapter 1 Problem 1.02: Syntax Errors
Introduction Question 1: Smoking -- Starting and Quitting
Chapter 3 Reading Question 2: 100% Confidence Interval
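
The title format above can be sketched as a small helper. This is only an illustration in Python, not part of Moodle or the course tools: the function name `post_title` and its parameters are hypothetical, and the separator (a colon and a space before the short title) is inferred from the three examples.

```python
def post_title(section: str, kind: str, number: str, short_title: str) -> str:
    """Build a forum post title such as 'Chapter 1 Problem 1.02: Syntax Errors'.

    section     -- "Introduction" or "Chapter N"
    kind        -- "Problem", "Question" or "Reading Question"
    number      -- the problem or question number, kept as text (e.g. "1.02")
    short_title -- a short title for the question
    """
    return f"{section} {kind} {number}: {short_title}"


print(post_title("Chapter 1", "Problem", "1.02", "Syntax Errors"))
print(post_title("Chapter 3", "Reading Question", "2", "100% Confidence Interval"))
```

The number is passed as text so that a trailing or leading zero, as in "1.02", is preserved exactly as it appears in the exercise collection.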

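Assignment 2 teams, by table column (left to right):
• Column 1: Bayes, Birnbaum, Blackwell, Bonferroni
• Column 2: Box, Cai, Galton, Gauss
• Column 3: Hill, Gray-Ihaka, Laplace, Wu
• Column 4: Mahalanobis, Fisher, Neyman, Reid
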
Introductory Exercises (https://moodle.yorku.ca/moodle/mod/resource/view.php?id=416707)
Question 1 Question 2 Question 3 Question 4
Question 5 Question 6 Question 7 Question 8
Chapter 1 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-1-Problems/SM2-Chapter-1-Problems-A.html?access=ISMf12)
Prob. 1.02 Prob. 1.04 Prob. 1.05 Prob. 1.10
Prob. 1.11 Prob. 1.12
Chapter 2 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-2-Problems/SM2-Chapter-2-Problems-A.html?access=ISMf12)
Prob. 2.04 Prob. 2.09 Prob. 2.14 Prob. 2.22
Chapter 3 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-3-Problems/SM2-Chapter-3-Problems-A.html?access=ISMf12)
Prob. 3.02 Prob. 3.03 Prob. 3.04 Prob. 3.05
Prob. 3.06 Prob. 3.08 Prob. 3.09 Prob. 3.10a
Prob. 3.10b Prob. 3.11 Prob. 3.12 Prob. 3.13
Prob. 3.14 Prob. 3.15 Prob. 3.16 Prob. 3.17
Prob. 3.18 Prob. 3.19 Prob. 3.20 Prob. 3.23
Prob. 3.24 Prob. 3.25 Prob. 3.28 Prob. 3.29
Prob. 3.30 Prob. 3.31 Prob. 3.36 Prob. 3.50
Prob. 3.53 Prob. 3.54

Until the due date, you can only see your own team's work. Soon after the due date, the submission forums will become visible to everyone so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:

1 star: you gave it a try
2 stars: better try
3 stars: good overall but errors in the details
4 stars: very good -- essentially correct
5 stars: excellent -- goes beyond expectation

## Assignment 3

Assigned Feb. 13. Due March 18 (changed from March 4).

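Assignment 3 teams, by table column (left to right):
• Column 1: Bayes, Birnbaum, Blackwell, Bonferroni
• Column 2: Box, Cai, Galton, Gauss
• Column 3: Hill, Gray-Ihaka, Laplace, Wu
• Column 4: Mahalanobis, Fisher, Neyman, Reid
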
Chapter 4 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-4-Problems/SM2-Chapter-4-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Prob. 4.04 Prob. 4.05 Prob. 4.06 Prob. 4.07
Prob. 4.08
Chapter 5 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-5-Problems/SM2-Chapter-5-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
(1st bullet)
(2nd bullet)
(3rd bullet)
Prob. 5.01 Prob. 5.02 Prob. 5.03 Prob. 5.09
Prob. 5.12 Prob. 5.13 Prob. 5.17 Prob. 5.20
Prob. 5.23 Prob. 5.30 Prob. 5.31 Prob. 5.40
Chapter 6 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-6-Problems/SM2-Chapter-6-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
(1st bullet)
etc.
Prob. 6.05 Prob. 6.10 Prob. 6.11 Prob. 6.12
Prob. 6.13 Prob. 6.20 Prob. 6.21

Until the due date, you can only see your own team's work. Soon after the due date, the submission forums will become visible to everyone so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:

1 star: you gave it a try
2 stars: better try
3 stars: good overall but errors in the details
4 stars: very good -- essentially correct
5 stars: excellent -- goes beyond expectation

## Assignment 4

Due: April 4

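Assignment 4 teams, by table column (left to right):
• Column 1: Bayes, Birnbaum, Blackwell, Bonferroni
• Column 2: Box, Cai, Galton, Gauss
• Column 3: Hill, Gray-Ihaka, Laplace, Wu
• Column 4: Mahalanobis, Fisher, Neyman, Reid
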
Chapter 7 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-7-Problems/SM2-Chapter-7-Problems-A.html?access=ISMf12)
Answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Reading Question 5 Prob. 7.01 Prob. 7.02 Prob. 7.04
Prob. 7.04 Prob. 7.04 Prob. 7.04 Prob. 7.05
Prob. 7.10 Prob. 7.11 Prob. 7.11 Prob. 7.11
Prob. 7.11 Prob. 7.12 Prob. 7.13 Prob. 7.14
note
Prob. 7.15 Prob. 7.20
Chapter 8 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-8-Problems/SM2-Chapter-8-Problems-A.html?access=ISMf12)
Answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Prob. 8.01 Prob. 8.02 Prob. 8.01 Prob. 8.02
Prob. 8.05 Prob. 8.05 Prob. 8.05 Prob. 8.05
Chapter 9 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-9-Problems/SM2-Chapter-9-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Prob. 9.01 Prob. 9.02 Prob. 9.01 Prob. 9.02
Prob. 9.04 Prob. 9.04 Prob. 9.04 Prob. 9.04
Prob. 9.10 Prob. 9.10 Prob. 9.21 Prob. 9.21
Chapter 10 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-10-Problems/SM2-Chapter-10-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Chapter 12 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-12-Problems/SM2-Chapter-12-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Prob. 12.01 Prob. 12.02 Prob. 12.01 Prob. 12.02
Chapter 13 (https://dl.dropboxusercontent.com/u/5098197/Exercises/HTML/SM2-Chapter-13-Problems/SM2-Chapter-13-Problems-A.html?access=ISMf12)
As much as possible, answer each question in a separate post. Include the question and the R code used to answer the question in the post.
Prob. 13.01 Prob. 13.01 Prob. 13.01 Prob. 13.01

Until the due date, you can only see your own team's work. Soon after the due date, the submission forums will become visible to everyone so you can see the work of other teams. Graders will add stars, 1 to 5, to show their opinion of the quality of each answer:

1 star: you gave it a try
2 stars: better try
3 stars: good overall but errors in the details
4 stars: very good -- essentially correct
5 stars: excellent -- goes beyond expectation