NATS 1500 2014W

From MathWiki


NATS 1500 Statistics and Reasoning in Modern Society 2014 W

News and Major Links

  • Bayes in a GIF

Breaking News

The final exam for NATS 1500 will be held on Wednesday, April 23, 2014 at 9 am in the Rexall Tennis Center.

You will be allowed to use a non-programmable calculator with no memory other than the standard mathematical registers.

Old news




NATS 1500 has a Moodle site.

We will use Moodle for four purposes:

  • A forum in which you can post questions, comments and answers related to the course and to Statistics in general.
  • A forum entitled Statistics in the News in which you post interesting things you have found and comment on the posts of others.
  • A way for you to submit assignments.
  • A way for you to find out your grades.

Log in so you can start contributing posts and comments.


If you are prompted for a userid and a password to access a file, use 'nats' for both.

Week  Dates             Files and links
1     Jan 6             video
                        Introductory Slides
                        Introductory Slides on Moodle
      Jan 8 (class)     video
                        Introductory slides with notes added today
                        Assignment 0 due: Jan 10 at noon
      Jan 8 (tutorial)  Bring your laptops and we'll work on installing R and Rcmdr
                        See Installing R and Rcmdr
2     Jan 13            video
                        We start with Topic 2: Exploring Global Health, in preparation for Assignment 1,
                        and then resume Topic 1: Introduction to Statistical Ideas, with the introductory slides with notes from last Wednesday
      Jan 15            video
                        Preparation for this class: Read Chapters 1 and 3 of the textbook.
      Jan 15            Work on Assignment 1
3     Jan 20            video
                        Preparation for class discussion: Exercises on causality
                        Some topics covered in this class:
                          • Agresti diagram
                          • Simpson's Paradox
                          • External validity
                          • Internal validity
      Jan 22            video
                        Preparation for this class:
                          • Read Chapters 3 and 2
                        Assignment 1 due: Jan 24 at 11:55pm
      Jan 22            Using R and Rcmdr
4     Jan 27            video
                        Class representatives
      Jan 29            video
      Jan 29
5     Feb 3             video
      Feb 5
      Feb 5             Assignment 2 due: Feb. 7 at 11:55pm
6     Feb 10            video
                        Some sample test questions
                        Sample midterm test: test from last year, with annotations added on Feb. 10
      Feb 12            Mid-term test
                        LAS C: A to L, CLH D: M to Z
      Feb 17/19         Reading Week
7     Feb 24            video
      Feb 26            Sorry! No video. Inadvertently, this class was not recorded.
      Feb 26
8     Mar 3             video
                        Information on Assignment 3
      Mar 5             video
      Mar 5               • Discussion of midterm test
                          • Note: Last day to drop: March 7
9     Mar 10            video
      Mar 12
      Mar 12
10    Mar 17            video
      Mar 19            video
                        Chapter 6 slides with notes
      Mar 19
11    Mar 24            video
                          • Chapter 6 slides
                          • Chapter 7 slides
      Mar 26            video
      Mar 26            Assignment 3 due: March 28 at 11:55pm
12    Mar 31              • Sample exam questions -- 2013 exam
                          • Sample exam questions -- 2012 exam
                            • I will cover selected questions during the tutorial on April 2.
      Apr 2             video
      Apr 2             Project due: April 4 at 11:55pm
      Apr 23            Final exam at 9 am in the Rexall Tennis Center

General Information

How do you know what you know? Why do you feel very confident that some things are true but less sure about others? Do you feel very sure about some things that, perhaps, you shouldn’t be so sure about? And unsure about things that you should, in fact, be confident of?

Statistical reasoning is crucial for a critical understanding of the flood of data and information we face daily in modern society. Understanding the principles of statistical reasoning and being aware of a number of widespread errors in statistical thinking are often the key to distinguishing arguments that are sound from those that are fallacious.

This course stresses the logic and reasoning behind statistics. We avoid complex mathematical formulas. Statistical reasoning is applied to a critical analysis of current events reported in the media and current scientific, medical and social controversies.

By the end of the course, you will have developed an understanding of the reasons why scientific evidence can appear to lead to contradictory conclusions. You will have a better understanding of the assumptions that lead to these different conclusions and you will be in a better position to make informed judgements on the quality of scientific claims.

See also a course summary and an outline for 2011-12 on the Natural Science web site.


  • Georges Monette, Ph.D., P.Stat.

Assignments, Tests and Grading

Dates for NATS 1500 (unless otherwise indicated, all work is due at 11:55pm on the date shown):

Item                        Due                    Weight                       Link
Assignment 0 (individual)   Friday, Jan. 10, noon  1%                           Assignment 0
Assignment 1 (team)         Friday, Jan. 24        5%                           Assignment 1
Assignment 2 (team)         Friday, Feb. 7         5% (team) + 5% (individual)  Assignment 2
Mid-term test               Wednesday, Feb. 12     30%                          LAS C: A-L and CLH D: M-Z
Assignment 3 (team)         Friday, March 28       5%                           Assignment 3
Project (individual)        Friday, April 4        10%                          Information on project
Final exam                  April 10 - 26          35%                          Date set by registrar in late February
Participation               Jan. 6 - April 4       4%                           Participation in Moodle forums and in class

Important academic dates from the York website


Jessica M. Utts and Robert F. Heckard, (2006) Statistical Ideas and Methods, Thomson.

The original edition is out of print although many used copies may still be available. The edition available in the bookstore is a special reprint that includes all the material of the original text except many photographs and illustrations with a decorative function. All statistical material including graphs is the same. The pagination is different so assigned problems will be given by chapter and section numbers so you can follow the course with the original text or with the special reprint.

This is a very good textbook that is, unfortunately, expensive, like many other textbooks. Consider options such as:

  • sharing a textbook with other students
  • using the copies on 2-hour reserve in the Steacie Science Library (QA 276 U88 2006 BOOK).
  • purchasing an electronic copy available through the York Bookstore
  • trying to find a used copy through the York Bookstore or through other sources. The book has been used for this course for three previous years at York, so used copies should be available on campus.

Lectures and Tutorials

  • Class: Monday, 2:30 pm to 4:30 pm and Wednesday, 2:30 pm to 3:30 pm in Lassonde Building (LAS) C. The first class takes place on Monday, January 6, 2014.
  • Optional tutorials: Occasional optional tutorials will be held on Wednesdays, 3:30 pm to 4:30 pm in LAS C. The purpose of the tutorials will be to help you with problems using computers and to discuss questions regarding the material of the course. The days on which tutorials will be held will be announced in advance.

Course Policies

Late assignments
Late assignments or projects are penalized 10% of the value of the assignment for each day (or portion of a day) they are late. Unless a different time is specified, assignments and projects are due before 11:55 pm. Teams should plan to have a 'final draft' of the team assignments at least 2 days before the deadline so every member of the team can review and okay the draft before submission.
Missed term test
If you miss the term test with a suitably documented medical or compassionate reason, your mark for the term test will be imputed from your mark on the final exam. Otherwise you receive a grade of zero for the term test.
Use of computers in class
You are encouraged to bring your laptop to class to use it for purposes directly related to the class such as taking notes, annotating slides posted on the web or trying out commands in R. Some students think that it does not affect anyone else if you are doing your own thing in class on your laptop or other electronic device. This is wrong. People seated around you cannot help but be distracted. Therefore, you may not use your laptop to view unrelated materials such as videos because this creates a visual distraction for students seated near you. Failure to observe this policy may result in warnings and may have an impact on your class participation mark.
Class demeanor
It is okay to turn momentarily to your neighbour in class to quietly ask a brief clarifying question related to the material in the class and it is okay to give a quiet and brief answer. You may not, however, have any conversation beyond a very brief and quiet exchange. The instructor may be so absorbed in what he is saying that he won't notice you. Other students, who may be struggling to remain absorbed, do notice and are very distracted by conversations. They will be annoyed and many will come and complain to me for my failure to enforce adequate discipline. Please don't put me in this awkward position.
Academic honesty
Familiarize yourself with the York University Senate Policy on Academic Honesty. Violations of academic honesty are treated very seriously in university.


  • Datasets and lecture notes will be posted online. Since some of the material may be copyrighted, access to the files is protected and you will be prompted for a userid and password. Use 'nats' for both.
  • When you find interesting links on the web you will be able to post them to forums on Moodle.

Team Assignments

Three assignments are done by semi-randomly assigned teams. Why random teams? One reason is that in almost all job interviews, you are asked about your experience working with teams. Working with a diverse team that you didn't select yourself gives you the opportunity to have experiences that will give you great anecdotes to use in your future job interviews. When you land the job, you will be much more likely to show the kind of leadership in team work that is invaluable in the modern workplace.

General comments and details
  • I will email the list of members in your team some time during the weekend of January 11. The members of your team can communicate by email, meet in person, and use the special Team Forum on the wiki, which is visible only to members of your team and to the instructor and TAs (who typically look at it only on invitation from a member of the team).
  • All assignments are due at 11:55pm on Fridays. Use the tutorial hour the previous Wednesday to meet with your team and to finalize your submission for the assignment so you only need to do some proofreading and merging before the deadline on Friday.
  • Include the names of all active participants on the first page of the assignment. Everyone who participated actively gets the same grade. Those who didn't, get zero. Note that some team members might not respond because they have dropped -- or intend to drop -- the course. If your team shrinks to 3 or fewer, let me know and I can merge your team with another smaller team.
  • The more work you do on an assignment the better prepared you are to do well on the mid-term and on the final. But you shouldn't hog the work -- let others do their part too. Everyone should make sure that they understand the whole assignment. Discuss the assignment with your team members to make sure everyone understands the key points and difficulties of each question.

Class representatives

In approximately 3 weeks, we will select 3 class representatives. This is a practice in the Division of Natural Science. The class representatives meet, later in the term, with the Director of the Division of Natural Science, Paul Delaney, who uses their feedback to help guide the development of courses in the Division. The class representatives can also act to give feedback to the instructor.

Using computers for the course

Some assignments and the individual project will require you to use computing software to view and analyze data. The test and exam will require you to interpret output from the same software. You can learn the computing aspects of the course in a number of ways:

  • If you have access to a computer, you can download the software for the course. We use free, open-source software that runs on Windows, MacOS X or Linux. If you have a laptop, you are encouraged to bring it to class and to tutorials and office hours.
  • If you don't have access to a computer, you can get an account to use computers in the Gauss Lab where the software will be available. You also need a card to access the Gauss Lab.
  • The course will show detailed examples of simple statistical analyses using R.

Topic 1: Introduction to Statistical Ideas

Material covered

  • Introduction to the course
  • How do we know what we know: Links to slides and exercises will be posted soon
  • Prepare to discuss exercises by Jan 16.


Chapter 1

What is 'Statistics'?

The definition in the text says:
Definition: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
This is certainly an important aspect of statistics but I think it only tells a small part of the story. Statistics is the science (and art) of working with uncertainty --- whether you plan to make decisions or not. We tend to think of statements as true or false. But in practice, the truth or falsity of most important statements is not known with certainty. There are many shades of uncertainty between being certain that a statement is true and being certain that it is false. Many of the most important decisions and choices we make in life are made despite the fact that we don't have all the information we would like to have to determine which route is best. Sometimes we simply act as if something is true or false although we don't really know. Statistics is not just about how to make these difficult decisions. It is also about remembering and being aware of our uncertainty so we know where to look for better information and how to revise our hypotheses as relevant information becomes available. Statistics is not just about making decisions; it's about where to look for information that could lead us to change our decisions. It's about knowing when to keep an open mind and knowing when and how to change your mind.
Statistics is about the fascinating journey from ignorance to increasingly certain knowledge to wisdom. This is a journey we all follow individually. It is also a journey undertaken by disciplines, by political and social organisms and by mankind as a whole.
Possible test question

A traditional definition of statistics is that it is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty. Give a brief critique of this definition (100 to 300 words).

Experimental vs Observational data sets

If X and Y are correlated, what can it mean?
1) X causes Y?
2) Y causes X?
3) Another variable(s) Z(s) causes both X and Y?
a) Some Zs might be known and measurable. For these Zs we might be able to adjust using sophisticated statistical methods.
b) Some Zs might be known but hard or impossible to measure. This is more difficult to deal with.
c) Some Zs might not be discovered until the year 3000. We can't adjust statistically for these.
4) Selection: maybe there's no relationship but some data got thrown out or ignored and the data left created the impression of a relationship.
5) Chance: This is the one statisticians are really good at dealing with -- as you will learn in this course.
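Case 3 can be made concrete with a small simulation. The following is a minimal sketch in Python, not part of the course materials: the variable names, the sample size and the normal distributions are all assumptions chosen for illustration. Z drives both X and Y, X has no effect on Y, and yet X and Y come out clearly correlated (about 0.5 under these assumptions). Restricting attention to units with nearly the same Z, a crude form of adjustment, makes the correlation almost vanish.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounded(n, seed=1):
    """Hypothetical data: Z causes both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [z + rng.gauss(0, 1) for z in zs]
    ys = [z + rng.gauss(0, 1) for z in zs]
    return zs, xs, ys


zs, xs, ys = simulate_confounded(20000)
r_raw = correlation(xs, ys)

# Crude adjustment: keep only the units whose Z is very close to 0.
narrow = [(x, y) for z, x, y in zip(zs, xs, ys) if abs(z) < 0.1]
r_adjusted = correlation([x for x, _ in narrow], [y for _, y in narrow])

print(f"correlation of X and Y, ignoring Z:    {r_raw:.2f}")
print(f"correlation of X and Y, Z held near 0: {r_adjusted:.2f}")
```

This kind of adjustment is only possible for a Z that is known and measured, which is why cases 3b and 3c are harder.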
What if we have an 'experiment' with 'random allocation of X' to experimental units?
1) X causes Y? possible
2) Y causes X? No! We know what caused X. It was the coin toss or the random number generator that caused X.
3) Another variable(s) Z(s) causes both X and Y? Maybe. But it could only be by chance that differences in levels of any combination of Zs, known or not, measurable or not, would have a large impact on Y.
4) Selection? We can exclude this by checking how the data were obtained.
5) Chance again.
So, if we can exclude selection, we are left with two options:
1) X causes Y, or
2) Chance.
We can use statistical analysis to measure chance. If the chance is very small then we may be left with X causes Y as the plausible explanation.
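One way to "measure chance" in a randomized experiment is a permutation test: shuffle the treatment labels many times and see how often a difference as large as the observed one arises from the shuffling alone. The sketch below is an illustration in Python under invented assumptions (a simulated coin-toss allocation, normally distributed outcomes, hypothetical function names); it is not the method used in the textbook or the course software.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(treated, control, n_shuffles=2000, seed=2):
    """Share of random relabellings whose absolute difference in means
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


def run_experiment(effect, n=200, seed=3):
    """Simulated experiment: a coin toss decides who gets the treatment X."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(0, 1)   # stands in for all the Zs of this unit
        if rng.random() < 0.5:       # random allocation of X
            treated.append(baseline + effect)
        else:
            control.append(baseline)
    return treated, control


for effect in (0.0, 1.0):
    treated, control = run_experiment(effect)
    p = permutation_p_value(treated, control)
    print(f"true effect {effect}: permutation p-value about {p:.3f}")
```

With a real effect of this size the p-value is very small, so chance is an implausible explanation and "X causes Y" remains. With no effect, the p-value can land anywhere between 0 and 1.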
How should you react to causal claims based on data analyses?
1) The key question you should ask: Is the data set experimental or observational? You might have to ask questions to answer this. Generally, it isn't obvious from the appearance of the data. The critical issue is how were the levels of X assigned to the units: strictly randomly or by choice or judgement of the subjects or of the experimenters?
2) If experimental: double-check that allocation was really random and not done by judgment or haphazardly. Was the study double-blind? Are there possible biases in measurements? Psychological factors that influence outcome? Does the claim match the nature of the experiment or is the claim stretching to something that does not correspond exactly to what was done in the experiment?
3) If observational:
a) Can you poke an obvious hole in the claim? E.g. is there a plausible alternative explanation that was not taken into account in the analysis? In this case, you've countered the claim.
b) What has the analysis adjusted for? Are these factors that can be measured with precision? What kinds of factors are not accounted for?
c) Has the analysis over-corrected by controlling for possible mediating factors that should not be controlled?
d) If a causal connection seems paradoxical, can you think of plausible mediating factors that might explain causality?
With observational data, you can't be 100% sure that the relationship is causal but you can check whether important alternative possibilities have been adequately addressed.
Some examples in the news: Toronto Star: Pulse
Which examples are experimental and which are observational?
Which conclusions are reasonable and which are not? Why?

Things to do

'Things to do' are tasks that are not graded but are important to keep up with the course


Exercises are not graded but they are useful preparation for the mid-term test or the final exam

Questions on causality for discussion January 20

Text Chapter 1

1.1, 1.5, 1.6, 1.7, 1.8, 1.10, 1.11, 1.13, 1.14, 1.15, 1.16, 1.19, 1.23.

Assignment 0

Due: 12 noon, January 10, 2014

I would like to know something about you in order to form balanced random teams of 4 or 5 students to work on Assignments 1, 2 and 3. I will use your emailed responses to this Assignment 0 to form the teams. You will receive the names of your team members on January 13 so you can meet face to face at the break during the class.
To complete Assignment 0, fill out this survey. It should take less than 5 minutes.

Topic 2: Exploring Global Health

Hans Rosling and global health

In each of the following pairs of countries, which country has the higher child mortality?

  • Sri Lanka or Turkey
  • Poland or South Korea
  • Malaysia or Russia
  • Pakistan or Viet Nam
  • Thailand or South Africa
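
To judge a score on a quiz like this one, it helps to know what pure guessing would produce. The sketch below is illustrative Python, assuming each of the five pairs is an independent guess with probability 1/2 of being right (the chimpanzee benchmark that Hans Rosling refers to). It lists the probability of each possible score and the expected score.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_pairs = 5      # number of country pairs in the quiz
p_guess = 0.5    # assumed chance of guessing a pair correctly

for k in range(n_pairs + 1):
    print(f"{k} correct: probability {binomial_pmf(k, n_pairs, p_guess):.3f}")

expected_score = sum(
    k * binomial_pmf(k, n_pairs, p_guess) for k in range(n_pairs + 1)
)
print(f"expected number correct when guessing: {expected_score}")
```

A group whose average score is clearly below 2.5 out of 5 is doing worse than guessing, which suggests systematic misconceptions and not mere ignorance.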

Team Assignment 1

Due: January 24, 2014
  • Explore Gapminder World
    • Learn how to select different variables for the Y axis, the X axis, the size and the colour of points.
    • Learn how to select different subsets of countries for highlighting and how to turn 'trails' on or off.
    • Learn how to control the time animation and its speed.
  • Find a selection of variables that seem to tell an interesting story about a trend or a historical event that you find interesting.
  • Do a bit of research on this trend or event.
  • Copy the URL for the animation you selected by clicking on the 'Share graph' button and copying the URL that is shown.
  • Write an interesting short essay (300 to 2,000 words) describing what the animation shows.
  • Post the URL and the essay on the Moodle forum for Assignment 1.
  • Note that each team prepares only one essay and URL. Of course, if you want to post other essays as posts to the blog, that is more than welcome!
  • In response to many questions: You should include links to online materials you used, e.g. Wikipedia articles are considered acceptable for this assignment, and you should include references to other materials that you used but there is no need for an exhaustive list of references as would be required for a more formal scholarly essay.

Topic 3: Statistics in the News

Team Assignment 2

Due: February 7, 2014
1) Find a topic in the news currently or within the past year that involves some controversy over the interpretation of evidence.
2) Collect some clippings or on-line links to news, magazine or journal articles related to the topic. Most scientific topics in the news are ultimately based on one or more articles in academic journals. Find the relevant article(s).
3) Discuss why the topic is controversial. Is the controversy over causality? Why is there room for disagreement? What kind of evidence, data or theory, is available to support the various sides of the issue? Discuss the apparent strengths and weaknesses in the data or theory on either side. Is the available data observational or experimental? Is this relevant to the issue? What kind of data, if any, could resolve the issue? What obstacles are there to obtaining the ideal data to resolve the issue? Is better data likely to become available and how would it be helpful?
4) End the assignment with brief individual essays (identify the authors) stating your individual positions on the topic. Have you adopted a point of view? Describe the ways in which you remain uncertain and how your uncertainty could be resolved. If you wish you can write this part of the assignment as if it were a panel discussion among the members of your team. You could, in fact, record a panel discussion and transcribe it to text.

You are not expected to become experts in two weeks in the topic you choose. The goal of the assignment is for you to become informed lay persons with an understanding of the nature of the controversy and uncertainty concerning your topic, an understanding of the approaches that could resolve it and the challenges to achieving a resolution.

Plan to devote 5 to 10 pages of commentary, double-spaced, for the common part of the assignment and one or more pages double-spaced on each individual essay.

Half of the grade (5 marks) is based on the common part of the assignment for which all members of the team receive the same grade. The remaining half of the grade (5 marks) is for individual essays. The grade is based on the quality of your research and the interest and intellectual energy you display in dealing with the problem.

You should submit the assignment as a single PDF file uploaded by a member of your team through Moodle. The single file should include the common part of the assignment followed by the individual essays. Don't forget to identify the name of the author at the beginning of each essay. If you cannot submit the assignment in this format, please see me to discuss alternatives.

Possible test/exam question

  1. Review these exercises for discussion in class
  2. Why does Hans Rosling say that students at the Karolinska Institute know statistically significantly less about the world than do chimpanzees -- and professors at the Karolinska Institute are roughly on a par with chimpanzees?
  3. A study showed (this is true) that students who view the recorded videos of the lectures many times perform less well on the final exam than students who view the videos fewer times. Upon discovering this, your professor announces that he/she will discontinue recording the lectures because, the professor says, the videos have been shown to cause students to perform more poorly in the course. Explain your point of view to the professor -- in clear and simple language even a professor might be able to understand.
    • Explain why the number of lectures attended could be a potential confounding factor in considering the relationship between the frequency of viewing class videos and performance on the course.
    • Can you think of potential mediating factors?
  4. In the 1964 U.S. Public Health Service study it was found that, for men and for women in each age group, current smokers were on average much healthier than people who had quit smoking.
    • Suppose that an important factor that explains this surprising fact is that people who quit smoking tend to experience increased stress and weight gain as a result of quitting, and that these factors adversely affect health. Are increased stress and weight gain potential mediating factors or confounding factors? Explain briefly.
    • Suppose that an important factor that explains this fact is that an important proportion of people who quit smoking do so because they are in poor health and have been strongly advised to quit smoking by their physicians. Would this be a mediating factor or a confounding factor in considering the relationship between quitting smoking and health? Explain briefly.
    • In this study, are age and sex potential confounding factors or mediating factors?
  5. A study shows that heavy users of sunscreen lotion have a higher chance of developing skin cancer.
    • Does this imply that you should avoid using sunscreen lotion in order to reduce your chances of developing skin cancer?
    • Can you think of potential confounding factors and potential mediating factors?
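
Questions like these often turn on the pattern known as Simpson's Paradox. The sketch below, in Python, uses counts invented purely for illustration. They are not data from any study mentioned on this page. In the made-up table, students who view the videos many times pass more often than the others within each attendance group, yet less often overall (66% against 78%), because most of the heavy viewers are students who rarely attend.

```python
# Hypothetical counts, invented for illustration: (number passed, group size)
counts = {
    ("attends regularly", "many views"): (18, 20),
    ("attends regularly", "few views"): (68, 80),
    ("attends rarely", "many views"): (48, 80),
    ("attends rarely", "few views"): (10, 20),
}


def pass_rate(viewing, attendance=None):
    """Pass rate for one viewing level, overall or within one attendance group."""
    passed = 0
    total = 0
    for (att, view), (n_passed, n_total) in counts.items():
        if view == viewing and attendance in (None, att):
            passed += n_passed
            total += n_total
    return passed / total


for attendance in ("attends regularly", "attends rarely", None):
    label = attendance or "all students"
    many = pass_rate("many views", attendance)
    few = pass_rate("few views", attendance)
    print(f"{label:18s} many views: {many:.0%}   few views: {few:.0%}")
```

The direction of the comparison reverses once attendance is taken into account, which is what makes attendance a confounding factor in this invented example.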

Team Assignment 3

Assignment 3 consists of doing the accumulated problems that are assigned from week to week over the last six weeks of the course. It is due on March 28.

If we do this well, we will end up writing our own solution manual for the textbook.


  1. Only the numbers in red need to be done for Assignment 3.
  2. The numbers shown in the text all have the form '5.x' where 'x' is the number of the question within chapter 5. In the following lists I only show 'x'.

Chapter 4 (omit section 4.3)

6, 9, 17, 22, 34, 48, 80, 82.

Chapter 5, pp. 161--168

Looking for Patterns with Scatterplots:
1, 2, 3, 7
Describing Linear Pattern with a Regression Line:
11, 14
Measuring Strength and Direction with Correlation:
24 (important -- likely to be on exam), 27 (also a good candidate for the exam)
Why the Answers May Not Make Sense & Correlation Does Not Prove Causation:
36 (refers to 7), 39, 40
Chapter Exercises:
46, 48, 49, 59, 60, 61, 62.

Chapter 6

Displaying Relationships Between Categorical Variables:
3, 4, 6, 7
Risk, Relative Risk, Odds Ratio and Increased Risk & Misleading Statistics About Risk:
10 (nice exam question), 11-14 (ditto), 20 (refers to 6), 22
The Effect of a Third Variable and Simpson's Paradox:
27, 29, 31
Assessing the Statistical Significance of a 2 x 2 Table:
33, 34, 43
Chapter Exercises:
56, 57, 58, 62.

Chapter 7

1. Random Circumstances & 2. Interpretations of Probability
2, 5, 7, 16
3. Probability Definitions and Relationships
18, 19, 20
4. Basic Rules for Finding Probabilities
34, 35, 36, 42
5. Strategies for Finding Complicated Probabilities
44, 45, 46, 47, 50, 54
6. Using Simulation to Estimate Probabilities
7. Coincidences and Intuitive Judgments about Probability
64, 68, 72, 76 (similar question likely to be on test)
Chapter Exercises
82, 83, 84, 85, 91 to 98 (sequence of exercises on same problem).

That's it!!

Project (Individual)

The general idea is to perform an analysis of some data that you find interesting using the statistical tools and critical insights that you have developed in the course. To help you find a topic and data you can have a look at Statistics: Pedagogical resources on this wiki. The whole project must be your own work although you may consult with other members of your team or other students.

  1. Identify a topic you find interesting about which you have a question that could be resolved with appropriate data and analysis.
  2. Find a number of sources (3 or more-- except in very special cases where 3 or more sources would not exist) that provide information relevant to your question. At least one source should have relevant data.
  3. Perform some analyses of the data including summaries of the distribution of relevant variables and relevant graphs.
  4. Based on a critical assessment of your sources and your analysis, discuss the implications for your question.
  5. Discuss clearly the strengths and limitations of your analysis and existing information in addressing your question.

Some guidelines for your report:

  1. Aim for a length of 8 to 12 pages double spaced of analyses and discussion plus at least 2 pages of relevant graphs.
  2. Show the results of at least one and preferably two analyses using a single data set -- unless you are very ambitious and want to use more.


  1. Clear expression of specific question and relevant field: 10%
  2. Choice of sources and clear references: 10%
  3. Clarity and quality of argument: 20%
  4. Relevance and quality of analysis: 20%
  5. Relevance and quality of graphs: 20%
  6. Clear formal academic style of writing: 5%
  7. Effort: 5%
  8. Structure: 5%
  9. Overall appearance: 5%

You can get bonus marks in any category if you do an absolutely brilliant job in that category.

You should submit your project as a single PDF file through Moodle.

Interesting links

Some places to look for Statistics in the News

  • FiveThirtyEight, a data journalism site founded by Nate Silver who is famous for his accurate predictions of U.S. elections.
  • Toronto Star
    Many articles in daily newspapers use data or make claims that benefit from a critical analysis.
  • The Guardian
    The Guardian is one of two international newspapers that have a special reputation for the effective use and presentation of data. The other is:
  • The New York Times (limit of 10 articles/month without a subscription)
  • New York Times Opinion Pages (doesn't require a subscription)
  • New York Times online access through York
  • This is Statistics, a site created by the American Statistical Association.
  • Significance, published by the Royal Statistical Society.
  • Stats Chat, a blog by Thomas Lumley
  • Statistical Modeling, Causal Inference and Social Science, a blog by Andrew Gelman. This one is quite advanced; even graduate students find it challenging.

A few interesting articles

Causality and Climate Change

Relevant Books