NATS 1500 2016W



NATS 1500: Statistics and Reasoning in Modern Society 2016 W

Statistical literacy is a necessary precondition for an educated citizenship in a technological democracy -- Gerd Gigerenzer et al. in Helping Doctors and Patients Make Sense of Health Statistics (http://citrixweb.mpib-berlin.mpg.de/montez/upload/PaperLibrary/GG_etAl_Helping_doctors-1.pdf)
A certain elementary training in statistical method is becoming as necessary for anyone living in this world of today as reading and writing. – H. G. Wells in "The Informative Content of Education," The Presidential Address to the Educational Science Section of the British Association for the Advancement of Science, given on September 2nd, 1937.

Quick Links

  • Calendar
  • NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792)
  • Data
  • Links
  • How to use Rcmdr (http://wiki.math.yorku.ca/index.php/R:_Rcmdr_--_how_to)
  • Course Description
  • Assignments and Tests
  • Registrar's page (https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/7/wo/Ayb9ybtr7DZfrSvAHbHVow/3.1.10.8.3.0.0.5)

Course forums: visit at least once a week and make at least 6 good contributions in each forum during the term.

  • NATS 1500 Q&A Forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=926750)
  • NATS 1500 Statistics in the News Forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=926751)

Other links: NATS 1500 Team Forums (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=962579) (a forum for each team, activated when teams are assigned)
Current Assignment:

Team Assignment 3 -- due March 25

Breaking News

Office hours

I'll have office hours on

  • Friday, March 18, from 3 pm to 4 pm
  • Monday, April 4, from 1 pm to 2 pm
  • Friday, April 8, from 8:30 am to 10:30 am (not from 11:30 am to 12:30 pm)
  • Friday, April 8, from 3 pm to 4 pm

Final Exam

The final exam will be held on Monday, April 11, 2016 from 9am to 11am in the Tennis Canada Aviva Centre (formerly known as the Rexall Centre).
Old news



Calendar

January

Week | Dates (2016) | Files and links
1 Jan 4
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_04.swf)

Assignment 0 due: Jan 9 at 11:55 pm
Topic 1: Causal vs Predictive Inference / Experimental vs Observational Data

  • Link to the recording of the screen in class: video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_04.swf)
    • I apologize for the poor sound. The amplifier on my mic was set too high, resulting in distortion and clipping. Turn down your audio before listening to the video. I hope it will be better next class.
    • Since we used the blackboard for the first hour, only the sound is relevant until the 1:04:15 mark on the timeline. You can see the timeline by moving your mouse over the video. Here are links to three pictures of the blackboard:
      • Board #1 (http://blackwell.math.yorku.ca/NATS1500/2016/blackboard01_2016_01_04.jpg)
      • Board #2 (http://blackwell.math.yorku.ca/NATS1500/2016/blackboard02_2016_01_04.jpg)
      • Board #3 (http://blackwell.math.yorku.ca/NATS1500/2016/blackboard03_2016_01_04.jpg)
        If the photographs are too large, some browsers will let you shrink them with "Ctrl -": hold down the 'Ctrl' key and then press the '-' (minus) key as many times as you need for the photo to fit on your screen.
  • Lecture notes: Lies, Damned Lies and Statistics (http://blackwell.math.yorku.ca/NATS1500/2016/Lies_Damned_Lies_notes_2016_01_04.pdf)
Jan 6 (class)
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_06.swf)

Lies, Damned Lies and Statistics with latest annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Lies%20-%20Damned%20Lies_notes_2016_01_06.pdf)

Jan 6 (tutorial)

Bring your laptops and we'll work on installing R and Rcmdr
See Installing R and Rcmdr

2 Jan 11
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_11.swf)

Lies, Damned Lies and Statistics with latest annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Lies%20-%20Damned%20Lies_notes_2016_01_11.pdf)

Jan 13 (class)
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_13.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_13.mp4)
Jan 13 (tutorial)

Work on Assignment 1

3 Jan 18
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_18.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_18.mp4)
  • To do: Select class representatives
  • Slides
    • Chapter 1 (http://blackwell.math.yorku.ca/NATS1500/2016/Slides/Chapter_01-03-edited.pdf) with added notes (http://blackwell.math.yorku.ca/NATS1500/2016/Slides/Chapter_01_withNotes_2016_01_18.pdf)
    • Chapter 3 (http://blackwell.math.yorku.ca/NATS1500/2016/Slides/Chapter_03.pdf) with added notes (http://blackwell.math.yorku.ca/NATS1500/2016/Slides/Chapter_03_withNotes_2016_01_18.pdf). See p. 27 for data on capital punishment in Florida.
Jan 20
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_20.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_20.mp4)
  • To do: Select class representatives -- since I forgot to do it last time
  • Preparation for this class:
    • Read Chapter 2
  • Slides
    • Chapter 2 (http://blackwell.math.yorku.ca/NATS1500/2016/Slides/Chapter_02.pdf)
Jan 20 (tutorial)
4 Jan 25
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_25.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_25.mp4)
Jan 27
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_27.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_01_27.mp4)
Jan 27 (tutorial)

Using R and Rcmdr:

February

Week | Dates (2016) | Files and links
5 Feb 1
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_01.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_01.mp4)
Feb 3
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_03.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_03.mp4)
Feb 3 (tutorial)
6 Feb 8
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_08.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_08.mp4)
Feb 10
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_10.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_10.mp4)
Feb 15/17 Reading Week
7 Feb 22
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_22.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_22.mp4)

Chapter 5 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_05-w_2015_02_22.pdf)

Feb 24
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_24.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_24.mp4)

Chapter 5 with today's annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_05-w_2015_02_24.pdf)
Visualizing Regression (http://blackwell.math.yorku.ca/NATS1500/2016/Visualizing_Regression-I-Simple_2016_02_24.pdf)

Feb 24 (tutorial)

Midterm with annotations added today (http://blackwell.math.yorku.ca/NATS1500/2016/MidTerm2014_2016_02_24.pdf)
Another previous midterm with some annotations (http://blackwell.math.yorku.ca/NATS1500/2016/NATS_1500_2013_midterm_2016_02_24.pdf)

Midterm test: Saturday, Feb. 27, 8 pm to 9 pm in
  • Vari Hall C Curtis Lecture Hall A if your family name begins with A to M, and in
  • Vari Hall D Curtis Lecture Hall B if your family name begins with N to Z
8 Feb 29
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_29.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_02_29.mp4)

Chapter 6 (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_06.pdf)

Mar 2
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_02.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_02.mp4)

Chapter 6 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_06_2016_02_29.pdf)

Mar 2 (tutorial)
  • Note: Last day to drop: March 4

March and April

Week | Dates (2016) | Files and links
9 March 7
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_07.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_07.mp4)

Chapter 6 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_06_2016_03_07.pdf)
Using Rcmdr

Mar 9
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_09.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_09.mp4)

Chapter 7 (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_07.pdf) with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_07_2016_03_09.pdf)

Mar 9 (tutorial)
10 March 14
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_14.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_14.mp4)

Chapter 7 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_07_2016_03_14.pdf)

Mar 16
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_16.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_16.mp4)
Mar 16 (tutorial)
11 March 21

Class cancelled due to a security problem in Vari Hall

Mar 23
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_23.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_23.mp4)

Chapter 7 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_07_2016_03_23.pdf)

Mar 23 (tutorial)

Sample final exam (http://blackwell.math.yorku.ca/NATS1500/2016/NATS%201500%20Final%20Exam%202013%2005%2006.pdf)
Sample questions on z-scores and regression to the mean (http://blackwell.math.yorku.ca/NATS1500/2015/Sample%20correlation%20questions_2015_03_25.pdf)

12 Mar 28
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_28.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_28.mp4)

Chapter 7 with annotations (http://blackwell.math.yorku.ca/NATS1500/2016/Chapter_07_2016_03_28.pdf)

Mar 30
video (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_30.swf)
mp4 (http://blackwell.math.yorku.ca/videos/nats1500_2016_03_30.mp4)

Another sample final exam (http://blackwell.math.yorku.ca/NATS1500/2016/Sample%20Final%20Exam.pdf)

Mar 30 (tutorial)
Apr 4 No class - This day follows the Friday schedule of classes
Final exam: April 6 to April 20. The specific date should be available around the end of February.

General Information

How do you know what you know? Why do you feel very confident that some things are true but less sure about others? Do you feel very sure about some things that, perhaps, you shouldn’t be so sure about? And unsure about things that you should, in fact, be confident of?

Statistical reasoning is crucial for a critical understanding of the flood of data and information we face daily in modern society. Understanding the principles of statistical reasoning and being aware of a number of widespread errors in statistical thinking is often the key to distinguishing arguments that are sound from those that are fallacious.

This course stresses the logic and reasoning behind statistics. We avoid complex mathematical formulas. Statistical reasoning is applied to a critical analysis of current events reported in the media and current scientific, medical and social controversies.

By the end of the course, you will have developed an understanding of the reasons why scientific evidence can appear to lead to contradictory conclusions. You will have a better understanding of the assumptions that lead to these different conclusions, and you will be in a better position to make informed judgments on the quality of scientific claims.

Instructor

Assignments, Tests and Grading

Dates for NATS 1500 (unless otherwise indicated, all work is due at 11:55 pm on the date shown):

Item | Due | Weight | Link
Assignment 0 (individual) | Saturday, Jan. 9, noon | 1% | Assignment 0
Assignment 1 (team) | Wednesday, Jan. 27 | 5% | Assignment 1
Assignment 2 (team) | Outline or plan: Thursday, Feb. 11; final copy: Wednesday, Feb. 24 | 5% (team) + 5% (individual essay) | Assignment 2
Mid-term test | Saturday, Feb. 27, 8 pm to 9 pm in Vari Hall C and in Vari Hall D Curtis Lecture Halls A and B (CLH A and CLH B) | 30% |
Assignment 3 (team) | Friday, March 25 | 5% |
Project (individual) | Monday, April 4 | 10% |
Final exam | April 6 to 20 (date will be set by the registrar in late February) | 35% |
Participation | Jan. 4 to April 4 | 4% |

Participation in the NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792) forums and in class. You should contribute at least 6 meaningful posts to each of the NATS 1500 Q&A Forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=926750) (they can be good questions or good answers) and to the NATS 1500 Statistics in the News Forum (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=926751).
After the last class and no later than April 7, submit a brief summary statement (500 words or less) through the NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792) about your contributions to the class. For example, how many posts and comments did you make? Briefly describe some of the most significant ones.
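To see how the weights in the grading table combine into a course mark, here is a minimal Python sketch. It assumes every component is marked as a percentage (0 to 100) and that the course mark is simply the weighted sum; the component labels are hypothetical, and the sketch ignores rounding, letter-grade conversion and the rule for imputing a missed mid-term test.

```python
# Weights (percent of the course mark) from the grading table.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "assignment_0": 1,
    "assignment_1": 5,
    "assignment_2_team": 5,
    "assignment_2_essay": 5,
    "midterm": 30,
    "assignment_3": 5,
    "project": 10,
    "final_exam": 35,
    "participation": 4,
}


def course_mark(marks: dict) -> float:
    """Weighted sum of component marks, each assumed to be a percentage (0-100)."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical student: 80% on every component except 60% on the final exam.
    marks = {name: 80 for name in WEIGHTS}
    marks["final_exam"] = 60
    print(course_mark(marks))  # 73.0
```

Because the weights add up to 100, a student with the same percentage on every component gets exactly that percentage as a course mark.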

See also: Important academic dates from the York website (http://www.registrar.yorku.ca/enrol/dates/fw15)

Textbook

Jessica M. Utts and Robert F. Heckard (2006). Statistical Ideas and Methods. Thomson.

The original edition is out of print, although many used copies may still be available. The edition available in the bookstore is a special reprint that includes all the material of the original text except for many photographs and illustrations that serve a decorative function. All statistical material, including graphs, is the same. Because the pagination is different, assigned problems will be given by chapter and section numbers so you can follow the course with either the original text or the special reprint.

This is a very good textbook but, unfortunately, like many other textbooks, it is expensive. Consider options such as:

  • sharing a textbook with other students
  • using the copies on 2-hour reserve in the Steacie Science Library (QA 276 U88 2006 BOOK).
  • trying to find a used copy through the York Bookstore (http://www.sellmytextbooks.org/members/19/) or through other sources. The book has been used for this course for five previous years at York, so used copies should be available on campus.

Official Registrar Course Description

Statistical reasoning is crucial for a critical understanding of the flood of information we face daily in modern society. This course examines the principles of statistical reasoning with an emphasis on applications to everyday decisions and turning information into understanding. Course credit exclusion: SC/MATH 1532 3.00. NCR Note: Not open to students who have passed or are taking AK/AS/SC MATH 2560 3.00, or who have received advanced standing for the equivalent.

Lectures and Tutorials

  • Class: Monday, 2:30 pm to 4:30 pm and Wednesday, 2:30 to 3:30 in Vari Hall (VH) C. The first class takes place on Monday, January 4, 2016.
  • Optional tutorials: Occasional optional tutorials will be held on Wednesdays, 3:30 pm to 4:30 pm in VH C. The purpose of the tutorials will be to help you with problems using computers and to discuss questions regarding the material of the course.

Course Policies

Late assignments
Late assignments or projects are penalized 10% of the value of the assignment for each day (or portion of a day) it is late. Unless a different time is specified, assignments and projects are due before 11:55 pm on the due date. Teams should plan to have a 'final draft' of the team assignments at least 2 days before the deadline so every member of the team can review and okay the draft before submission.
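The late-penalty rule is simple arithmetic: each day, or portion of a day, after the deadline costs 10% of the assignment's value. A minimal Python sketch, assuming lateness is measured in hours past the deadline and that a penalized mark cannot go below zero (the policy does not say so explicitly); the function name is hypothetical.

```python
import math


def late_mark(raw_mark: float, max_mark: float, hours_late: float) -> float:
    """Deduct 10% of max_mark for each day, or portion of a day, late."""
    # Any portion of a day counts as a full day, hence the ceiling.
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    penalty = max_mark * days_late / 10
    # Assumption: the penalized mark is floored at zero.
    return max(0.0, raw_mark - penalty)


# An assignment worth 5 marks, graded 4.5, submitted 30 hours late:
# 30 hours is one day plus a portion of a day, so the penalty is 2 x 0.5 = 1 mark.
print(late_mark(4.5, 5, 30))  # 3.5
```

Even one hour late costs a full day's penalty, which is why finishing a team draft two days early is a good idea.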
Missed term test
If you miss the term test for a suitably documented medical or compassionate reason, your mark for the term test will be imputed from your mark on the final exam. Otherwise, you will receive a grade of zero for the mid-term test.
Use of computers in class
You are encouraged to bring your laptop to class to use it for purposes directly related to the class such as taking notes, annotating slides posted on the web or trying out commands in R. Some students think that it does not affect anyone else if you are doing your own thing in class on your laptop or other electronic device. This is wrong. People seated around you cannot help but be distracted. And the instructor tends to get annoyed when members of the class are clearly lost in a different dimension. Therefore, you may not use your laptop to view unrelated materials such as videos because this creates a visual distraction for students seated near you and your lack of presence in the class is distracting to the instructor.
Class demeanor
If you need to ask your neighbour a question, pass them a note very discreetly. Sometimes, the instructor is so absorbed in what he is saying that he doesn't notice people talking. However, other students, who may be struggling to remain absorbed, do notice and are very distracted by conversations. They will be annoyed and many will come and complain to me for my failure to enforce adequate discipline. Please don't put me in this awkward position.
Academic honesty
Familiarize yourself with the York University Senate Policy on Academic Honesty (http://www.yorku.ca/univsec/policies/document.php?document=69). Violations of academic honesty are treated very seriously in university.

Resources

  • Datasets and lecture notes will be posted at http://blackwell.math.yorku.ca/NATS1500/2016. Since some of the material may be copyrighted, access to the files is protected and you may be prompted for a userid and password. Use 'nats' for both.
  • When you find interesting links on the web you will be able to post them to forums on NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792) and contribute to your grade for participation.

Team Assignments

Three assignments are done by semi-randomly assigned teams. Why random teams? One reason is that in almost all job interviews, you are asked about your experience working with teams. Working with a diverse team that you didn't select yourself gives you the opportunity to have experiences that will give you great anecdotes to use in your future job interviews. When you land the job, you will be much more likely to show the kind of leadership in team work that is invaluable in the modern workplace.

General comments and details
  • I will email the list of members in your team some time during the weekend of January 10. The members of your team can communicate by email, meet in person, and use the special Team Forum on the wiki which is visible only to members of your team and to the instructor and TAs but typically only on invitation from a member of the team.
  • All assignments are due at 11:55pm on the due date. Use the tutorial hour the previous Wednesday to meet with your team and to finalize your submission for the assignment so you only need to do some proofreading and merging before the final deadline.
  • Include the names of all active participants on the first page of the assignment. Everyone who participated actively gets the same grade; those who didn't participate get zero. Note that some team members might not respond because they have dropped, or intend to drop, the course. If your team shrinks to 3 or fewer, let me know and I can merge your team with another small team. Exceptionally, if some members of the team consistently do considerably less work than other members, the instructor may award those members a correspondingly reduced grade.
  • The more work you do on an assignment the better prepared you are to do well on the mid-term and on the final. But you shouldn't hog the work -- let others do their part too. Everyone should make sure that they understand the whole assignment. Discuss the assignment with your team members to make sure everyone understands the key points and difficulties of each question.

Class representatives

In approximately 2 weeks, we will select 3 class representatives. This is common practice in the Division of Natural Science. Later in the term, the class representatives meet with the Director of the Division of Natural Science, Paul Delaney, who uses their feedback to help guide the development of courses in the Division. The class representatives can also give feedback to the instructor and act as liaisons between the class and the instructor.

Using computers for the course

Some assignments and the individual project will require you to use computing software to view and analyze data. The test and exam will require you to interpret output from the same software. You can learn the computing aspects of the course in a number of ways:

  • If you have access to a computer, you can download the software for the course. We use free, open-source software that runs on Windows, MacOS X or Linux. If you have a laptop, you are encouraged to bring it to class and to tutorials and office hours.
  • If you don't have access to a computer, you can get an account to use computers in the Gauss Lab where the software will be available. You also need a card to access the Gauss Lab.
  • The course will show detailed examples of simple statistical analyses using R.

Moodle

This is the link to the NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792).

We will use Moodle for five purposes:

  • A forum in which you can post questions, comments and answers related to the course and to Statistics in general.
  • A forum entitled Statistics in the News in which you post interesting things you have found and comment on the posts of others.
  • A team forum to make it easy for you to communicate with your team members.
  • A way for you to submit assignments.
  • A way for you to find out your grades.

Log in so you can start contributing posts and comments.

Where can I go for help?

  • Tutorials, Wednesdays 3:30 to 4:30. Teaching assistants and/or the instructor will be present to answer questions.
  • Instructor's office hours: Friday, 8:30 to 10:30 in North Ross 626.
  • NATS AID: Undergraduate students who have already taken the course and done very well volunteer to help. More information to come.
  • Statistics Learning Centre in the Department of Mathematics and Statistics at South Ross 525. Teaching Assistants who are assigned to help with introductory statistics courses are available to help. Some may be designated for NATS 1500 because, believe it or not, we cover some material that many graduate students have not learned! More information to come.

Topic 1: Introduction to Statistical Ideas

Material covered

  • Introduction to the course
  • How do we know what we know?
  • Prepare to discuss exercises by Jan 19.

Textbook

Chapter 1

What is 'Statistics'?

The definition in the text says:
Definition: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
This is certainly an important aspect of statistics but I think it only tells a small part of the story. Statistics is the science (and art) of working with uncertainty, whether you plan to make decisions or not. We tend to think of statements as true or false. But in practice, the truth or falsity of most important statements is not known with certainty. There are all degrees of uncertainty between being certain a statement is true and being certain it is false. Many of the most important decisions and choices we make in life are made despite the fact that we don't have all the information we would like to have to determine which route is best. Sometimes we simply act as if something is true or false although we don't really know. Statistics is not just about how to make these difficult decisions. It is also about remembering and being aware of our uncertainty so we know where to look for better information and how to revise our hypotheses as relevant information becomes available. Statistics is not just about making decisions; it's about where to look for information that could lead us to change our decisions. It's about knowing when to keep an open mind and knowing when and how to change your mind.
Statistics is about the fascinating journey from ignorance to increasingly certain knowledge to wisdom. This is a journey we all follow individually. It is also a journey undertaken by disciplines, by political and social organisms and by mankind as a whole.
Possible test question

A traditional definition of statistics is that it is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty. Give a brief critique of this definition (100 to 300 words).

Experimental vs Observational data sets

If X and Y are correlated, what can it mean?
1) X causes Y?
2) Y causes X?
3) Another variable(s) Z(s) causes both X and Y?
a) Some Zs might be known and measurable. For these Zs we might be able to adjust using sophisticated statistical methods.
b) Some Zs might be known but hard or impossible to measure. This is more difficult to deal with.
c) Some Zs might not be discovered until the year 3000. We can't adjust statistically for these.
4) Selection: maybe there's no relationship but some data got thrown out or ignored and the data left created the impression of a relationship.
5) Chance: This is the one statisticians are really good at dealing with -- as you will learn in this course.
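Case 3 is easy to underestimate. The following Python sketch simulates a hypothetical common cause Z that drives both X and Y while X has no effect at all on Y; X and Y still come out clearly correlated. Removing Z, which is possible here only because the simulation knows Z and its effect exactly (case 3a at its easiest), makes the correlation vanish. The numbers are simulated, not real data.

```python
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


random.seed(1)
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]  # hypothetical hidden common cause Z
x = [zi + random.gauss(0, 1) for zi in z]   # Z influences X
y = [zi + random.gauss(0, 1) for zi in z]   # Z influences Y; X has no effect on Y

r_raw = correlation(x, y)  # near 0.5 although X does not cause Y

# Adjusting for Z: here we can simply subtract it, because the simulation
# knows Z exactly and knows that its effect on both X and Y is +1.
r_adj = correlation([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])  # near 0

print(f"correlation of X and Y: {r_raw:.2f}; after adjusting for Z: {r_adj:.2f}")
```

With real data the adjustment is never this clean: Z is measured with error (case 3b) or not measured at all (case 3c), and the leftover correlation can still be mistaken for causation.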
What if we have an 'experiment' with 'random assignment of levels of X' to experimental units?
1) X causes Y? possible
2) Y causes X? No! We know what caused X. It was the coin toss or the random number generator that caused X.
3) Another variable(s) Z(s) causes both X and Y? Maybe, but only by chance. Because X was assigned at random, the Zs, known or not, measurable or not, cannot be systematically related to X; only by chance could they differ enough between the levels of X to have a large impact on Y.
4) Selection? We can exclude this by checking how the data were obtained.
5) Chance again.
So, if we can exclude selection, we are left with two options:
1) X causes Y, or
2) Chance.
We can use statistical analysis to measure chance. If the chance is very small then we may be left with X causes Y as the plausible explanation.
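One standard way to measure chance in a randomized experiment is a randomization (permutation) test: if X had no effect, the coin toss could just as easily have put each unit in the other group, so we repeat the random assignment many times on the same outcomes and see how often chance alone produces a difference as large as the one observed. This is a minimal Python sketch with made-up outcome values; it illustrates the idea and is not necessarily the method used in the course, which uses R and Rcmdr.

```python
import random


def permutation_p_value(treated, control, reps=10_000, seed=0):
    """Two-sided randomization-test p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # redo the random assignment of units to groups
        new_t, new_c = pooled[:k], pooled[k:]
        diff = sum(new_t) / len(new_t) - sum(new_c) / len(new_c)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed assignment itself.
    return (extreme + 1) / (reps + 1)


# Hypothetical outcomes for 12 units randomly split into two groups of 6.
treated = [12, 15, 14, 16, 13, 17]
control = [10, 11, 13, 9, 12, 10]
print(permutation_p_value(treated, control))  # about 0.01
```

A small p-value says that chance alone rarely produces a difference this large; if selection has also been excluded, "X causes Y" is left as the plausible explanation.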
How should you react to causal claims based on data analyses?
1) The key question you should ask: Is the data set experimental or observational? You might have to ask questions to answer this. Generally, it isn't obvious from the appearance of the data. The critical issue is how the levels of X were assigned to the units: strictly at random, or by the choice or judgement of the subjects or of the experimenters?
2) If experimental: double-check that the assignment was really random and not done by judgment or haphazardly. Was the study double-blind? Are there possible biases in measurements? Psychological factors that influence the outcome? Does the claim match the nature of the experiment, or is the claim stretching to something that does not correspond exactly to what was done in the experiment?
3) If observational:
a) Can you poke an obvious hole in the claim? E.g. is there a plausible alternative explanation that was not taken into account in the analysis? In this case, you've countered the claim.
b) What has the analysis adjusted for? Are these factors that can be measured with precision? What kinds of factors are not accounted for?
c) Has the analysis over-corrected by controlling for possible mediating factors that should not be controlled?
d) If a causal connection seems paradoxical, can you think of plausible mediating factors that might explain causality?
With observational data, you can't be 100% sure that the relationship is causal but you can check whether important alternative possibilities have been adequately addressed.
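Point 3c can be surprising, so here is a minimal Python simulation of it. In this hypothetical setup X affects Y only through a mediator M. Regressing Y on X correctly shows an effect, but controlling for M makes the real effect of X disappear. Controlling is done here by regressing both X and Y on M and relating the residuals, which gives the same coefficient for X as a multiple regression of Y on X and M. All numbers are simulated.

```python
import random


def slope(x, y):
    """Least-squares slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(values, predictor):
    """What is left of `values` after a simple regression on `predictor`."""
    n = len(values)
    b = slope(predictor, values)
    mp, mv = sum(predictor) / n, sum(values) / n
    return [v - (mv + b * (p - mp)) for p, v in zip(predictor, values)]


random.seed(2)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
m = [xi + random.gauss(0, 1) for xi in x]  # X causes the mediator M
y = [mi + random.gauss(0, 1) for mi in m]  # M causes Y; X acts on Y only through M

total_effect = slope(x, y)  # near 1: changing X really does change Y

# Controlling for M: relate the parts of X and Y that M does not explain.
adjusted_effect = slope(residuals(x, m), residuals(y, m))  # near 0

print(f"effect of X on Y: {total_effect:.2f}; after controlling for M: {adjusted_effect:.2f}")
```

The adjusted analysis is not wrong arithmetic; it answers a different question ("does X matter once M is held fixed?"), which is why it can hide a real causal effect.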
Some examples in the news: Toronto Star: Pulse (http://www.math.yorku.ca/people/georges/Files/NATS1500/Week02/StatisticsInTheNews030926.html)
Which examples are experimental and which are observational?
Which conclusions are reasonable and which are not? Why?

Things to do

'Things to do' are tasks that are not graded but are important to keep up with the course

Exercises

Exercises are not graded but they are useful preparation for the mid-term test or the final exam

Questions on causality (http://blackwell.math.yorku.ca/NATS1500/2015/NATS%201500%20Intro%20Exercises.pdf) for discussion January 18

Text Chapter 1

1.1, 1.5, 1.6, 1.7, 1.8, 1.10, 1.11, 1.13, 1.14, 1.15, 1.16, 1.19, 1.23.

Assignment 0

Due: 11:55pm, January 9, 2016

I would like to know something about you in order to form balanced random teams of 4 or 5 students to work on Assignments 1, 2 and 3. I will use your emailed responses to this Assignment 0 to form the teams. You will receive the names of your team members on January 11 so you can meet face to face at the break during the class.
To complete Assignment 0, fill out the NATS 1500 survey (https://docs.google.com/forms/d/1RzMv5n-gFdLM33UB0YkvnFAQZl0Sjl9E3JAj9HPyfws/viewform?usp=send_form). It should take less than 5 minutes. Note that the assignment is submitted through Google Docs, not through Moodle, so it won't show up on Moodle.

Topic 2: Exploring Global Health

Hans Rosling and global health

In each of the following pairs of countries, which country has the higher child mortality?

Sri Lanka vs. Turkey
Poland vs. South Korea
Malaysia vs. Russia
Pakistan vs. Viet Nam
Thailand vs. South Africa


Team Assignment 1

Due: January 27, 2016
  • Explore Gapminder World (http://www.gapminder.org/world/)
    • Learn how to select different variables for the Y axis, the X axis, the size and the colour of points.
    • Learn how to select different subsets of countries for highlighting and how to turn 'trails' on or off.
    • Learn how to control the time animation and its speed.
  • Find a selection of variables that seem to tell an interesting story about a trend or a historical event that you find interesting.
  • Do a bit of research on this trend or event.
  • Copy the URL for the animation you selected by clicking on the 'Share graph' button and copying the URL that is shown.
  • Write an interesting short essay, one per team (300 to 2,000 words), describing what the animation shows.
  • You should include links to online materials you used, e.g. Wikipedia articles are considered acceptable for this assignment, and you should include references to other materials that you used but there is no need for an exhaustive list of references as would be required for a more formal scholarly essay.
  • You can discuss and share drafts of your posting in your NATS 1500 Team Forums (https://moodle.yorku.ca/moodle/mod/forum/view.php?id=962579).
  • Once you agree on a final draft, post your essay and the URL in the NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792) forum for Assignment 1. Be sure to include your team's name in the title of the posting, e.g. "Pillai: Fascinating history of ... in South-East Asia."
  • Note that each team prepares only one team essay and URL. Of course, if you want to post other essays as posts to the blog, that is more than welcome!
  • After the due date, have a look at the work done by other teams and participate in the discussion.

Topic 3: Statistics in the News

Team Assignment 2

Due: February 11 and 24, 2016
Through the NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792)
Submit a draft copy (an outline or plan) entitled GroupName_draft.pdf by 11:55pm on Thursday, Feb. 11, 2016.
Submit a final copy entitled GroupName_final.pdf by 11:55pm on Wednesday, Feb. 24, 2016.
In both file names, substitute the name of your group for GroupName.
1) Find a topic in the news currently or within the past year that involves some controversy over the interpretation of evidence in which the issue is explicitly or implicitly related to causality.
2) Collect some clippings or on-line links to news, magazine or journal articles related to the topic. Most scientific topics in the news are ultimately based on one or more articles in academic journals. Find the relevant article(s).
3) Discuss why the topic is controversial. In what way is the controversy over causality? Why is there room for disagreement? What kind of evidence, data or theory, is available to support the various sides of the issue? Discuss the apparent strengths and weaknesses in the data or theory on either side. Is the available data observational or experimental? Is this relevant to the issue? What kind of data, if any, could resolve the issue? What obstacles are there to obtaining the ideal data to resolve the issue? Is better data likely to become available, and how would it be helpful?
4) End the assignment with brief individual essays (identify the authors) stating your individual positions on the topic. Have you adopted a point of view? Describe the ways in which you remain uncertain and how your uncertainty could be resolved. If you wish, you can write this part of the assignment as if it were a panel discussion among the members of your team. You could, in fact, record a panel discussion and transcribe it to text.

You are not expected to become experts on your chosen topic in two weeks. The goal of the assignment is for you to become informed lay persons who understand the nature of the controversy and uncertainty concerning your topic, the approaches that could resolve it and the challenges to achieving a resolution.

Plan to devote the equivalent of 4 to 8 pages of commentary, double-spaced, for the common part of the assignment and the equivalent of one page double-spaced on each individual essay.

Half of the grade (5 marks) is based on the common part of the assignment for which all members of the team receive the same grade. The remaining half of the grade (5 marks) is for individual essays. The grade is based on the quality of your research and the interest and intellectual energy you display in dealing with the problem.

At each deadline, you should submit the assignment as a single PDF file uploaded by a member of your team through NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792). The single file should include the common part of the assignment followed by the individual essays. Don't forget to identify the name of the author at the beginning of each essay.

Alternative Assignment 2

The first team that contacts me to say that they have unanimously decided to do the following alternative assignment will be allowed to do it instead of the assignment described above. Contact me to confirm that you are the first team before undertaking this assignment.

This assignment involves doing some research on the factors that influence the success of teams in university courses and, most importantly, the design of a questionnaire and survey to be administered to our class to elicit information that could lead to the improvement of the approach we use for the implementation of teams.

Possible test/exam question

  1. Review these exercises for discussion in class (http://blackwell.math.yorku.ca/Files/NATS1500/NATS_1500_Intro_Exercises.pdf)
  2. Why does Hans Rosling say that students at the Karolinska Institute know statistically significantly less about the world than do chimpanzees -- and professors at the Karolinska Institute are roughly on a par with chimpanzees?
  3. Use an Agresti diagram to explain how conditional association between two variables can have a different sign than their unconditional association: You would be asked to do this with a specific example, e.g. quitting smoking and health, using sunscreen lotion and skin damage, working more hours per week and getting higher grades. In each case you want to identify a plausible confounding factor and draw the Agresti diagram conditioning on levels of the confounding factor.
  4. A study showed (this is true) that students who view the recorded videos of the lectures many times perform less well on the final exam than students who view the videos fewer times. Upon discovering this, your professor announces that he/she will discontinue recording the lectures because, the professor says, the videos have been shown to cause students to perform more poorly in the course. Explain your point of view to the professor in clear and simple language that even a professor might be able to understand.
    • Explain why the number of lectures attended could be a potential confounding factor in considering the relationship between the frequency of viewing class videos and performance on the course.
    • Can you think of potential mediating factors?
  5. In the 1964 U.S. Public Health Service study it was found that, for men and for women in each age group, current smokers were on average much healthier than former smokers, i.e. people who had quit smoking.
    • Suppose that an important factor that explains this surprising fact is that people who quit smoking tend to experience increased stress and weight gain as a result of quitting, and that these factors adversely affect health. Are increased stress and weight gain potential mediating factors or confounding factors? Explain briefly.
    • Suppose that an important factor that explains this fact is that an important proportion of people who quit smoking do so because they are in poor health and have been strongly advised to quit smoking by their physicians. Would this be a mediating factor or a confounding factor in considering the relationship between quitting smoking and health? Explain briefly.
    • In this study, are age and sex potential confounding factors or mediating factors?
  6. A study shows that heavy users of sunscreen lotion have a higher chance of developing skin cancer.
    • Does this imply that you should avoid using sunscreen lotion in order to reduce your chances of developing skin cancer?
    • Can you think of potential confounding factors and potential mediating factors?
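To see how the sign reversal in question 3 can happen, here is a minimal sketch in Python. The counts are invented purely for illustration: they do not come from any study, and "exposed", "unexposed" and the confounder levels "Z=low" and "Z=high" are hypothetical labels. The code computes the success rates one would plot in an Agresti diagram (within each level of the confounder, and pooled), but it does not draw the diagram itself. Within each level of Z the exposed group does better, yet pooled over Z it does worse, because most exposed cases fall in the level of Z where everyone does poorly.

```python
# Invented counts, for illustration only: (successes, total) for each
# exposure group within each level of a hypothetical confounding factor Z.
counts = {
    "Z=low":  {"exposed": (80, 100),  "unexposed": (300, 400)},
    "Z=high": {"exposed": (120, 400), "unexposed": (25, 100)},
}

def rate(successes, total):
    """Proportion of successes in a group."""
    return successes / total

def conditional_differences(counts):
    """Exposed minus unexposed success rate within each level of Z."""
    return {
        level: rate(*groups["exposed"]) - rate(*groups["unexposed"])
        for level, groups in counts.items()
    }

def marginal_difference(counts):
    """Exposed minus unexposed success rate after pooling over Z (ignoring it)."""
    totals = {"exposed": [0, 0], "unexposed": [0, 0]}
    for groups in counts.values():
        for group, (successes, total) in groups.items():
            totals[group][0] += successes
            totals[group][1] += total
    return rate(*totals["exposed"]) - rate(*totals["unexposed"])

cond = conditional_differences(counts)
marg = marginal_difference(counts)
for level, diff in cond.items():
    print(f"{level}: exposed minus unexposed = {diff:+.2f}")
print(f"pooled: exposed minus unexposed = {marg:+.2f}")
```

With these made-up numbers, each level of Z gives a difference of +0.05, while the pooled difference is 0.40 - 0.65 = -0.25. The conditional and unconditional associations therefore have opposite signs.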

Team Assignment 3

Assignment 3 consists of the problems assigned week by week over the last six weeks of the course. It is due on March 25. The answers should be posted in the Moodle forum for Assignment 3 (https://moodle.yorku.ca/moodle/mod/assign/view.php?id=926760)

If we do this well, we will end up writing our own solution manual for the textbook.

Notes:

  1. Only the numbers in red need to be done for Assignment 3.
  2. The numbers shown in the text all have the form '5.x' where 'x' is the number of the question within chapter 5. In the following lists I only show 'x'.

Chapter 5

Looking for Patterns with Scatterplots:
1, 2, 3, 7
Describing Linear Pattern with a Regression Line:
11, 14
Measuring Strength and Direction with Correlation:
24 (important -- likely to be on exam), 27 (also a good candidate for the exam)
Why the Answers May Not Make Sense & Correlation Does Not Prove Causation:
36 (refers to 7), 39, 40
Chapter Exercises:
46, 48, 49, 59, 60, 61, 62.

Chapter 6

Displaying Relationships Between Categorical Variables:
3, 4, 6, 7
Risk, Relative Risk, Odds Ratio and Increased Risk & Misleading Statistics About Risk:
10 (nice exam question), 11-14 (ditto), 20 (refers to 6), 22
The Effect of a Third Variable and Simpson's Paradox:
27, 29, 31
Assessing the Statistical Significance of a 2 x 2 Table:
33, 34, 43
Chapter Exercises:
56, 57, 58, 62.

Chapter 7

1. Random Circumstances & 2. Interpretations of Probability
2, 5, 7, 16
3. Probability Definitions and Relationships
18, 19, 20
4. Basic Rules for Finding Probabilities
34, 35, 36, 42
5. Strategies for Finding Complicated Probabilities
44, 45, 46, 47, 50, 54
6. Using Simulation to Estimate Probabilities
none
7. Coincidences and Intuitive Judgments about Probability
64, 68, 72, 76 (similar question likely to be on test)
Chapter Exercises
82, 83, 84, 85, 91 to 98 (sequence of exercises on same problem).

That's it!!

Project (Individual)

The general idea is to perform an analysis of some data that you find interesting using the statistical tools and critical insights that you have developed in the course. To help you find a topic and data, you can have a look at Statistics: Pedagogical resources on this wiki. The whole project must be your own work, although you may consult with other members of your team or other students.

  1. Identify a topic you find interesting about which you have a question that could be resolved with appropriate data and analysis.
  2. Find a number of sources (3 or more, except in very special cases where 3 sources do not exist) that provide information relevant to your question. At least one source should contain relevant data.
  3. Perform some analyses of the data, including summaries of the distributions of relevant variables and relevant graphs.
  4. Based on a critical assessment of your sources and your analysis, discuss the implications for your question.
  5. Discuss clearly the strengths and limitations of your analysis and existing information in addressing your question.

Some guidelines for your report:

  1. Aim for a length of 8 to 12 double-spaced pages of analysis and discussion, plus at least 2 pages of relevant graphs.
  2. Show the results of at least one and preferably two analyses using a single data set -- unless you are very ambitious and want to use more.

Grading:

  1. Clear expression of specific question and relevant field: 10%
  2. Choice of sources and clear references: 10%
  3. Clarity and quality of argument: 20%
  4. Relevance and quality of analysis: 20%
  5. Relevance and quality of graphs: 20%
  6. Clear formal academic style of writing: 5%
  7. Effort: 5%
  8. Structure: 5%
  9. Overall appearance: 5%

You can get bonus marks in any category if you do an absolutely brilliant job in that category.

You should submit your project as a single PDF file through NATS 1500 Moodle site (https://moodle.yorku.ca/moodle/course/view.php?id=73792).

Interesting links

Some places to look for Statistics in the News

  • 538.com (http://fivethirtyeight.com/), a data journalism site founded by Nate Silver, who is famous for his accurate predictions of U.S. elections.
  • Toronto Star (http://www.thestar.com/news.html)
    Many articles in daily newspapers use data or make claims that benefit from a critical analysis.
  • The Guardian (http://www.theguardian.com/us)
    The Guardian is one of two international newspapers that have a special reputation for the effective use and presentation of data. The other is:
  • The New York Times (http://www.nytimes.com/) (limit of 10 articles/month without a subscription)
  • New York Times Opinion Pages (http://www.nytimes.com/pages/opinion/index.html) (doesn't require a subscription)
  • New York Times online access through York (http://go.galegroup.com.ezproxy.library.yorku.ca/ps/infomark.do?serQuery=Locale%28en%2CUS%2C%29%3AFQE%3D%28jx%2CNone%2C16%29%22New+York+Times%22%24&queryType=PH&userGroupName=yorku_main&prodId=AONE&action=interpret&type=pubIssues&version=1.0&authCount=1&u=yorku_main)
  • This is Statistics (http://thisisstatistics.org/), a site created by the American Statistical Association.
  • Significance (http://onlinelibrary.wiley.com.ezproxy.library.yorku.ca/journal/10.1111/(ISSN)1740-9713) published by the Royal Statistical Society.
  • Stats Chat (http://www.statschat.org.nz/), a blog by Thomas Lumley
  • Statistical Modeling, Causal Inference and Social Science (http://andrewgelman.com/), a blog by Andrew Gelman. This one is quite advanced; even graduate students find it challenging.

A few interesting articles

Causality and Climate Change

Relevant Books