Week 2: January 13, 15
- Verzani: Using R for Introductory Statistics (http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
Draft
Using Rcmdr:
- Categorical variables:
- One way:
- Statistics | Summaries | Frequency distributions to get counts and percentages
- Two-way:
- Statistics | Contingency table to get counts or proportions and test of association
- Graphs:
- Graphs | Pie chart
- Graphs | Bar graph
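For reference, the menu items above correspond roughly to the following commands. This is a minimal sketch, not the exact code Rcmdr generates. It assumes the built-in mtcars data stands in for the active data set (named Dataset here), with cyl and gear treated as the categorical variables.

```r
# Assumption: mtcars (shipped with R) stands in for the active data set,
# and cyl and gear are treated as categorical variables.
Dataset <- within(mtcars, {
  cyl  <- factor(cyl)
  gear <- factor(gear)
})

# One way: counts and percentages
counts   <- table(Dataset$cyl)
percents <- round(100 * prop.table(counts), 1)
print(counts)
print(percents)

# Two-way: counts, column percentages and a test of association
two_way <- xtabs(~ cyl + gear, data = Dataset)
print(two_way)
print(round(100 * prop.table(two_way, margin = 2), 1))
# With a data set this small R warns that the chi-square
# approximation may be inaccurate.
print(chisq.test(two_way))

# Graphs: pie chart and bar graph of the one-way counts
pie(counts, main = "cyl")
barplot(counts, xlab = "cyl", ylab = "Frequency")
```

Setting margin = 2 in prop.table gives column percentages. Use margin = 1 for row percentages.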
NOTE:
- Index plot
- Histogram
- Stem-and-leaf display
- Boxplot
- Quantile-comparison plot
- Scatterplot
- Scatterplot matrix
- Line graph
* XY conditioning plot
- Plot of means
* Strip chart
- Bar graph
* Pie chart
* 3D graph
  - 3D scatterplot
  - Identify observations with mouse
  - Save graph to file
- Save graph to file
  - as bitmap
  - as PDF/Postscript/EPS
  - 3D RGL graph
Graphs with Rcmdr
Purpose | Rcmdr menu | Notes |
---|---|---|
one categorical variable | Graphs > Bar graph; Graphs > Pie chart | |
two categorical variables | need command line | library(lattice); with(Dataset, barchart(table(Xcat, Ycat), stack = FALSE, auto.key = TRUE)) |
X cat. var. and Y num. var. | Graphs > Boxplot | click on Plot by groups |
one numeric variable | Graphs > Histogram; Graphs > Boxplot | |
two numeric variables | Graphs > Scatterplot | prompts for x and y variables |
X, Y num. vars. & Z cat. var. | Graphs > Scatterplot | click on Plot by groups to choose Z |
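The graphs in the table can also be drawn from the command line. This is a minimal sketch, assuming the built-in mtcars data stands in for the active data set. The choice of variables (cyl and am as categorical, mpg and wt as numeric) is only for illustration.

```r
library(lattice)  # recommended package included with standard R installations

# Assumption: mtcars stands in for the active data set.
Dataset <- within(mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am, labels = c("automatic", "manual"))
})

# Two categorical variables: side-by-side bars with a legend
two_cat <- with(Dataset,
                barchart(table(cyl, am), stack = FALSE, auto.key = TRUE))
print(two_cat)  # lattice graphs must be printed when run from a script

# X categorical and Y numeric: boxplot by groups
boxplot(mpg ~ cyl, data = Dataset, xlab = "cyl", ylab = "mpg")

# One numeric variable: histogram
hist(Dataset$mpg, main = "", xlab = "mpg")

# X, Y numeric and Z categorical: scatterplot coloured by group
plot(mpg ~ wt, data = Dataset, col = as.integer(Dataset$cyl), pch = 19)
legend("topright", title = "cyl", legend = levels(Dataset$cyl),
       col = seq_along(levels(Dataset$cyl)), pch = 19)
```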
Statistics with Rcmdr
Purpose | Rcmdr menu | Notes |
---|---|---|
all variables | Statistics > Summaries > Active data set | |
one categorical variable | Statistics > Summaries > Frequency distributions | |
two categorical variables | Statistics > Contingency tables > Two-way table | Choose X as the Row variable and Y as the Column variable. Request the table more than once, selecting No percentages and then Column percentages |
X cat. var. and Y num. var. | Statistics > Means > One-way ANOVA; Statistics > Summaries > Numerical summaries | X is the Groups variable |
one numeric variable | Statistics > Means > Single-sample t-test; Statistics > Summaries > Numerical summaries | |
two numeric variables | Statistics > Fit models | |
X, Y num. vars. & Z cat. var. | Statistics > Fit models | |
Week 7, February 24
Material covered
Chapter 5 and the beginning of Chapter 6
Regression with two variables with Rcmdr
Purpose | Rcmdr menu | Notes |
---|---|---|
Explore num. vars. + possibly 1 cat. var. | Graphs > Scatterplot matrix; Graphs > 3D graph > 3D scatterplot; Statistics > Summaries > Numerical summaries | You can also include one categorical variable by selecting "Summarize by groups" or "Plot by groups" |
Scatterplot | Graphs > Scatterplot | |
Correlation | Statistics > Summaries > Correlation matrix | Use Correlation test for p-values |
Fitting the least-squares line, i.e. the estimated linear regression equation | Statistics > Fit models > Linear regression; Models > Summarize model / Confidence intervals / Add observation statistics to data / etc. | After adding observation statistics to the data, you can plot the residuals in various ways to see whether any patterns remain in them |
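The steps in the table can be sketched at the command line as follows. This is a minimal illustration, assuming the built-in mtcars data with mpg as the response and wt as the explanatory variable. The column names fitted_mpg and resid_mpg are made up for this example and are not the names Rcmdr assigns.

```r
# Assumption: mtcars stands in for the active data set.
Dataset <- mtcars

# Explore: scatterplot matrix, correlation matrix, correlation test
pairs(Dataset[, c("mpg", "wt", "hp")])
print(cor(Dataset[, c("mpg", "wt", "hp")]))
print(cor.test(Dataset$mpg, Dataset$wt))

# Fit the least-squares line mpg = a + b * wt
fit <- lm(mpg ~ wt, data = Dataset)
print(summary(fit))
print(confint(fit))

# Add observation statistics to the data (hypothetical column names)
Dataset$fitted_mpg <- fitted(fit)
Dataset$resid_mpg  <- residuals(fit)

# Look for patterns remaining in the residuals
plot(resid_mpg ~ fitted_mpg, data = Dataset,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

If the residuals curve or fan out across the fitted values, a straight line probably does not describe the relationship well.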
Assignment 2
Assignment 2 will be done in the same groups as Assignment 1, except that groups that have become too small may be combined with others. Assignment 2 consists of the problems assigned week by week over the next three weeks. The assignment is due on March 24.
Each current group should send me (mailto:georges+nats1500@yorku.ca) one email message giving me the name of the group and the names of its members. I'll address issues concerning reconstitution of groups on Sunday, Feb. 28.
Project
More details to come. The general idea is to analyze some data that you find interesting, using the statistical tools and critical insights that you have developed in the course. To help you find a topic and data, have a look at Statistics: Pedagogical resources on this wiki.
Exercises
Notes:
- Numbers in 'bold' need to be done for Assignment 2.
- The numbers shown in the text all have the form '5.x' where 'x' is the number of the question within chapter 5. In the following lists I only show 'x'.
Chapter 5, pp. 161--168:
- Looking for Patterns with Scatterplots:
- 1, 2, 3, 7
- Describing Linear Pattern with a Regression Line:
- 11, 14
- Measuring Strength and Direction with Correlation:
- 24 (important -- likely to be on exam), 27 (also a good candidate for the exam)
- Why the Answers May Not Make Sense & Correlation Does Not Prove Causation:
- 36 (refers to 7), 39, 40
- Chapter Exercises:
- 46, 48, 49, 59, 60, 61, 62
Chapter 6, pp. 193--201:
- Displaying Relationships Between Categorical Variables:
- 3, 4, 6, 7
- Risk, Relative Risk, Odds Ratio and Increased Risk & Misleading Statistics About Risk:
- 10 (nice exam question), 11-14 (ditto), 20 (refers to 6), 22
- The Effect of a Third Variable and Simpson's Paradox:
- 27, 29, 31
- Assessing the Statistical Significance of a 2 x 2 Table:
- 33, 34, 43
- Chapter Exercises:
- 56, 57, 58, 62
Readings for next week
Reread Chapter 6, read Chapter 7.