# VR4: Chapter 1 summary

### From MathWiki

The following commands have been adapted from Venables and Ripley (2002) *Modern Applied Statistics with S,* (4th ed.), and serve as a first introduction to the capabilities R.

After installing R, you can work through the following commands to get an overview of R's capabilities. In an R command, any text following a number sign (#) is a comment. The '>' is the standard prompt. To work through the following examples, type the commands (the text between the '>' prompt and the comment symbol '#') in the R console window. Try variations and experiment to see what happens. You can also cut and paste but you won't learn as much as you would by typing!

Table of contents |

## Starting

You can open a general help page:

> help.start()

## Arithmetic

R evaluates ordinary mathematical expressions and prints the results.

> 2 + 3 [1] 5 > sqrt( 3/4)/( 1/3 - 2/pi^2) [1] 6.6265

Experiment with the command above to find out whether in the expression

> 2/pi^2

exponentiation (^2) takes place after division (/) or before.

## Loading a library

The following command loads the MASS (Modern Applied Statistics with S) library that contains functions and data sets that accompany the text.

> library(MASS)

Some libraries need to be installed when you're on the internet (you only need to do this once):

> install.packages("car") > install.packages("rgl")

and then they can be loaded when needed:

> library(car) > library(rgl)

Read the help page about how to use help:

> ?help

## Generating and looking at random numbers

Generate 1,000 pairs of standard normal variates:

> x <- rnorm(1000) > y <- rnorm(1000)

Histogram of a mixture of normal distributions. Experiment with the number of bins and (25) and the shift(3) of the second component.

> truehist( c( x, y +3 ), nbins = 25)

Note:

- c(x, y+3) combines the vector 'x' and the vector 'y+3' into a single vector.
- nbins: number of bars in the histogram

Read about optional arguments:

> ?truehist

2D density plot:

> contour ( dd <- kde2d(x,y) )

Greyscale or pseudo-color plot:

> image(dd)

## Simple regression

> x <- seq( 1, 20, 0.5) # sequence from 1 to 20 by 0.5 > x # print it > w <- 1 + x/2 # a weight vector > y <- x + w*rnorm(x) # generate y with standard deviations # equal to w > dum <- data.frame( x, y, w ) # make a data frame of 3 columns > dum # typing it's name shows the object > rm( x, y, w ) # remove the original variables

> fm <- lm ( y ~ x, data = dum) # fit a simple linear regression (between x and y) > summary( fm ) # and look at the analysis

> fm1 <- lm( y ~ x, data = dum, weight = 1/w^2 ) # a weighted regression > summary(fm1) > lrf <- loess( y ~ x, dum) # a smooth regression curve using a # modern local regression function > attach( dum ) # make the columns in the data frame visible # as variables (Note: before this command, typing x and # y would yield errors) > plot( x, y) # a standard scatterplot to which we will # fit regression curves > lines( spline( x, fitted (lrf)), col = 2) # add in the local regression using spline # interpolation between calculated # points > abline(0, 1, lty = 3, col = 3) # add in the true regression line (lty=3: read line type=dotted; # col=3: read colour=green; type "?par" for details) > abline(fm, col = 4) # add in the unweighted regression line > abline(fm1, lty = 4, col = 5) # add in the weighted regression line > plot( fitted(fm), resid(fm), xlab = "Fitted Values", ylab = "Residuals") # a standard regression diagnostic plot # to check for heteroscedasticity. # The data were generated from a # heteroscedastic process. Can you see # it from this plot? > qqnorm( resid(fm) ) # Normal scores to check for skewness, # kurtosis and outliers. How would you # expect heteroscedasticity to show up? > search() # Have a look at the search path > detach() # Remove the data frame from the search path > search() # Have another look. How did it change? > rm(fm, fm1, lrf, dum) # Clean up (this is not that important in R # for reasons we will understand later)

Comments by Team Adler

## Scottish hill races

> data(hills) # In R, data frames need to be loaded with the 'data' command. > hills # List the data > pairs(hills) # A pairwise scatterplot > attach(hills) # Make the columns available by name > plot(dist, time) # Plot the points > identify( dist, time, rownames(hills) ) # Use the left mouse button to identify outlying points # and the right mouse button to quit. On a Macintosh # click outside the plot to quit.

> abline( lm( time ~ dist )) # Show the least-squares regression line > detach() # Clean up

- Comments by Team Allport.

## Effect of outliers

We can explore the effect of outliers on a linear regression by designing our own examples interactively. Try this several times.

> plot( c(0,1), c(0,1), type = 'n') > xy <- locator(type = "p") # create your own data set by clicking on the # left mouse button, then with the right mouse button # to finish. With a Macintosh click outside the # plot to finish. > abline( lm( y ~ x, xy), col = 4) # Fit a least-squares, a robust and a resistant # regression line. Repeat to discover the effect of # outliers, both vertically and horizontally > abline( rlm( y ~ x, xy, method = "MM") , lty = 3, col = 3) > abline( lqs( y ~ x, xy), lty = 2, col = 2) > rm(xy)

## Michelson

We now look at data from the 1879 experiment of Michelson to measure the speed of light. There are five experiments (column Expt); each has 20 runs (column Run) and Speed is the recorded speed of light, in km/sec, less 299,000. (The currently accepted value on this scale is 724.5)

> data(michelson) > attach(michelson) > search() > plot( Expt, Speed, main = "Speed of Light Data", xlab = "Experiment No.") > fm <- aov( Speed ~ Run + Expt) # analyze as a randomized block design # with runs and experiments as factors > summary( fm ) > fm0 <- update( fm, . ~ . - Run) # fit the sub-model omitting the nonsense factor Runs # and compare using a formal analysis of variance. > anova(fm0, fm)

> detach() > rm( fm, fm0)

- Comments by Team Argyle

## 3D visualization with rgl

> library(rgl) > rgl.open() > rgl.bg(col='white') > x <- sort(rnorm(1000)) > y <- rnorm(1000) > z1 <- atan2(x,y) > z <- rnorm(1000) + atan2(x, y) > plot3d(x, y, z1, col = rainbow(1000), size = 2) > bbox3d()