# VR4: Chapter 1 summary

The following commands have been adapted from Venables and Ripley (2002), Modern Applied Statistics with S (4th ed.), and serve as a first introduction to the capabilities of R.

After installing R, you can work through the following commands to get an overview of R's capabilities. In an R command, any text following a number sign (#) is a comment. The '>' is the standard prompt. To work through the following examples, type the commands (the text between the '>' prompt and the comment symbol '#') in the R console window. Try variations and experiment to see what happens. You can also cut and paste, but you won't learn as much as you would by typing!
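For example, in the lines below (shown without the '>' prompt so they can be pasted directly), R evaluates only the code to the left of each '#':

```r
# Everything after a number sign is a comment and is ignored by R
1 + 1  # R evaluates only the '1 + 1' part of this line
```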

## Starting

You can open a general help page:

```
> help.start()
```

## Arithmetic

R evaluates ordinary mathematical expressions and prints the results.

```
> 2 + 3
[1] 5
> sqrt(3/4)/(1/3 - 2/pi^2)
[1] 6.6265
```

Experiment with the command above to find out whether in the expression

```
> 2/pi^2
```

exponentiation (^) takes place after division (/) or before.
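One quick way to settle the question is to compare the expression with both explicitly parenthesized versions (shown without the '>' prompt so the lines can be pasted directly):

```r
as_written <- 2/pi^2     # the expression from the text
exp_first  <- 2/(pi^2)   # exponentiation forced first
div_first  <- (2/pi)^2   # division forced first
# as_written equals exp_first (about 0.2026), not div_first (about 0.4053),
# so '^' is applied before '/'
```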

The following command loads the MASS (Modern Applied Statistics with S) package, which contains the functions and data sets that accompany the text.

```
> library(MASS)
```

Some packages need to be installed before first use, which requires an internet connection (you only need to do this once):

```
> install.packages("car")
> install.packages("rgl")
```

and then they can be loaded when needed:

```
> library(car)
> library(rgl)
```

To read the documentation of the help system itself:

```
> ?help
```

## Generating and looking at random numbers

Generate 1,000 pairs of standard normal variates:

```
> x <- rnorm(1000)
> y <- rnorm(1000)
```

Histogram of a mixture of normal distributions. Experiment with the number of bins (25) and the shift (3) of the second component.

```
> truehist(c(x, y + 3), nbins = 25)
```

Note:

• c(x, y+3) combines the vector 'x' and the vector 'y+3' into a single vector.
• nbins: number of bars in the histogram
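As a small illustration of the first point (plain R, no prompt): the addition is applied element-wise before c() joins the vectors end to end.

```r
x <- c(1, 2, 3)
y <- c(10, 20)
combined <- c(x, y + 3)   # y + 3 is computed element-wise first
combined                  # 1 2 3 13 23, a single vector of length 5
```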

```
> ?truehist
```

2D density plot:

```
> contour(dd <- kde2d(x, y))
```

Greyscale or pseudo-color plot:

```
> image(dd)
```

## Simple regression

```
> x <- seq(1, 20, 0.5)    # sequence from 1 to 20 by 0.5
> x                       # print it
> w <- 1 + x/2            # a weight vector
> y <- x + w*rnorm(x)     # generate y with standard deviations
#    equal to w

> dum <- data.frame(x, y, w)       # make a data frame of 3 columns
> dum                     # typing its name shows the object
> rm(x, y, w)             # remove the original variables
```
```
> fm <- lm(y ~ x, data = dum)     # fit a simple linear regression (of y on x)
> summary(fm)                     # and look at the analysis
```
```
> fm1 <- lm(y ~ x, data = dum, weights = 1/w^2)  # a weighted regression
> summary(fm1)

> lrf <- loess(y ~ x, dum)     # a smooth regression curve using a
#   modern local regression function
> attach(dum)                  # make the columns in the data frame visible
#   as variables (Note: before this command, typing x and
#   y would yield errors)
> plot(x, y)                   # a standard scatterplot to which we will
#   fit regression curves
> lines(spline(x, fitted(lrf)), col = 2)  # add in the local regression using spline
#  interpolation between calculated
#  points
> abline(0, 1, lty = 3, col = 3)    # add in the true regression line (lty=3: read line type=dotted;
#   col=3: read colour=green; type "?par" for details)
> abline(fm, col = 4)               # add in the unweighted regression line
> abline(fm1, lty = 4, col = 5)     # add in the weighted regression line
> plot(fitted(fm), resid(fm),
+      xlab = "Fitted Values",
+      ylab = "Residuals")           # a standard regression diagnostic plot
# to check for heteroscedasticity.
# The data were generated from a
# heteroscedastic process. Can you see
# it from this plot?
> qqnorm(resid(fm))                 # Normal scores to check for skewness,
#   kurtosis and outliers. How would you
#   expect heteroscedasticity to show up?
> search()                          # Have a look at the search path
> detach()                          # Remove the data frame from the search path
> search()                          # Have another look. How did it change?
> rm(fm, fm1, lrf, dum)             # Clean up (this is not that important in R
#    for reasons we will understand later)
```

## Scottish hill races

```
> data(hills)              # In R, data frames need to be loaded with the 'data' command.
> hills                    # List the data
> pairs(hills)             # A pairwise scatterplot

> attach(hills)            # Make the columns available by name
> plot(dist, time)         # Plot the points
> identify(dist, time, rownames(hills))
# Use the left mouse button to identify outlying points
# and the right mouse button to quit. On a Macintosh
# click outside the plot to quit.
```
```
> abline(lm(time ~ dist))  # Show the least-squares regression line
> detach()                 # Clean up
```

## Effect of outliers

We can explore the effect of outliers on a linear regression by designing our own examples interactively. Try this several times.

```
> plot(c(0,1), c(0,1), type = "n")
> xy <- locator(type = "p")         # create your own data set by clicking with the
# left mouse button, then click the right mouse button
# to finish. With a Macintosh click outside the
# plot to finish.
> abline(lm(y ~ x, xy), col = 4)    # Fit a least-squares, a robust and a resistant
# regression line. Repeat to discover the effect of
# outliers, both vertically and horizontally
> abline(rlm(y ~ x, xy, method = "MM"), lty = 3, col = 3)
> abline(lqs(y ~ x, xy), lty = 2, col = 2)
> rm(xy)
```

## Michelson

We now look at data from Michelson's 1879 experiment to measure the speed of light. There are five experiments (column Expt); each has 20 runs (column Run), and Speed is the recorded speed of light, in km/sec, less 299,000. (The currently accepted value on this scale is 734.5.)
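To see what the coding means in practice, adding the 299,000 offset back recovers the measurements on the original km/sec scale (a small sketch with made-up values, not the real data, which live in the michelson data frame):

```r
coded  <- c(850, 740, 900)   # hypothetical values on the coded scale
actual <- coded + 299000     # speeds in km/sec on the original scale
# e.g. a coded value of 850 corresponds to 299,850 km/sec
```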

```
> data(michelson)
> attach(michelson)
> search()
> plot(Expt, Speed, main = "Speed of Light Data",
+      xlab = "Experiment No.")
> fm <- aov(Speed ~ Run + Expt)         # analyze as a randomized block design
# with runs and experiments as factors
> summary(fm)
> fm0 <- update(fm, . ~ . - Run)        # fit the sub-model omitting the nonsense factor Run
# and compare using a formal analysis of variance.
> anova(fm0, fm)
```
```
> detach()
> rm(fm, fm0)
```

## 3D visualization with rgl

```
> library(rgl)
> rgl.open()
> rgl.bg(col = "white")
> x <- sort(rnorm(1000))
> y <- rnorm(1000)
> z1 <- atan2(x, y)
> z <- rnorm(1000) + atan2(x, y)    # a noisier version to try plotting as well
> plot3d(x, y, z1, col = rainbow(1000), size = 2)
> bbox3d()
```