# R: Data description

Suggestions for improvements

• Discuss 'bystats' in 'Hmisc'
• Organize by type of description
• Summary stats by subgroup
• Stats on subgroups
• Tabulations of various types

In SPSS, easy things are easy and difficult things are ....; in R, everything is moderately difficult, including simple things like data description.

Try:

```
> library(car)
> data(Prestige)
> dd <- Prestige
> summary(dd)
> by( dd, dd$type, summary )
> attach( dd )
> by( dd, type, summary )
```

But alas, no standard deviations. To get them by group:

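```
> # sd() works on vectors, not whole data frames, so apply it column by column
> by( dd[ , sapply(dd, is.numeric)], dd$type, function(x) sapply(x, sd, na.rm = TRUE) )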
```
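
As an alternative for group-wise means and standard deviations, base R's aggregate() can apply a summary function within each group. The following is a minimal sketch on a small made-up data frame; the name toy and all of its values are hypothetical and are not taken from Prestige.

```r
# Hypothetical toy data standing in for a data frame like Prestige
toy <- data.frame(
  type   = factor(c("bc", "bc", "bc", "prof", "prof", "wc", "wc")),
  income = c(3000, 4000, 5000, 10000, 12000, 5000, 7000)
)

# Mean and sd of income within each level of type
grp.mean <- aggregate(income ~ type, data = toy, FUN = mean)
grp.sd   <- aggregate(income ~ type, data = toy, FUN = sd)

# Combine into one table with one row per group
grp.stats <- data.frame(type = grp.mean$type,
                        mean = grp.mean$income,
                        sd   = grp.sd$income)
print(grp.stats)
```

With the formula interface, aggregate() drops rows with missing values by default, so no na.rm argument is needed in this sketch.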

Boxplots by group:

```
> boxplot( income ~ type, dd )   # base graphics
> library( lattice )
> bwplot( income ~ type, dd )    # lattice equivalent
```

Test for a significant between-group difference, assuming approximate normality:

```
> anova( lm( income ~ type, dd ) )
```

How to look at subgroups?

1. Use by( dd, dd$type, FUN ), where FUN is the function you want to apply to each sub data frame

2. Select a 'sub data frame'

```
> dd.bc <- dd[ which(dd$type == 'bc'), ]   # which() drops rows where type is NA
> dd.bc
> # analyze dd.bc
```
```
> dd.prof <- dd[ which(dd$type == 'prof'), ]
> # etc.
```
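
Rather than building each sub data frame by hand, split() can produce all of them at once. This is only a sketch on made-up data: the toy data frame and its values are hypothetical, and one type is deliberately missing to show how it is handled.

```r
# Hypothetical toy data; type contains a missing value on purpose
toy <- data.frame(
  type   = factor(c("bc", "bc", "prof", "prof", "wc", NA)),
  income = c(3000, 5000, 10000, 12000, 6000, 8000)
)

# split() returns a named list with one sub data frame per level of type;
# rows whose type is NA are dropped
parts <- split(toy, toy$type)
print(names(parts))

# Apply any data-frame function to each piece
n.by.type    <- sapply(parts, nrow)
mean.by.type <- sapply(parts, function(d) mean(d$income))
print(n.by.type)
print(mean.by.type)
```

Each element, for example parts$bc, can then be analyzed like any other data frame.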

3. For individual variables

```
> tapply( dd$income, dd$type, mean, na.rm = TRUE )   # shows 3 means
> tapply( dd$income, dd$type, length )   # shows Ns
```

e.g.

```
> library(Hmisc)   # provides cut2()
> tapply( dd$income, list(dd$type, cut2(dd$education)), length )
     [6.38,16.0]
bc            44
prof          31
wc            23
```
```
> tapply( dd$income, list(dd$type, cut2(dd$education, g=3)), length )
     [ 6.38, 9.05) [ 9.05,11.59) [11.59,15.97]
bc              33            11            NA
prof            NA             2            29
wc              NA            18             5
> atotal( tapply( dd$income, list(dd$type, cut2(dd$education, g=3)), length ) )
      [ 6.38, 9.05) [ 9.05,11.59) [11.59,15.97] Total
bc               33            11            NA    NA
prof             NA             2            29    NA
wc               NA            18             5    NA
Total            NA            31            NA    NA
```
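
tapply() with length leaves NA in cells that have no observations, and the NA values appear to carry through to the totals above. Since atotal() is not a base R function, here is a hedged sketch of the same kind of tabulation using base R's table() and addmargins(). The toy data frame, its values, and the cut points are all hypothetical, and base cut() is used as a stand-in for cut2() from 'Hmisc'.

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame(
  type      = factor(c("bc", "bc", "bc", "prof", "prof", "wc", "wc")),
  education = c(7, 8, 10, 14, 15, 10, 12)
)

# Two education bands with fixed, illustrative cut points
# (base cut() used here as a stand-in for Hmisc::cut2)
toy$ed.grp <- cut(toy$education, breaks = c(6, 10, 16))

# table() counts empty cells as 0 rather than NA
counts <- table(toy$type, toy$ed.grp)

# addmargins() appends row and column totals labelled "Sum"
with.totals <- addmargins(counts)
print(with.totals)
```

Because empty cells are counted as 0, the marginal totals come out as ordinary numbers.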