R: Data description

From MathWiki

Suggestions for improvements

  • Discuss 'bystats' in 'Hmisc'
  • Organize by type of description
    • Summary stats by subgroup
    • Stats on subgroups
    • Tabulations of various types

In SPSS, easy things are easy and difficult things are .... In R, everything is moderately difficult, including simple things like data description.


Try:

> library(car)
> data(Prestige)
> dd <- Prestige
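> summary(dd)
> by ( dd, dd$type, summary)   # summary of every variable within each type
> attach( dd )   # makes the columns of dd visible by name
> by (dd, type , summary)   # same result, using the attached 'type'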

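But alas, no SDs. sd() takes a single numeric vector, so apply it to each numeric column within each group:

> by( dd[ , c('education','income','women','prestige')], dd$type, function(x) sapply(x, sd, na.rm = TRUE) )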
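Group SDs can also be collected into a single data frame. A minimal sketch using aggregate() from base R; the names num.vars and sds are illustrative and not part of the original page:

```r
library(car)        # attaches carData, which provides Prestige
data(Prestige)
dd <- Prestige

# Numeric columns to describe; 'type' is the grouping factor
num.vars <- c("education", "income", "women", "prestige")

# One row per level of type, one column of SDs per variable.
# Rows whose type is NA are dropped from the grouping.
sds <- aggregate(dd[num.vars], by = list(type = dd$type), FUN = sd, na.rm = TRUE)
sds
```

Replacing sd with mean or median gives the matching table of group means or medians.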

Boxplots by group:

> boxplot( income ~ type, dd )   # base graphics, no package needed
> library( lattice )
> bwplot( income ~ type, dd )   # lattice equivalent

Test for a significant between-group difference, assuming approximate normality:

> anova( lm( income ~ type, dd ))
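Since the test above leans on approximate normality, it may help to look at the residuals and compare with a rank-based test. A minimal sketch, assuming the same Prestige data; the names fit and kw are illustrative:

```r
library(car)        # attaches carData, which provides Prestige
data(Prestige)
dd <- Prestige

fit <- lm(income ~ type, dd)     # rows with missing type are dropped

# Normal Q-Q plot of the residuals; strong curvature suggests non-normality
qqnorm(residuals(fit))
qqline(residuals(fit))

# Kruskal-Wallis rank-sum test, which does not assume normality
kw <- kruskal.test(income ~ type, data = dd)
kw
```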

How to look at subgroups?

1. Use by(), replacing FUN with the function you want to apply to each sub data frame

> by( dd, dd$type, FUN )

2. Select a 'sub data frame'

> dd.bc <- dd[ which(dd$type == 'bc'), ]   # which() drops rows where type is NA
> dd.bc
> # analyze dd.bc
> dd.prof <- subset( dd, type == 'prof' )   # subset() also drops rows where type is NA
> # etc.
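Rather than building each sub data frame by hand, all of them can be created at once. A minimal sketch using split() from base R; the name dd.by.type is illustrative:

```r
library(car)        # attaches carData, which provides Prestige
data(Prestige)
dd <- Prestige

# split() returns a named list with one data frame per level of type;
# rows whose type is NA are dropped
dd.by.type <- split(dd, dd$type)
names(dd.by.type)

# Apply the same analysis to every sub data frame
sapply(dd.by.type, nrow)                            # group sizes
lapply(dd.by.type, function(d) summary(d$income))   # income summary per group
```

An individual sub data frame is then available as, for example, dd.by.type$bc.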

3. For individual variables

> tapply( dd$income, dd$type, mean, na.rm = TRUE)   # shows 3 means
> tapply( dd$income, dd$type, length )   # shows Ns
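Several statistics can be obtained in one call by giving tapply() a function that returns a named vector. A minimal sketch; the helper n.mean.sd and the name income.stats are hypothetical, introduced only for illustration:

```r
library(car)        # attaches carData, which provides Prestige
data(Prestige)
dd <- Prestige

# Hypothetical helper: N, mean and SD of a numeric vector, ignoring NAs
n.mean.sd <- function(x) {
  c(N = sum(!is.na(x)), Mean = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
}

# tapply() returns one named vector per level of type;
# rbind them into a matrix with one row per level
income.stats <- do.call(rbind, tapply(dd$income, dd$type, n.mean.sd))
income.stats
```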

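e.g. counts cross-classified by type and education. cut2() is in Hmisc; with no cut points it returns a single interval, and with g=3 it cuts into three quantile groups.

> library(Hmisc)
> tapply( dd$income, list(dd$type,cut2(dd$education)), length)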
     [6.38,16.0]
bc            44
prof          31
wc            23
> tapply( dd$income, list(dd$type,cut2(dd$education,g=3)), length)
     [ 6.38, 9.05) [ 9.05,11.59) [11.59,15.97]
bc              33            11            NA
prof            NA             2            29
wc              NA            18             5
> atotal(tapply( dd$income, list(dd$type,cut2(dd$education,g=3)), length))
      [ 6.38, 9.05) [ 9.05,11.59) [11.59,15.97] Total
bc               33            11            NA    NA
prof             NA             2            29    NA
wc               NA            18             5    NA
Total            NA            31            NA    NA
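The NA entries above arise because tapply() returns NA for empty cells, and a total that includes an NA is presumably NA as well. Note also that atotal() is not a base R function. A minimal sketch of one alternative, using table() and addmargins() from base R together with cut2() from Hmisc; the name counts is illustrative:

```r
library(car)        # attaches carData, which provides Prestige
library(Hmisc)      # provides cut2()
data(Prestige)
dd <- Prestige

# table() counts the rows in each cell and reports empty cells as 0, not NA;
# rows with a missing type are left out
counts <- table(type = dd$type, education = cut2(dd$education, g = 3))

# addmargins() appends row and column totals (labelled "Sum")
addmargins(counts)
```

Because empty cells are 0 rather than NA, every marginal total is a number.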