R: Contextual variables

From MathWiki

This page needs cleaning up.

With multilevel data, a contextual variable is a variable that is constant within groups and is computed from the values of some other variable that may vary within the group. An easy way to create one in R uses the 'tapply' function, most conveniently by first defining a 'capply' function:


> capply <- function( x, by, FUN, ...) tapply( x, by, FUN, ...) [ tapply( x, by) ]
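The one-line definition relies on the fact that 'tapply' called without FUN returns, for each observation, the index of its group. The following is a minimal sketch of that mechanism; the vectors x and by are made up for illustration and are not part of the original example.

```r
# Illustrative data (hypothetical, not from the page's example)
x  <- c(1, 2, 3, 4, 5, 6)
by <- c("A", "A", "B", "B", "B", "C")

# Called without FUN, tapply returns the group index of each observation.
idx <- tapply(x, by)
idx

# With FUN = mean, tapply returns one mean per group, named by group label.
group_means <- tapply(x, by, mean)
group_means

# Indexing the per-group means by idx expands them to one value per observation.
expanded <- as.vector(group_means[idx])
expanded
```

Each element of expanded is the mean of the group that the corresponding element of x belongs to, which is exactly what 'capply' returns.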

Note that a more elaborate definition has the advantage that FUN can return a vector, not just a single value; within each group the result is recycled to match the number of observations in that group:

> capply <- function( x, by, FUN, ...)  unsplit ( lapply ( split ( x , by ), FUN, ...), by )
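As a small sketch of the difference, the 'unsplit - lapply - split' definition can be tried with a FUN that returns a single value and with one that returns a vector as long as the group. The data below are made up for illustration, with the groups deliberately interleaved to show that results come back in the original order.

```r
# Second definition of capply, as given in the text.
capply <- function(x, by, FUN, ...) unsplit(lapply(split(x, by), FUN, ...), by)

# Illustrative data (hypothetical): two interleaved groups.
x <- c(1, 2, 3, 4, 5, 6)
g <- c("A", "B", "A", "B", "A", "B")

# FUN returns one value per group; it is recycled within the group.
grp_mean <- capply(x, g, mean)
grp_mean

# FUN returns a vector as long as the group: deviation from the group mean.
centered <- capply(x, g, function(v) v - mean(v))
centered
```

The second call is the kind of within-group transformation that the one-line 'tapply' definition is not designed to handle.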


Define a data frame:

> dd <- data.frame( x = 1:10, g = LETTERS[ rep( c(1,2,3), c(3,3,4)) ] )

Then

> dd$x.m <- capply( dd$x, dd$g, mean)

produces:

> dd
    x g x.m
1   1 A 2.0
2   2 A 2.0
3   3 A 2.0
4   4 B 5.0
5   5 B 5.0
6   6 B 5.0
7   7 C 8.5
8   8 C 8.5
9   9 C 8.5
10 10 C 8.5
> 

For more complex problems in which a function depends on more than one variable in the data frame, we can use 'unsplit - lapply - split' as follows:

> dd$mad <- unsplit( lapply( split( dd, dd$g), function(z) with( z, median( abs( x - x.m)))), dd$g)
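For reference, here is a self-contained sketch of this pattern that can be pasted into a fresh R session. It rebuilds the example data frame and passes the grouping variable as the second argument of 'unsplit', which 'unsplit' requires in order to put the results back in the original row order.

```r
# capply as defined in the text (unsplit - lapply - split version).
capply <- function(x, by, FUN, ...) unsplit(lapply(split(x, by), FUN, ...), by)

# Same example data as in the text.
dd <- data.frame(x = 1:10, g = LETTERS[rep(c(1, 2, 3), c(3, 3, 4))])
dd$x.m <- capply(dd$x, dd$g, mean)

# Split the whole data frame by group so the function sees both x and x.m,
# then pass the grouping variable to unsplit to restore the original row order.
dd$mad <- unsplit(
  lapply(split(dd, dd$g), function(z) with(z, median(abs(x - x.m)))),
  dd$g
)
dd
```

With this particular data the median absolute deviation from the group mean happens to be 1 in every group, so the new column is constant; data with unequal spread across groups would give values that differ by group.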

Note: the one-line 'tapply' definition of 'capply' does not work when FUN needs to return a vector rather than a single value; that is one situation in which the 'unsplit - lapply - split' approach is really needed.

Links