# R: Contextual variables

### From MathWiki

**This page needs cleaning up**

With multilevel data, a contextual variable is a variable that is constant within groups and is computed from the values of some other variable that may vary within the group. An easy way to compute one in R is with the 'tapply' function, most conveniently by first defining a 'capply' function:

> capply <- function( x, by, FUN, ...) tapply( x, by, FUN, ...) [ tapply( x, by) ]
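To see why this one-liner works, it may help to look at its two pieces separately. The sketch below uses small made-up vectors (`x_demo` and `by_demo` are illustrative names, not from the original page): calling `tapply` without a function returns the group index of each observation, and that index is then used to subscript the vector of group summaries.

```r
# Illustrative data (hypothetical, not from the original page)
x_demo  <- c(1, 2, 3, 10, 20)
by_demo <- c("a", "a", "a", "b", "b")

# One summary per group, named by group
group_means <- tapply(x_demo, by_demo, mean)

# With FUN omitted, tapply returns the group index of each observation
group_index <- tapply(x_demo, by_demo)

# Subscripting the summaries by that index expands them to full length
expanded <- group_means[group_index]
print(as.vector(expanded))
```

The result has one entry per observation, and every observation in a group carries that group's summary.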

A more elaborate definition has the advantage that FUN may also return a vector with one value per observation in the group; a single returned value is still recycled to the length of the group:

> capply <- function( x, by, FUN, ...) unsplit ( lapply ( split ( x , by ), FUN, ...), by )
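As a rough illustration of the difference, the sketch below applies this second definition with a function that returns one value per group and with two functions that return one value per observation. The vectors `score` and `grp` are made up for the example and are not part of the original page.

```r
capply <- function(x, by, FUN, ...) unsplit(lapply(split(x, by), FUN, ...), by)

# Illustrative data (hypothetical)
score <- c(3, 1, 2, 10, 30, 20, 25)
grp   <- c("a", "a", "a", "b", "b", "b", "b")

# FUN returns a single value per group: it is repeated within the group
grp_mean <- capply(score, grp, mean)

# FUN returns a vector as long as the group: values stay aligned with rows
grp_dev  <- capply(score, grp, function(v) v - mean(v))  # within-group deviations
grp_rank <- capply(score, grp, rank)                     # within-group ranks

print(data.frame(score, grp, grp_mean, grp_dev, grp_rank))
```

With the 'tapply' based definition, the last two calls would not give a plain numeric vector, because 'tapply' collects vector-valued results in a list.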

Define a data frame:

> dd <- data.frame( x = 1:10, g = LETTERS[ rep( c(1,2,3), c(3,3,4)) ] )

Then

> dd$x.m <- capply( dd$x, dd$g, mean)

produces:

    > dd
        x g x.m
    1   1 A 2.0
    2   2 A 2.0
    3   3 A 2.0
    4   4 B 5.0
    5   5 B 5.0
    6   6 B 5.0
    7   7 C 8.5
    8   8 C 8.5
    9   9 C 8.5
    10 10 C 8.5
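As a quick sanity check, the steps so far can be run end to end and compared against `ave` from base R, which also computes a group-wise summary aligned with the input. This is only a cross-check sketched here, not something the original page relies on; the name `check` is illustrative.

```r
capply <- function(x, by, FUN, ...) unsplit(lapply(split(x, by), FUN, ...), by)

dd <- data.frame(x = 1:10, g = LETTERS[rep(c(1, 2, 3), c(3, 3, 4))])
dd$x.m <- capply(dd$x, dd$g, mean)

# ave() applies FUN within each level of g and returns a full-length vector
check <- ave(dd$x, dd$g, FUN = mean)

# TRUE when both approaches give the same group means
print(isTRUE(all.equal(dd$x.m, check)))
```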

For more complex problems in which a function depends on more than one variable in the data frame, we can use 'unsplit - lapply - split' as follows:

> dd$mad <- unsplit( lapply ( split( dd, dd$g) , function(z) with( z , median( abs(x - x.m )))), dd$g )
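A case where splitting the whole data frame seems genuinely necessary is a summary that combines two different columns. The sketch below computes a within-group weighted mean; the weight column `w` and the result column `x.wm` are hypothetical additions made only for this illustration.

```r
# Same layout as the page's data frame, plus a made-up weight column w
dd <- data.frame(
  x = 1:10,
  w = c(1, 2, 1,  1, 1, 2,  1, 1, 1, 1),
  g = LETTERS[rep(c(1, 2, 3), c(3, 3, 4))]
)

# Each piece z is the sub-data-frame for one group, so FUN can use x and w together
dd$x.wm <- unsplit(
  lapply(split(dd, dd$g), function(z) with(z, weighted.mean(x, w))),
  dd$g
)

print(dd)
```

Note that 'unsplit' takes the grouping variable as its second argument, so the results are placed back in the original row order.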

This example could be improved to illustrate a situation in which the latter approach is really needed, for instance when FUN must return a vector rather than a single value.

## Links

- Why use contextual variables: http://www.math.yorku.ca/~georges/Slides/Workshop-v1-0Slides.pdf