R: Reshaping data

From MathWiki

For discussions and approaches on reshaping data in R see:

  1. Hadley Wickham (2005) 'Reshaping Data in R', Statistical Computing and Graphics (http://www.amstat-online.org/sections/graphics/newsletter/Volumes/v162.pdf); see also http://had.co.nz/cast

There are many situations in which one needs to 'reshape' data. Examples are:

  1. 'wide' to 'long' conversions (and vice versa) in longitudinal data analysis
  2. 'wide' to 'long' conversion to plot multiple response variables using xyplot (although this can now be done with more than one variable on the left side of '~' in the plotting formula)
  3. combining data or adjusted data and predicted values for plotting in the same plot
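The third situation can be sketched as follows. This is only an illustration under assumed data: the data frame 'obs', the linear model, and the 'type' column are hypothetical names, not part of any particular analysis. Observed and predicted values are given the same columns and then stacked with rbind so that a single plotting call can distinguish them by 'type'.

```r
# Hypothetical observed data: a response y measured at x
obs <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- lm(y ~ x, data = obs)

# Predicted values on a finer grid of x
pred <- data.frame(x = seq(1, 5, by = 0.5))
pred$y <- predict(fit, newdata = pred)

# Label the source of each row, then stack the two data frames
obs$type  <- "observed"
pred$type <- "predicted"
both <- rbind(obs, pred)
```

The combined data frame 'both' can then be passed to a plotting function with 'type' as a grouping variable.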

R has a number of functions that can help:

  1. merge
  2. reshape
  3. cbind and rbind
  4. stack
  5. aperm
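As a brief sketch of the two less familiar functions in this list, stack turns several columns into one column of values plus an indicator column, and aperm permutes the dimensions of an array. The objects 'wide' and 'arr' below are made up for illustration.

```r
# stack: two columns become one 'values' column and an 'ind' factor
wide <- data.frame(Y.1 = c(11, 16), Y.2 = c(12, 17))
long <- stack(wide)

# aperm: reorder the dimensions of a 2 x 3 x 4 array to 4 x 2 x 3
arr  <- array(1:24, dim = c(2, 3, 4))
arr2 <- aperm(arr, c(3, 1, 2))
```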

This page is intended to show ways of reshaping data organized by task.

Adding new variables

Suppose 'old' is a data frame to which you wish to add some new variables contained in 'new'. 'old' and 'new' must share at least one id variable, say 'ID'. The two data frames are combined with

> comb <- merge( old, new, by = 'ID', all.x = TRUE)  # note that this omits IDs in 'new' that are not in 'old'
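A small worked example may make the effect of all.x clearer. The contents of 'old' and 'new' below are invented for illustration. With all.x = TRUE every row of 'old' is kept and unmatched rows get NA for the new variables; with all = TRUE the IDs found only in 'new' are kept as well.

```r
# Hypothetical data: 'new' lacks ID "a" and has an extra ID "d"
old <- data.frame(ID = c("a", "b", "c"), x = c(1, 2, 3))
new <- data.frame(ID = c("b", "c", "d"), z = c(20, 30, 40))

# Keep all rows of 'old'; ID "d" is dropped, z is NA for ID "a"
comb <- merge(old, new, by = "ID", all.x = TRUE)

# Keep the IDs of both data frames
both <- merge(old, new, by = "ID", all = TRUE)
```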

Plotting multiple responses

'wide' to 'long' conversion


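dw <- data.frame( ID = c('a','b','c'), T.1 = c(1,6,NA), T.2 = c(2,7,8), T.3 = c(8,10,NA),
       Y.1 = c(11,16,NA), Y.2 = c(12,17,18), Y.3 = c(NA, 18, NA), W = c(11,12,13))
# wide to long: 'varying' lists the wide columns behind each long variable,
# 'v.names' names the long variables, and 'idvar' is the existing id column
dl <- reshape( dw, direction = "long", idvar = "ID",
       varying = list( T = c("T.1","T.2","T.3"), Y = c("Y.1","Y.2","Y.3")),
       v.names = c('T','Y'))
# We can also achieve the same thing more easily by letting reshape split
# the column names 'T.1', ..., 'Y.3' at the '.'
dl <- reshape( dw, direction = "long", idvar = "ID", varying = 2:7)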
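The reverse, 'long' to 'wide', conversion can be sketched with the same function. The example below rebuilds the wide data from the long form; it assumes the long data frame has the columns 'ID', 'time', 'T' and 'Y' produced by the call above, and the name 'dw2' is arbitrary.

```r
dw <- data.frame(ID = c("a", "b", "c"),
                 T.1 = c(1, 6, NA), T.2 = c(2, 7, 8), T.3 = c(8, 10, NA),
                 Y.1 = c(11, 16, NA), Y.2 = c(12, 17, 18), Y.3 = c(NA, 18, NA),
                 W = c(11, 12, 13))

# wide to long: one row per ID and time
dl <- reshape(dw, direction = "long", idvar = "ID", varying = 2:7)

# long to wide: 'v.names' are spread over the levels of 'timevar'
dw2 <- reshape(dl, direction = "wide", idvar = "ID", timevar = "time",
               v.names = c("T", "Y"), sep = ".")
```

The columns of 'dw2' hold the same values as those of 'dw', although their order may differ.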