R: Data conversion from SPSS

From MathWiki

Conversion with SPSS 15

From SPSS to R

The following steps seem to work with 'Date' and 'String' variables.

1) Save the SPSS file as a Stata (.dta) file (I used "Stata Version 8 SE")
2) Read it into R with the following code:
 Read.dta <- function(...) {
     # Read.dta reads Stata files using 'read.dta' in 'library(foreign)'.
     # This appears to be an ideal way of importing SPSS files in order
     # to keep full variable names. Direct use of 'read.spss' on an SPSS
     # '.sav' file abbreviates variable names to 8 characters.
     # Note: missing.type = TRUE produces warnings.
     library(foreign)
     # Strip trailing blanks from a character vector
     trim <- function(x) sub(" +$", "", x)
     # dd <- read.dta(..., missing.type = TRUE)  # produces warnings
     dd <- read.dta(...)
     # Names of the columns that were read as character strings
     ch.nams <- names(dd)[sapply(dd, is.character)]
     # Trim each character column and convert it to a factor
     for (nn in ch.nams) dd[[nn]] <- factor(trim(dd[[nn]]))
     dd
 }
Note that by default 'read.dta' turns strings into long character strings with trailing blanks.
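
As a rough illustration of the cleanup described above, the following sketch writes a small made-up data frame to a temporary .dta file with 'write.dta', reads it back with 'read.dta', and converts the character columns to factors after stripping any trailing blanks. The data, the temporary file, and the variable names are invented for the example; a file exported from SPSS may behave differently.

```r
library(foreign)  # provides read.dta and write.dta

# Made-up data standing in for a file exported from SPSS
toy <- data.frame(id = 1:3,
                  sex = c("male", "female", "male"),
                  stringsAsFactors = FALSE)
dta_file <- tempfile(fileext = ".dta")
write.dta(toy, dta_file)

dd <- read.dta(dta_file)

# Strip trailing blanks from character columns and turn them into factors
for (nn in names(dd)[sapply(dd, is.character)]) {
  dd[[nn]] <- factor(sub(" +$", "", dd[[nn]]))
}
str(dd)
```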

From R to SPSS

1) Transform dates and factors to character
2) Use write.dta
3) Read into SPSS
4) Manually convert date variables to Dates using the format yyyy-mm-dd
5) Save as a .sav file.
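
Steps 1 and 2 above might look like the following sketch. The helper name 'dates_factors_to_character' and the example data are hypothetical and used only for illustration; dates are written as yyyy-mm-dd text so that they can be converted in SPSS in step 4.

```r
library(foreign)  # provides write.dta

# Hypothetical helper: turn Date and factor columns into character columns
dates_factors_to_character <- function(df) {
  for (nn in names(df)) {
    col <- df[[nn]]
    if (inherits(col, "Date")) {
      df[[nn]] <- format(col, "%Y-%m-%d")   # yyyy-mm-dd text
    } else if (is.factor(col)) {
      df[[nn]] <- as.character(col)
    }
  }
  df
}

# Made-up example data
example <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a")),
  visit = as.Date(c("2006-02-17", "2006-03-01", "2006-04-15"))
)

converted <- dates_factors_to_character(example)   # step 1
out_file <- tempfile(fileext = ".dta")
write.dta(converted, out_file)                      # step 2
```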

Possibly the best way to convert to and from SPSS

Version 14 of SPSS (and perhaps earlier versions as well) can create a Stata (.dta) file, which can be read into R with 'read.dta' in 'library(foreign)'. This may be the ideal way to transfer data.

Transferring data from SPSS to R

Method | Variable names | Factors | Other limitations
Stata (.dta) using read.dta in library(foreign) | Handles long names | Turned into fixed-width character strings; these can be trimmed and turned into factors with 'trim' below or with 'Read.dta' in 'coursefun.R' | None yet known
SPSS (.sav) using spss.get in library(Hmisc) | Abbreviates names to 8 characters | OK |
.xls file, then convert to .csv in Excel and use read.csv | Handles long names | OK | Opening the file in Excel truncates it to 256 variables

Method for trimming factors and converting character vectors to factors

trim <- function(x) UseMethod("trim")
trim.data.frame <- function(x) {
    for ( nn  in names(x)) x[[nn]] <- trim(x[[nn]])
    x
}
trim.factor <- function( x ) {
    levels(x) <- sub(" +$", "", levels(x))
    x
}
trim.character <- function( x ) {
    trim(factor(x))
}
trim.default <- function(x) x

Transferring data from R to SPSS

The function 'write.dta' in library(foreign) produces a file that SPSS can read. To see factor values in SPSS, click on View|Value Labels.
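
A minimal sketch of this export, assuming a made-up data frame with one factor column. The argument 'convert.factors = "labels"' (the default in 'write.dta') stores factor levels as value labels; the file is read back with 'read.dta' only as a sanity check, since opening it in SPSS cannot be shown here.

```r
library(foreign)  # provides write.dta and read.dta

# Made-up data with a factor column
survey <- data.frame(
  id  = 1:4,
  arm = factor(c("control", "treated", "treated", "control"))
)

dta_file <- tempfile(fileext = ".dta")
write.dta(survey, dta_file, convert.factors = "labels")

# Sanity check in R: the factor levels should survive the round trip
check <- read.dta(dta_file)
levels(check$arm)
```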

Converting data from SPSS to R

See R:_Local_tutorial#SPSS for an introductory discussion.

Instead of using 'read.spss' in 'library(foreign)' directly, you may prefer to use 'spss.get' in 'library(Hmisc)'. It uses 'read.spss' with better defaults and does some post-processing. For example, date variables are converted to R dates. Interestingly, from Rnews (2004) volume 4/1 (http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf):

SPSS uses October 14, 1582 as the origin thereby
representing datetimes as seconds since the beginning
of the Gregorian calendar. SAS uses seconds
since January 1, 1960. spss.get and sas.get in package
Hmisc can handle such datetimes automatically
(Alzola and Harrell, 2002).
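
To make the origin arithmetic concrete, here is a small base-R sketch that converts a count of seconds since October 14, 1582 into an R date. The value 'spss_seconds' is constructed for the example rather than taken from a real SPSS file. Note that 'ISOdate' defaults to 12:00 noon, so the sketch uses 'ISOdatetime' with an explicit midnight origin.

```r
# Origin used by SPSS for date-time values (midnight, October 14, 1582)
spss_origin <- ISOdatetime(1582, 10, 14, 0, 0, 0, tz = "UTC")

# Made-up SPSS value: seconds from the origin to 17 Feb 2006
spss_seconds <- as.numeric(
  difftime(ISOdatetime(2006, 2, 17, 0, 0, 0, tz = "UTC"),
           spss_origin, units = "secs")
)

# Seconds since the origin -> POSIXct date-time -> Date
r_datetime <- spss_origin + spss_seconds
r_date <- as.Date(r_datetime, tz = "UTC")
format(r_date)
```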

An outstanding problem is the conversion of long variable names in recent versions of SPSS. 'read.spss' seems designed for variable names of at most 8 characters, a constraint in older versions of SPSS, and it converts long SPSS variable names to an 8-character representation. If you want to keep the longer SPSS names, the only options I know of are to rename the variables manually in R or to go through a Stata (.dta) file as described above.
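
Renaming by hand can be done with a lookup from abbreviated to long names, as in this sketch. The data frame, the 8-character names, and the long names are all invented for illustration.

```r
# Made-up data frame with 8-character names such as read.spss might produce
dd <- data.frame(SATISFAC = c(3, 5, 4), HOUSEHOL = c(2, 1, 4))

# Hypothetical lookup: abbreviated name -> intended long name
long_names <- c(SATISFAC = "satisfaction_score",
                HOUSEHOL = "household_size")

# Rename only the columns that appear in the lookup
hit <- names(dd) %in% names(long_names)
names(dd)[hit] <- unname(long_names[names(dd)[hit]])
names(dd)
```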


Converting dates manually

The conversion below is handled automatically by 'spss.get' in 'library(Hmisc)'; it is shown here for cases where the dates have to be converted by hand.

From http://tolstoy.newcastle.edu.au/R/help/06/02/21455.html

Re: [R] How to convert SPSS date data to dates?

From: Chuck Cleland <ccleland_at_optonline.net>
Date: Fri 17 Feb 2006 - 04:49:48 EST

Here is one way:

library(chron)

as.chron(ISOdate(1582, 10, 14) + mydata$SPSSDATE)

Jonathan Williams wrote: