R: C code: including it in R

From MathWiki

Table of contents

Some relevant links

It seems that including a routine written in C is quite easy under UNIX (see the second item below) but is somewhat more challenging under Windows.

  • The R FAQ (http://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-do-I-include-compiled-C-code_003f) has some information and a reference to the R Installation and Administration Manual (http://cran.r-project.org/doc/manuals/R-admin.html). The information provided seems very detailed but might go beyond what is necessary for basic use in an algorithm under UNIX.
  • An article by Farnsworth (http://www.google.com/url?sa=t&ct=res&cd=3&url=http%3A%2F%2Fcran.r-project.org%2Fdoc%2Fcontrib%2FFarnsworth-EconometricsInR.pdf&ei=5ycVSMLCPIXAggKJybCtAg&usg=AFQjCNH11xYyIbQnf0jbS1T3x5q9hvx0nA&sig2=qHwmiO0XOXj-XTSDxd-ltQ), excerpted below gives a fairly direct approach that can be used directly under UNIX. Under Windows, I presume, one would have to install the tools referred to in the previous item. If anyone tries it out, please report on your experiences so we can have as simple an explanation as possible that nevertheless works.

9.9 Calling C functions from R

Some programming problems have elements that are just not made for an interpreted language because they require too much computing power (especially if they require too many loops). These functions can be written in C, compiled, and then called from within R4. R uses the system compiler (if you have one) to create a shared library (ending in .so or .dll, depending on your system) which can then be loaded using the dyn.load() function.

9.9.1 How to Write the C Code

A function that will be called from within R should have type void (it should not return anything except through its arguments). Values are passed to and from R by reference, so all arguments are pointers. Real numbers (or vectors) are passed as type double*, integers and boolean as type int*, and strings as type char**. If inputs to the function are vectors, their length should be passed manually. Also note that objects such as matrices and arrays are just vectors with associated dimension attributes. When passed to C, only the vector is passed, so the dimensions should be passed manually and the matrix recreated in C if necessary.

Here is an example of a C file to compute the dot product of two vectors

void gdot(double *x,double *y,int *n,double *output){
   int i;
   *output=0;
   for (i=0;i<*n;i++){
     *output+=x[i]*y[i];
   }
}

No header files need to be included unless more advanced functionality is required, such as passing complex numbers (which are passed as a particular C structure). In that case, include the file R.h. Do not use malloc() or free() in C code to be used with R. Instead use the R functions Calloc() and Free(). R does its own memory management, and mixing it with the default C memory stuff is a bad idea. Outputting from inside C code should be done using Rprintf(), warning(), or error(). These functions have the same syntax as the regular C command printf(), which should not be used. It should also be noted that long computations in compiled code cannot be interrupted by the user. In order to check for an interrupt signal from the user, we include

#include <R_ext/Utils.h>
...
R_CheckUserInterrupt();

in appropriate places in the code. My experience has been that the speedup of coding in C is not enough to warrant the extra programming time except for extremely demanding problems. If possible, I suggest working directly in R. It’s quite fast—as interpreted languages go.

9.9.2 How to Use the Compiled Functions

To compile the library, from the command line (not inside of R) use the command

R CMD SHLIB mycode.c

This will generate a shared library called mycode.so. To call a function from this library we load the library using dyn.load() and then call the function using the .C() command. This command makes a copy of each of its arguments and passes them all by reference to C, then returns them as a list. For example, to call the dot product function above, we could use

> x<-c(1,4,6,2)
> y<-c(3,2.4,1,9)
> dyn.load("mycode.so")
> product<-.C("gdot",myx=as.double(x),myy=as.double(y),myn=as.integer(NROW(x)),myoutput=numeric())
> product$myoutput
[1] 36.6

Notice that when .C() was called, names were given to the arguments only for convenience (so the resulting list would have names too). The names are not passed to C. It is good practice (and often necessary) to use as.double() or as.integer() around each parameter passed to .C(). If compiled code does not work or works incorrectly, this should be checked first.

It is important to create any vectors from within R that will be passed to .C() before calling them. If the data being passed to .C() is large and making a copy for passing is not desirable, we can instruct .C() to edit the data in place by passing the parameter DUP=FALSE. The programmer should be very wary when doing this, because any variable changed in the C code will be changed in R also and there are subtle caveats associated with this. The help file for .C() or online documentation give more information.

There is also a .Fortran() function. Notice that .C() and .Fortran() are the simple ways to call functions from these languages, they do not handle NA values or complicated R objects. A more flexible and powerful way of calling compiled functions is .Call(), which handles many more types of R objects but adds significantly to the complexity of the programming. The .Call() function is a relatively recent addition to R, so most of the language was written using the simple but inflexible .C().

9.10 Calling R Functions from C

Compiled C code that is called from R can also call certain R functions (fortran can not). In particular, the functions relating to drawing from and evaluating statistical distributions are available. To access these functions, the header file Rmath.h must be included. Unfortunately these C functions are not well documented, so the programmer may have to look up their definitions in Rmath.h on the local system. Before calling these functions, GetRNGstate() must be called, and PutRNGstate() must be called afterward. Below is a C function that generates an AR(1) series with N(0,1) errors and a supplied coefficient.

#include<Rmath.h>
void ar1(double *y,double *rho,double *N){
  int i;
  GetRNGstate();
  for (i=1;i<N[0];i++){
     y[i]=rho[0]*y[i-1]+rnorm(0.0,1.0);
  }
     PutRNGstate();
  }

which could be called (as usual) from within R using

> dyn.load("ar1.so")
> X<-.C("ar1",x=double(len=5000),rho=as.double(.9),n=as.integer(5000))$x

Most common mathematical operations, such as sqrt() are also available through the C interface. Actual R expressions can also be called from within C, but this is not recommended since it invokes a new instance of R and is slower than terminating the C code, doing a computation in R, and calling another C function. The method for doing it is the (now depreciated) call R() function.