[1] -0.06047565 0.26982251 2.05870831 0.57050839 0.62928774 2.21506499
[7] 0.96091621 -0.76506123 -0.18685285 0.05433803
CRusty R
Calling Rust from R using C
Why would you only want to write one language when you can write all the languages. In this short post I’ll briefly explain how, with the help of C, you can write Rust functions that you can call from R.
My dearly departed colleague Milan had his going away party last week. As we were walking through the streets of Brighton on a Friday night we got chatting about Rust (and Python and R and other languages). I said that I really liked the idea of just writing a library once and then being able to call it from any language you might want to use. In particular, I said I liked the idea of leveraging the power of Rust to write high performance R packages. Milan wondered how it was possible to get Rust and R to talk to each other. So, in this post, I’ll explain how!
Some Rust Code
Before we can get R and Rust to talk to each other, we’ll need some Rust code for R to call. I’m going to do a Rust re-write of the code in my first post of my functional style code in R series. In that post I showed how you could take a list of numbers and then use Welfords algorithm to work out the running mean and standard deviation (and any other stats you might want that are based on these values, such as the standard error, the effect size, or the t value). Additionally, for the Rust re-write, I’ll also want to pass in a set size. The set size value will chunk the input into n sets of size set size.
To make it more concrete, I’ll want something that does the following. Let’s say that I give the function something like the following input:
And I specify a set size of 5, then it should return something like the following:
Code
<- 5
set_size
<- function(x, new_value) {
welfords <- x[1]
count <- x[2]
mean <- x[3]
squared_distances <- count + 1
count <- new_value - mean
delta <- mean + (delta / count)
new_mean <- new_value - new_mean
delta2 <- squared_distances + (delta * delta2)
squared_distances <- sqrt(squared_distances / (count - 1))
sd <- sd / sqrt(count)
se <- new_mean / se
t <- new_mean / sd
d c(count, new_mean, squared_distances, sd, se, d, t)
}
<- function(x, col_names) {
list_vec_to_dataframe <- length(x)
n_cols |>
x unlist() |>
matrix(ncol = n_cols) |>
t() |>
as.data.frame() |>
setNames(col_names)
}
<- function(tib) {
welfords_accumulate ::accumulate(tib$value, welfords, .init = c(0, 0, 0, 0, 0, 0, 0)) |>
purrrlist_vec_to_dataframe(c("i", "mean", "sq_dist", "sd", "se", "d", "t")) |>
::select(-sq_dist) |>
dplyr::mutate(group = unique(tib$group)) # add back group label
dplyr
}
::tibble(
tibblegroup = rep(1:(length(input) / set_size), each = set_size),
value = input,
|>
) ::group_by(group) |>
dplyr::group_split() |>
dplyr::future_map(welfords_accumulate) |>
furrr::reduce(\(x, y) rbind(x, y)) |>
purrr::select(group, i, mean, sd, se, d, t) |>
dplyr::set_colnames(c("group", "n", "mean", "sd", "se", "d", "t")) |>
magrittr::filter(n != 0) dplyr
group n mean sd se d t
1 1 1 -0.06047565 NaN NaN NaN NaN
2 1 2 0.10467343 0.2335561 0.1651491 0.4481726 0.6338118
3 1 3 0.75601839 1.1401864 0.6582869 0.6630656 1.1484633
4 1 4 0.70964089 0.9355676 0.4677838 0.7585137 1.5170274
5 1 5 0.69357026 0.8110218 0.3627000 0.8551807 1.9122423
6 2 1 2.21506499 NaN NaN NaN NaN
7 2 2 1.58799060 0.8868171 0.6270744 1.7906630 2.5323799
8 2 3 0.80363999 1.4962754 0.8638750 0.5370936 0.9302735
9 2 4 0.55601678 1.3182674 0.6591337 0.4217784 0.8435569
10 2 5 0.45568103 1.1634896 0.5203284 0.3916503 0.8757567
So let’s write some Rust code.
Setting up the project
First, we’ll start a new cargo project. We’ll be doing it with the --lib
flag, because we’re creating a library.
cargo init --lib --name welfords
We’ll want to add a few dependencies to our Cargo.toml
. So we can go ahead and add them now. I want to speed things up by running computations in parallel, so I’ll use the amazing rayon
crate. And then I need the libc
crate so that I have access to the c
types.
cargo add rayon
cargo add libc
Finally, I’ll just edit the Cargo.toml
file to specify that I want to create a static library, and the name of this library. Adding the following lines to the bottom of the file will do the trick.
[lib]
name="welfords"
crate-type=["staticlib"]
Our first Rust function
The first Rust function we’ll write is one that will actually implement Welfords algorithm.
It’s going to look something like the following:
fn welfords(
, count, mean, squared_distances): &mut (
(groupusize,
f64,
f64,
f64,
,
): &f64,
new_value-> Option<Welfords> {
) *count += 1.;
let delta = new_value - *mean;
*mean += delta / *count;
let delta2 = new_value - *mean;
*squared_distances += delta * delta2;
let sd = (*squared_distances / (*count - 1.)).sqrt();
let se = sd / count.sqrt();
let t = *mean / se;
let d = *mean / sd;
Some(Welfords::new(
*group, *count, *mean,
, se, d, t,
sd
))}
This might look pretty daunting if you’re new to Rust, but if you compare it to the R function above you’ll see that they’re pretty similar. As with the R version, it is a binary function (takes two arguments). We need this, because we’re going to use it in an accumulate_, and accumulators need binary functions. The first argument will be the current state, and the second will be the new value. The key difference between the Rust version and the R version is that in the R version the first argument was an R list
. We then indexed the first argument to get the running count, running mean, and running squared distances. For the Rust version, the first argument will be a tuple. These will be mutable references because we’re using them to keep track of our ever-changing current state. The Rust version will also take the current group as one of the values of this tuple. This is because Rust has an enumerated accumulate function, but R does not.
Finally, there’s the return type. Because Rust allows us to keep track of the values with references to values in memory (rather than by creating new values), we can also return only the values that we want in the final result. Therefore, the return type will just be a Option<Welfords>
, which I’ve just defined as follows:
struct Welfords {
: usize,
group: f64,
count: f64,
mean: f64,
sd: f64,
se: f64,
d: f64,
t}
Our second Rust function
The next function that we need is a function that will take our input vector of numbers and the set size, and then output the data that will form the rows of our R data.frame
. That is, we want a function that will output a vector of Welfords
. The signature would look like the following:
fn run_welfords(input_vec: Vec<f64>, set_size: usize) -> Vec<Welfords> {
}
For the body, we’re actually only going to have one line. So before we write it, let’s break down what we need to do:
- Take our input vector and turn it into a parallel iterator, so we can iterate in parallel.
- Next, we’re going to chunk it into chunks of set size
- We’ll then enumerate each of those chunks (our group number)
- Next, we’ll map over these chunks.
Inside of each map, we’ll do the following
- Turn the data in that chunk into an iterator.
- Apply the
welfords
function to that iterator using ascan
(which is what many programming languages, including Rust, callaccumulate
)
We’ll also have to make sure that we collect the result of our scan
into a vector, and that we flatten the chunks at the end and collect the result into a vector. The full function would look like this:
use rayon::prelude::*;
fn run_welfords(input_vec: Vec<f64>, set_size: usize) -> Vec<Welfords> {
input_vec.into_par_iter()
.chunks(set_size as usize)
.enumerate()
.map(|(i, group)| {
group.iter()
.scan((i + 1, 0., 0., 0.), welfords)
.collect::<Vec<Welfords>>()
})
.flatten()
.collect()
}
So far so good. Conceptually, we’re doing the same as in the R version. And if you’ve got a good understanding of the R code then it shouldn’t be too difficult to parse the Rust version.
But so far we’ve got nothing that can communicate with R. For this, we’ll need our final Rust function. This is the one that will actually be doing the communication with R. Well, it’ll be speaking to R through a C translator, but we’ll get to that. For this final function we’re going to have to write some special unsafe code. Going unsafe means that we’re going to ignore any warnings that the Rust compiler might want to give us about potentially memory unsafe code we might be writing. But trust me, it’s going to be OK!
Going Unsafe
Our final Rust function is going to be the one that communicates with C, so its function signature is going to look a bit different. Let’s take a look at it first, and then we’ll unpack it.
use libc::{c_double, c_int};
#[no_mangle]
pub extern "C" fn welfords_wrapper(
: *const c_double,
input: c_int,
length: c_int,
set_size: *mut c_int,
group: *mut c_int,
n: *mut c_double,
mean: *mut c_double,
sd: *mut c_double,
se: *mut c_double,
d: *mut c_double,
t{} )
A couple of things to notice about this function signature. First, all the inputs are C
types. Second, we’re mostly passing in pointers to bits of memory. These are the bits of memory that will be holding our data on the C side of things. Third, the function signature starts with:
#[no_mangle]
pub extern "C"
This is just how we expose this function to C.
But probably the most important thing is that this function doesn’t return anything. The reason we don’t return anything is that we’re instead passing in the addresses to the bits of memory that will hold our output, and we’re just going to write to these bits of memory directly. So the signature includes inputs that will tell us were to put the values for the group column, the n column, the mean column, etc.
With the signature out of the way, let’s get to the function body. The first thing we’ll do is declare a Rust Vec<_>
that will actually hold our data. Knowing were in memory our data is located is good, but our run_welfords()
function actually needs a vector of data, so that’s what we need to create. Once we’ve created this vector, we’ll go to the bits of memory specified by input
and actually grab the values. Because this is potentially unsafe, we’ll need to wrap it in an unsafe
block.
let mut input_vec: Vec<f64> = vec![0f64; length as usize];
for i in 0..(length as usize) {
unsafe {
= *(input.offset(i as isize)) as f64; // grab the value from memory
input_vec[i] }
}
Now that we have an Vec<_>
of values we can just give it to our run_welfords()
function:
let results_vec: Vec<Welfords> = run_welfords(input_vec, set_size as usize);
So far so good. The variable results_vec
now holds our data that will make up the rows of our data frame. But we can’t just return this data. Instead, we’ll have to write the values in this vector to the appropriate bits of memory that are specified by the group
, n
, mean
etc inputs. This means writing some unsafe code again. We’ll just iterate over our Vec<_>
and write the values to memory.
for (i, el) in results_vec.iter().enumerate() {
unsafe {
*group.add(i) = el.group as i32;
*n.add(i) = el.count as i32;
*mean.add(i) = el.mean;
*sd.add(i) = el.sd;
*se.add(i) = el.se;
*d.add(i) = el.d;
*t.add(i) = el.t;
}
}
Great, we’ve finished the Rust side of things! Now it’s time to write some C.
Some C Code
C is one of those things that feels like a necessary evil. I don’t like writing C, but there are some things that you can only really do in C. And getting different programming languages to talk to each other is one of those things. Rust doesn’t speak R, and R doesn’t speak Rust, but they both speak C. C is the “lingua franca” of programming languages.
Let’s start by writing our first C function. We’ll just put all our C code in a file called welfords.c
, and this will live in a different folder to our Rust code.
We’ll start off by bringing in the R api:
#include <R.h>
#include <Rinternals.h>
Next, we’ll include a call to our Rust function (welfords_wrapper()
):
extern void welfords_wrapper(double *input,
int length,
int set_size,
int *group,
int *n,
double *mean,
double *sd,
double *se,
double *d,
double *t);
It’s an external function that’s not returning anything, and we’re passing in just what we’d expect from the Rust side of things.
The final step is to write the C function that will take the inputs from R and pass it to our Rust function we’ve defined above. For this function, we’re going to be working with R types. All R types are S-expressions, which I guess tells you something about R’s Lispy roots! This means that our function is going to have to take in S-expressions and output S-expressions. This gives us a function signature that looks as follows:
(SEXP input, SEXP set_size) {} SEXP welfords
The next step is to count the number of elements in our input vector (the length input to our Rust function) and then declare all the memory we’ll need. All these declarations are wrapped in PROTECT
to prevent the R garbage collector from messing with them.
int elements = length(input);
// allocate space for the results
= PROTECT(allocVector(INTSXP, elements));
SEXP group = PROTECT(allocVector(INTSXP, elements));
SEXP n = PROTECT(allocVector(REALSXP, elements));
SEXP mean = PROTECT(allocVector(REALSXP, elements));
SEXP sd = PROTECT(allocVector(REALSXP, elements));
SEXP se = PROTECT(allocVector(REALSXP, elements));
SEXP d = PROTECT(allocVector(REALSXP, elements)); SEXP t
Next, we can call our Rust function. We’ll have to wrap the SEXP values in some functions to turn them into types that our imported Rust function will understand.
(REAL(input),
welfords_wrapper,
elements(set_size),
asInteger(group),
INTEGER(n),
INTEGER(mean),
REAL(sd),
REAL(se),
REAL(d),
REAL(t)); REAL
And that’s it for getting the result! But before we return anything we’re going to want to turn it into an R data.frame
. A data.frame
is just a list
of vectors, so we’ll declare a new list and then put our vectors in there.
= PROTECT(allocVector(VECSXP, 7));
SEXP final_result // put data into list
(final_result, 0, group);
SET_VECTOR_ELT(final_result, 1, n);
SET_VECTOR_ELT(final_result, 2, mean);
SET_VECTOR_ELT(final_result, 3, sd);
SET_VECTOR_ELT(final_result, 4, se);
SET_VECTOR_ELT(final_result, 5, d);
SET_VECTOR_ELT(final_result, 6, t); SET_VECTOR_ELT
What makes a list
a data.frame
and not just a regular list is that it has some special attributes. So we set those next. First, we’ll set the class, then the row names and finally the column names.
// first the class
= PROTECT(mkString("data.frame"));
SEXP class (final_result, R_ClassSymbol, class);
setAttrib
// next the row names
= PROTECT(allocVector(INTSXP, 2));
SEXP row_names (row_names)[0] = NA_INTEGER;
INTEGER(row_names)[1] = elements;
INTEGER(final_result, R_RowNamesSymbol, row_names);
setAttrib
// next the column names
= PROTECT(Rf_allocVector(STRSXP,7));
SEXP col_names (col_names,0, mkChar("group"));
SET_STRING_ELT(col_names,1, mkChar("n"));
SET_STRING_ELT(col_names,2, mkChar("mean"));
SET_STRING_ELT(col_names,3, mkChar("sd"));
SET_STRING_ELT(col_names,4, mkChar("se"));
SET_STRING_ELT(col_names,5, mkChar("d"));
SET_STRING_ELT(col_names,6, mkChar("t"));
SET_STRING_ELT(final_result, R_NamesSymbol, col_names); setAttrib
And finally, we just un-protect all our (11) declared variables and return the final result.
(11);
UNPROTECT
// return the data.frame to R
return final_result;
Some R Code
The final step is to write some R code so that we can actually use all this from R. But first, we need to build everything.
We’ll start off by building the Rust code with:
cargo build --release
This will produce a file called libwelfords.a
. We just need to copy this file to the same folder as our C code.
Next, we’ll build the C code. We’ll use R CMD SHLIB
to do this. But for it to work, we need to tell this command to link to our static library we just built from Rust. It would look like the following:
PKG_LIBS=libwelfords.a R CMD SHLIB welfords.c
We’ll now have a file called welfords.so
which is what we’ll actually be calling from R.
First, we need to load this file.
dyn.load("welfords.so")
We can now call this directly using .Call
, but to make it nicer I’ll wrap it in an R function.
<- function(input, set_size) {
welfords_rs .Call("welfords", input, set_size)
}
And that’s it. We can now use it as follows:
welfords_rs(input, set_size)
And we get the following output:
group n mean sd se d t
1 1 1 -0.06047565 NaN NaN NaN NaN
2 1 2 0.10467343 0.2335561 0.1651491 0.4481726 0.6338118
3 1 3 0.75601839 1.1401864 0.6582869 0.6630656 1.1484633
4 1 4 0.70964089 0.9355676 0.4677838 0.7585137 1.5170274
5 1 5 0.69357026 0.8110218 0.3627000 0.8551807 1.9122423
6 2 1 2.21506499 NaN NaN NaN NaN
7 2 2 1.58799060 0.8868171 0.6270744 1.7906630 2.5323799
8 2 3 0.80363999 1.4962754 0.8638750 0.5370936 0.9302735
9 2 4 0.55601678 1.3182674 0.6591337 0.4217784 0.8435569
10 2 5 0.45568103 1.1634896 0.5203284 0.3916503 0.8757567
Exactly the same as we got from the code we wrote in R.
For easy access to all the code just head over to my github.
Reuse
Citation
@online{jcolling2022,
author = {Lincoln J Colling},
editor = {},
title = {Calling {Rust} from {R} Using {C}},
date = {2022-08-22},
langid = {en},
abstract = {Why would you only want to write one language when you can
write all the languages. In this short post I’ll briefly explain
how, with the help of C, you can write Rust functions that you can
call from R.}
}