Rcpp

# Rcpp
## <a href="https://privefl.github.io/R-presentation/Rcpp.html" class="uri">https://privefl.github.io/R-presentation/Rcpp.html</a>
### Florian Privé
### March 15, 2018

---

# Introduction

---

## Resources

### Rcpp

- [Advanced R](https://adv-r.hadley.nz/rcpp.html)

- [Rcpp for everyone](https://teuder.github.io/rcpp4everyone_en/)

- [Rcpp Gallery](http://gallery.rcpp.org/)

- [Rcpp FAQ](https://cloud.r-project.org/web/packages/Rcpp/vignettes/Rcpp-FAQ.pdf)

<br>

### RcppArmadillo (linear algebra)

- [RcppArmadillo cheatsheet](https://github.com/petewerner/misc/wiki/RcppArmadillo-cheatsheet)

- [Armadillo documentation](http://arma.sourceforge.net/docs.html)

- [Fix OS X Mavericks "-lgfortran" and "-lquadmath" installation errors](https://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks--lgfortran-and--lquadmath-error/)

---

## How I see Rcpp?

<br>

Rcpp lives between R and C++, so that you can get

- the *performance of C++* and

- the *convenience of R*.

<br>

- I love *performance* and

- I also enjoy *simplicity*,

Rcpp might be my favorite R package.

---

## Using Rcpp with RStudio

Learn more at https://support.rstudio.com/hc/en-us/articles/200486088-Using-Rcpp-with-RStudio.

---

## First main example: testing if number is odd

In R:

```r
is_odd_r <- function(n = 10) {
  n %% 2 == 1
}
```

In Rcpp:

```cpp
// [[Rcpp::export]]
bool is_odd_cpp(int n = 10) {
  bool v = (n % 2 == 1);
  return v;
}
```

```r
c(is_odd_r(),     is_odd_cpp(), 
  is_odd_r(21),   is_odd_cpp(21),
  is_odd_r(42),   is_odd_cpp(42))
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE FALSE
```

---

## C++ function explained

### C++ is statically typed

<div class="figure" style="text-align: center">
<img src="rcpp-graphical.png" alt="Graphical annotation of the &lt;em&gt;is_odd_cpp&lt;/em&gt; function.&lt;br/&gt;&lt;small&gt;https://doi.org/10.7287/peerj.preprints.3188v1&lt;/small&gt;" width="80%" />
<p class="caption">Graphical annotation of the <em>is_odd_cpp</em> function.<br/><small>https://doi.org/10.7287/peerj.preprints.3188v1</small></p>
</div>

---

## Whole Rcpp file

---

## Second main example: sum of a vector

```r
sumR <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
  }
  total
}
```

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sumC1(const NumericVector& x) {
  
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; i++) {
    total += x[i];
  }
  
  return total;
}
```

---

### The C++ version is similar, but

* To find the length of the vector, we use the <mark>`.size()`</mark> method (or `.length()`),   which returns an integer. C++ methods are called with `.` (i.e., a full stop).
  
* The `for` statement has a different syntax: <mark>`for(init; check; increment)`</mark>. 
    + This loop is initialised by creating a new variable of type integer called `i` with value 0.
    + Before each iteration we check that `i < n`, and terminate the loop if it's not. 
    + After each iteration, we increment the value of `i` by one, using the special prefix operator `++` which increases the value of `i` by 1.

* In C++, vector indices start at 0. I'll say this again because it's so important: <mark>__IN C++, VECTOR INDICES START AT 0!__</mark> This is a very common source of bugs when converting R functions to C++.

* Use `=` for assignment, not `<-`. C++ provides operators that modify in-place: <mark>`total += x[i]`</mark> is equivalent to `total = total + x[i]`. Similar in-place operators are `-=`, `*=`, and `/=`.

---

## Rcpp Sugar

With [Rcpp Sugar](http://dirk.eddelbuettel.com/code/rcpp/Rcpp-sugar.pdf), you have accessed to some R-like vectorized functions:

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sumCS(const NumericVector& x) {
  return sum(x);
}
```

There are many [R-like functions available in Rcpp](https://teuder.github.io/rcpp4everyone_en/210_rcpp_functions.html).

---

## Yet other possibilities

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sumC2(const NumericVector& x) {
  
  double total = 0;
  NumericVector::const_iterator it;
  for (it = x.begin(); it != x.end(); it++) {
    total += *it;
  }
  return total;
}

// [[Rcpp::export]]
double sumC3(const NumericVector& x) {
  return std::accumulate(x.begin(), x.end(), 0.0);
}
```

---

## Microbenchmark

```r
x <- runif(1e5)

microbenchmark::microbenchmark(
  "R LOOP" = sumR(x),
  "R VEC"  = sum(x),
  "C LOOP" = sumC1(x),
  "C VEC"  = sumCS(x),
  "C IT"   = sumC2(x),
  "C STD"  = sumC3(x)
)
```

```
## Unit: microseconds
##    expr      min        lq      mean    median        uq      max
##  R LOOP 4199.858 4357.6880 4772.2233 4403.5705 4677.0215 8888.635
##   R VEC  118.221  126.4065  134.2891  128.0980  134.7790  211.995
##  C LOOP  157.269  169.9370  188.7199  171.8795  179.0410 1239.887
##   C VEC  159.947  170.3065  190.7651  174.4585  185.8320 1116.796
##    C IT  351.685  379.3410  413.5425  384.4135  403.1525 1482.656
##   C STD  157.546  170.1120  191.1931  173.2455  182.4770 1360.836
##  neval
##    100
##    100
##    100
##    100
##    100
##    100
```

---

## Rcpp main types

<br>

<table class='gmisc_table' style='border-collapse: collapse; margin-top: 1em; margin-bottom: 1em; margin-left: auto; margin-right: auto' >
<thead>
<tr>
<th style='border-bottom: 1px solid grey; border-top: 2px solid grey; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>R</th>
<th style='border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Rcpp (scalar)</th>
<th style='border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Rcpp (vector)</th>
<th style='border-bottom: 1px solid grey; border-top: 2px solid grey; border-right: 1px solid black; text-align: center;'>Rcpp (matrix)</th>
</tr>
</thead>
<tbody>
<tr>
<td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>logical</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>bool</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>LogicalVector</td>
<td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>LogicalMatrix</td>
</tr>
<tr>
<td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>integer</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>int</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>IntegerVector</td>
<td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>IntegerMatrix</td>
</tr>
<tr>
<td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>double</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>double</td>
<td style='padding: 10px 20px 10px 20px; text-align: center;'>NumericVector</td>
<td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>NumericMatrix</td>
</tr>
<tr>
<td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>character</td>
<td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; text-align: center;'>String</td>
<td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; text-align: center;'>CharacterVector</td>
<td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; border-right: 1px solid black; text-align: center;'>CharacterMatrix</td>
</tr>
</tbody>
</table>

<br>

There are also `List` and `DataFrame` (but prefer using `List`).

---

# Quiz

---

## Which R functions does this implement? (1/5)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double f1(NumericVector x) {
  
  int n = x.size();
  double y = 0;

for (int i = 0; i < n; i++) {
    y += x[i];
  }
  
  return y / n;
}
```

---

## Which R functions does this implement? (2/5)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector f2(NumericVector x) {
  
  int n = x.size();
  NumericVector out(n);

out[0] = x[0];
  for (int i = 1; i < n; i++) {
    out[i] = out[i - 1] + x[i];
  }
  
  return out;
}
```

---

## Which R functions does this implement? (3/5)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
bool f3(LogicalVector x) {
  
  int n = x.size();

for (int i = 0; i < n; i++) {
    if (x[i]) return true;
  }
  
  return false;
}
```

---

## Which R functions does this implement? (4/5)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector f4(List x) {
  
  int n = x.size();
  NumericVector res(n);

for (int i = 0; i < n; i++) {
    res[i] = mean( as<NumericVector>(x[i]) );
  }
  
  return res;
}
```

---

## Which R functions does this implement? (5/5)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector f5(NumericVector x, NumericVector y) {
  
  // Recycling
  int n = std::max(x.size(), y.size());
  NumericVector x_rep = rep_len(x, n);
  NumericVector y_rep = rep_len(y, n);

NumericVector out(n);

for (int i = 0; i < n; i++) {
    out[i] = std::min(x_rep[i], y_rep[i]);
  }

return out;
}
```

---

## Which R functions does this implement? (answers)

```r
x <- runif(100)
stopifnot(isTRUE(all.equal(f1(x), mean(x))))
```

```r
stopifnot(isTRUE(all.equal(f2(x), cumsum(x))))
```

```r
x2 <- rep(FALSE, 10)
stopifnot(isTRUE(all.equal(f3(x2), any(x2))))
x2[1] <- TRUE
stopifnot(isTRUE(all.equal(f3(x2), any(x2))))
x2[] <- TRUE
stopifnot(isTRUE(all.equal(f3(x2), any(x2))))
```

```r
x3 <- lapply(1:10, runif)
stopifnot(isTRUE(all.equal(f4(x3), sapply(x3, mean))))
```

```r
x4 <- runif(11)
stopifnot(isTRUE(all.equal(f5(x, x4), pmin(x, x4))))
```

```
## Warning in pmin(x, x4): an argument will be fractionally recycled
```

---

# More

---

## Use of C++ code when needed

.footnote[More infos [there](http://adv-r.had.co.nz/Rcpp.html) and you can learn more on the performance of R code in [my course](https://privefl.github.io/advr38book/performance.html).]

<br>

Typical bottlenecks that C++ can address include:

- Recursive functions, or problems which involve calling functions **millions of times**. 
The overhead of calling a function in C++ is much lower than that in R.

- Loops that **can’t be easily vectorized** because subsequent iterations depend on previous ones.

- Problems that require **advanced data structures** and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. See [this chapter](https://adv-r.hadley.nz/rcpp.html#stl).

---

## Third main example: Gibbs sampler

```r
gibbs_r <- function(N, thin) {
  
  mat <- matrix(nrow = 2, ncol = N)
  x <- y <- 0

for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4)
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[, i] <- c(x, y)
  }
  
  mat
}
```

<br>

This function can't be vectorized because iterations depend on previous ones.

---

### Recode it in Rcpp

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gibbs_cpp(int N, int thin) {
  
  NumericMatrix mat(2, N);
  double x = 0, y = 0;

for(int i = 0; i < N; i++) {
    for(int j = 0; j < thin; j++) {
      x = rgamma(1, 3, 1 / (y * y + 4))[0]; // 3rd param -> inverse
      y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
    }
    mat(0, i) = x;
    mat(1, i) = y;
  }

return(mat);
}
```

There is not much difference with the previous R version!

---

### Microbenchmark

```r
microbenchmark::microbenchmark(
  gibbs_r(100, 10),
  gibbs_cpp(100, 10),
  gibbs_r(1000, 10),
  gibbs_cpp(1000, 10)
)
```

```
## Unit: microseconds
##                 expr       min        lq       mean    median
##     gibbs_r(100, 10)  5340.502  5781.488  6216.0982  5910.880
##   gibbs_cpp(100, 10)   312.706   347.861   383.1457   360.887
##    gibbs_r(1000, 10) 57991.903 63438.467 65064.6874 63902.342
##  gibbs_cpp(1000, 10)  3263.238  3480.166  3642.9946  3558.119
##         uq        max neval
##   6036.889  12308.717   100
##    380.751   1524.377   100
##  64666.805 111151.260   100
##   3655.977   7026.987   100
```

---

## Other examples of Rcpp code

---

## Missing values

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List scalar_missings() {
  
  int    int_s = NA_INTEGER;
  String chr_s = NA_STRING;
  bool   lgl_s = NA_LOGICAL;   // Warning!
  double num_s = NA_REAL;

return List::create(int_s, chr_s, lgl_s, num_s);
}
```

```r
str(scalar_missings())
```

```
## List of 4
##  $ : int NA
##  $ : chr NA
##  $ : logi TRUE
##  $ : num NA
```

---

## Two traps in one

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void copy_vec(NumericVector x) {
  NumericVector y = x;
  y[0] = 100;
}
```

```r
x2 <- runif(10)
copy_vec(x2)
x2
```

```
##  [1] 100.00000000   0.88289940   0.53702082   0.31790320   0.25654386
##  [6]   0.01424356   0.61715938   0.38176546   0.69886953   0.56337326
```

```r
x1 <- 1:10
copy_vec(x1)
x1
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

---

## Two traps in one, explanation

- R objects are always passed by reference in Rcpp (even without the `&`). Use `clone()` to get a copy.

- But if e.g. passing a vector of type integer (`1:10`) as a NumericVector (type double), the vector will be copied (and casted to type double).

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void copy_vec2(NumericVector x) {
  NumericVector y = clone(x);
  y[0] = 100;
}
```

```r
x2 <- runif(10)
copy_vec2(x2)
x2
```

```
##  [1] 0.16259102 0.07173874 0.48382528 0.85438239 0.42743254 0.61468127
##  [7] 0.20034410 0.86976531 0.09244387 0.05780916
```

---

## Yet another trap (of C++)

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double int_div() {
  int x = 2, y = 3;
  double z = x / y;
  return z;
}

// [[Rcpp::export]]
double int_div2() {
  int x = 2, y = 3;
  return (double)x / y;
}
```

```r
c(int_div(), int_div2())
```

```
## [1] 0.0000000 0.6666667
```

---

# Rcpp in an R package

---

## Create a package (in RStudio)

```r
# In a new RStudio project, run
usethis::use_description(); usethis::use_namespace()
dir.create("R"); usethis::use_package_doc()
usethis::use_roxygen_md()
```

**Restart RStudio** and change the following options:

---

## Add Rcpp to your package

<br>

```r
usethis::use_rcpp()
```

```r
✔ Adding 'Rcpp' to LinkingTo field in DESCRIPTION
✔ Adding 'Rcpp' to Imports field in DESCRIPTION
✔ Creating 'src/'
✔ Adding '*.o', '*.so', '*.dll' to 'src/.gitignore'
● Include the following roxygen tags somewhere in your package
  #' @useDynLib testpkg, .registration = TRUE
  #' @importFrom Rcpp sourceCpp
  NULL
● Run document()  ## forget that
```

<br>

Then, create a new Rcpp file and save it in `src/`.

---

## Build your package and learn more

Use `Ctrl/Cmd + Shift + B` to document and build your package.

**Learn more** with

---

## Last example: your turn

<br>

```r
fun_r <- function(x) {
  
  n <- length(x)
  y <- numeric(n); y[1] <- 1
  
  for (i in 2:n) {
    y[i] <- y[i - 1]^2 + x[i]
  }
  
  y
}
```

1. Will this R function always work?

1. Can you vectorize this function to avoid the loop and make it faster?

1. If not, recode it with Rcpp.

1. Benchmark the two versions after having checked the results.

---

## Microbenchmark of one possible solution

```r
x <- rnorm(1e6)
all.equal(fun_r(x), fun_cpp(x))
```

```
## [1] TRUE
```

```r
microbenchmark::microbenchmark(
  fun_r(x), 
  fun_cpp(x)
)
```

```
## Unit: milliseconds
##        expr       min        lq      mean    median       uq
##    fun_r(x) 115.82288 117.93668 120.38656 118.73573 122.1426
##  fun_cpp(x)  10.06505  10.82437  12.16268  10.90465  14.3630
##        max neval
##  137.22758   100
##   17.90154   100
```

---

# Thanks!

<br>

Presentation available at

https://privefl.github.io/R-presentation/Rcpp.html

<br>

<i class="fab  fa-twitter "></i> [privefl](https://twitter.com/privefl) &nbsp;&nbsp;&nbsp;&nbsp; <i class="fab  fa-github "></i> [privefl](https://github.com/privefl) &nbsp;&nbsp;&nbsp;&nbsp; <i class="fab  fa-stack-overflow "></i> [F. Privé](https://stackoverflow.com/users/6103040/f-priv%c3%a9)