class: center, middle, inverse, title-slide # Rcpp ##
https://privefl.github.io/R-presentation/Rcpp.html
### Florian Privé ### March 15, 2018 --- class: center, middle, inverse # Introduction --- ## Resources ### Rcpp - [Advanced R](https://adv-r.hadley.nz/rcpp.html) - [Rcpp for everyone](https://teuder.github.io/rcpp4everyone_en/) - [Rcpp Gallery](http://gallery.rcpp.org/) - [Rcpp FAQ](https://cloud.r-project.org/web/packages/Rcpp/vignettes/Rcpp-FAQ.pdf) <br> ### RcppArmadillo (linear algebra) - [RcppArmadillo cheatsheet](https://github.com/petewerner/misc/wiki/RcppArmadillo-cheatsheet) - [Armadillo documentation](http://arma.sourceforge.net/docs.html) - [Fix OS X Mavericks "-lgfortran" and "-lquadmath" installation errors](https://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks--lgfortran-and--lquadmath-error/) --- ## How I see Rcpp? <br> Rcpp lives between R and C++, so that you can get - the *performance of C++* and - the *convenience of R*. <br> As - I love *performance* and - I also enjoy *simplicity*, Rcpp might be my favorite R package. .footnote[I often speak about Rcpp as if it were a programming language.] --- ## Using Rcpp with RStudio <img src="http://li-kan.com/image/post/Rcpp/create_file.png" width="80%" style="display: block; margin: auto;" /> Learn more at https://support.rstudio.com/hc/en-us/articles/200486088-Using-Rcpp-with-RStudio. --- ## First main example: testing if number is odd In R: ```r is_odd_r <- function(n = 10) { n %% 2 == 1 } ``` In Rcpp: ```cpp // [[Rcpp::export]] bool is_odd_cpp(int n = 10) { bool v = (n % 2 == 1); return v; } ``` ```r c(is_odd_r(), is_odd_cpp(), is_odd_r(21), is_odd_cpp(21), is_odd_r(42), is_odd_cpp(42)) ``` ``` ## [1] FALSE FALSE TRUE TRUE FALSE FALSE ``` --- ## C++ function explained ### C++ is statically typed <div class="figure" style="text-align: center"> <img src="rcpp-graphical.png" alt="Graphical annotation of the <em>is_odd_cpp</em> function.<br/><small>https://doi.org/10.7287/peerj.preprints.3188v1</small>" width="80%" /> <p class="caption">Graphical annotation of the <em>is_odd_cpp</em> function.<br/><small>https://doi.org/10.7287/peerj.preprints.3188v1</small></p> </div> .footnote[You don't really need `v` here.] --- ## Whole Rcpp file <img src="rcpp-file.png" width="80%" style="display: block; margin: auto;" /> .footnote[<mark>Source</mark>: compile the Rcpp function and execute the block of R code.] --- ## Second main example: sum of a vector ```r sumR <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total } ``` ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double sumC1(const NumericVector& x) { int n = x.size(); double total = 0; for (int i = 0; i < n; i++) { total += x[i]; } return total; } ``` --- ### The C++ version is similar, but * To find the length of the vector, we use the <mark>`.size()`</mark> method (or `.length()`), which returns an integer. C++ methods are called with `.` (i.e., a full stop). * The `for` statement has a different syntax: <mark>`for(init; check; increment)`</mark>. + This loop is initialised by creating a new variable of type integer called `i` with value 0. + Before each iteration we check that `i < n`, and terminate the loop if it's not. + After each iteration, we increment the value of `i` by one, using the special prefix operator `++` which increases the value of `i` by 1. * In C++, vector indices start at 0. I'll say this again because it's so important: <mark>__IN C++, VECTOR INDICES START AT 0!__</mark> This is a very common source of bugs when converting R functions to C++. * Use `=` for assignment, not `<-`. C++ provides operators that modify in-place: <mark>`total += x[i]`</mark> is equivalent to `total = total + x[i]`. Similar in-place operators are `-=`, `*=`, and `/=`. --- ## Rcpp Sugar With [Rcpp Sugar](http://dirk.eddelbuettel.com/code/rcpp/Rcpp-sugar.pdf), you have accessed to some R-like vectorized functions: ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double sumCS(const NumericVector& x) { return sum(x); } ``` There are many [R-like functions available in Rcpp](https://teuder.github.io/rcpp4everyone_en/210_rcpp_functions.html). --- ## Yet other possibilities ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double sumC2(const NumericVector& x) { double total = 0; NumericVector::const_iterator it; for (it = x.begin(); it != x.end(); it++) { total += *it; } return total; } // [[Rcpp::export]] double sumC3(const NumericVector& x) { return std::accumulate(x.begin(), x.end(), 0.0); } ``` --- ## Microbenchmark ```r x <- runif(1e5) microbenchmark::microbenchmark( "R LOOP" = sumR(x), "R VEC" = sum(x), "C LOOP" = sumC1(x), "C VEC" = sumCS(x), "C IT" = sumC2(x), "C STD" = sumC3(x) ) ``` ``` ## Unit: microseconds ## expr min lq mean median uq max ## R LOOP 4199.858 4357.6880 4772.2233 4403.5705 4677.0215 8888.635 ## R VEC 118.221 126.4065 134.2891 128.0980 134.7790 211.995 ## C LOOP 157.269 169.9370 188.7199 171.8795 179.0410 1239.887 ## C VEC 159.947 170.3065 190.7651 174.4585 185.8320 1116.796 ## C IT 351.685 379.3410 413.5425 384.4135 403.1525 1482.656 ## C STD 157.546 170.1120 191.1931 173.2455 182.4770 1360.836 ## neval ## 100 ## 100 ## 100 ## 100 ## 100 ## 100 ``` --- ## Rcpp main types <br> <table class='gmisc_table' style='border-collapse: collapse; margin-top: 1em; margin-bottom: 1em; margin-left: auto; margin-right: auto' > <thead> <tr> <th style='border-bottom: 1px solid grey; border-top: 2px solid grey; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>R</th> <th style='border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Rcpp (scalar)</th> <th style='border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Rcpp (vector)</th> <th style='border-bottom: 1px solid grey; border-top: 2px solid grey; border-right: 1px solid black; text-align: center;'>Rcpp (matrix)</th> </tr> </thead> <tbody> <tr> <td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>logical</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>bool</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>LogicalVector</td> <td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>LogicalMatrix</td> </tr> <tr> <td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>integer</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>int</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>IntegerVector</td> <td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>IntegerMatrix</td> </tr> <tr> <td style='padding: 10px 20px 10px 20px; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>double</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>double</td> <td style='padding: 10px 20px 10px 20px; text-align: center;'>NumericVector</td> <td style='padding: 10px 20px 10px 20px; border-right: 1px solid black; text-align: center;'>NumericMatrix</td> </tr> <tr> <td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; border-left: 1px solid black; border-right: 1px solid black; text-align: center;'>character</td> <td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; text-align: center;'>String</td> <td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; text-align: center;'>CharacterVector</td> <td style='padding: 10px 20px 10px 20px; border-bottom: 2px solid grey; border-right: 1px solid black; text-align: center;'>CharacterMatrix</td> </tr> </tbody> </table> <br> There are also `List` and `DataFrame` (but prefer using `List`). --- class: center, middle, inverse # Quiz --- ## Which R functions does this implement? (1/5) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double f1(NumericVector x) { int n = x.size(); double y = 0; for (int i = 0; i < n; i++) { y += x[i]; } return y / n; } ``` --- ## Which R functions does this implement? (2/5) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector f2(NumericVector x) { int n = x.size(); NumericVector out(n); out[0] = x[0]; for (int i = 1; i < n; i++) { out[i] = out[i - 1] + x[i]; } return out; } ``` --- ## Which R functions does this implement? (3/5) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] bool f3(LogicalVector x) { int n = x.size(); for (int i = 0; i < n; i++) { if (x[i]) return true; } return false; } ``` --- ## Which R functions does this implement? (4/5) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector f4(List x) { int n = x.size(); NumericVector res(n); for (int i = 0; i < n; i++) { res[i] = mean( as<NumericVector>(x[i]) ); } return res; } ``` --- ## Which R functions does this implement? (5/5) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericVector f5(NumericVector x, NumericVector y) { // Recycling int n = std::max(x.size(), y.size()); NumericVector x_rep = rep_len(x, n); NumericVector y_rep = rep_len(y, n); NumericVector out(n); for (int i = 0; i < n; i++) { out[i] = std::min(x_rep[i], y_rep[i]); } return out; } ``` --- ## Which R functions does this implement? (answers) ```r x <- runif(100) stopifnot(isTRUE(all.equal(f1(x), mean(x)))) ``` ```r stopifnot(isTRUE(all.equal(f2(x), cumsum(x)))) ``` ```r x2 <- rep(FALSE, 10) stopifnot(isTRUE(all.equal(f3(x2), any(x2)))) x2[1] <- TRUE stopifnot(isTRUE(all.equal(f3(x2), any(x2)))) x2[] <- TRUE stopifnot(isTRUE(all.equal(f3(x2), any(x2)))) ``` ```r x3 <- lapply(1:10, runif) stopifnot(isTRUE(all.equal(f4(x3), sapply(x3, mean)))) ``` ```r x4 <- runif(11) stopifnot(isTRUE(all.equal(f5(x, x4), pmin(x, x4)))) ``` ``` ## Warning in pmin(x, x4): an argument will be fractionally recycled ``` --- class: center, middle, inverse # More --- ## Use of C++ code when needed .footnote[More infos [there](http://adv-r.had.co.nz/Rcpp.html) and you can learn more on the performance of R code in [my course](https://privefl.github.io/advr38book/performance.html).] <br> Typical bottlenecks that C++ can address include: - Recursive functions, or problems which involve calling functions **millions of times**. The overhead of calling a function in C++ is much lower than that in R. - Loops that **can’t be easily vectorized** because subsequent iterations depend on previous ones. - Problems that require **advanced data structures** and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. See [this chapter](https://adv-r.hadley.nz/rcpp.html#stl). --- ## Third main example: Gibbs sampler ```r gibbs_r <- function(N, thin) { mat <- matrix(nrow = 2, ncol = N) x <- y <- 0 for (i in 1:N) { for (j in 1:thin) { x <- rgamma(1, 3, y * y + 4) y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1))) } mat[, i] <- c(x, y) } mat } ``` <br> This function can't be vectorized because iterations depend on previous ones. --- ### Recode it in Rcpp ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] NumericMatrix gibbs_cpp(int N, int thin) { NumericMatrix mat(2, N); double x = 0, y = 0; for(int i = 0; i < N; i++) { for(int j = 0; j < thin; j++) { x = rgamma(1, 3, 1 / (y * y + 4))[0]; // 3rd param -> inverse y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0]; } mat(0, i) = x; mat(1, i) = y; } return(mat); } ``` There is not much difference with the previous R version! --- ### Microbenchmark ```r microbenchmark::microbenchmark( gibbs_r(100, 10), gibbs_cpp(100, 10), gibbs_r(1000, 10), gibbs_cpp(1000, 10) ) ``` ``` ## Unit: microseconds ## expr min lq mean median ## gibbs_r(100, 10) 5340.502 5781.488 6216.0982 5910.880 ## gibbs_cpp(100, 10) 312.706 347.861 383.1457 360.887 ## gibbs_r(1000, 10) 57991.903 63438.467 65064.6874 63902.342 ## gibbs_cpp(1000, 10) 3263.238 3480.166 3642.9946 3558.119 ## uq max neval ## 6036.889 12308.717 100 ## 380.751 1524.377 100 ## 64666.805 111151.260 100 ## 3655.977 7026.987 100 ``` --- ## Other examples of Rcpp code <a href="https://github.com/hadley/adv-r/blob/master/extras/cpp" target="_blank"> <img src="Rcpp_files/figure-html/unnamed-chunk-26-1.png" width="80%" style="display: block; margin: auto;" /> </a> --- ## Missing values ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] List scalar_missings() { int int_s = NA_INTEGER; String chr_s = NA_STRING; bool lgl_s = NA_LOGICAL; // Warning! double num_s = NA_REAL; return List::create(int_s, chr_s, lgl_s, num_s); } ``` ```r str(scalar_missings()) ``` ``` ## List of 4 ## $ : int NA ## $ : chr NA ## $ : logi TRUE ## $ : num NA ``` --- ## Two traps in one ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] void copy_vec(NumericVector x) { NumericVector y = x; y[0] = 100; } ``` ```r x2 <- runif(10) copy_vec(x2) x2 ``` ``` ## [1] 100.00000000 0.88289940 0.53702082 0.31790320 0.25654386 ## [6] 0.01424356 0.61715938 0.38176546 0.69886953 0.56337326 ``` ```r x1 <- 1:10 copy_vec(x1) x1 ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` --- ## Two traps in one, explanation - R objects are always passed by reference in Rcpp (even without the `&`). Use `clone()` to get a copy. - But if e.g. passing a vector of type integer (`1:10`) as a NumericVector (type double), the vector will be copied (and casted to type double). ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] void copy_vec2(NumericVector x) { NumericVector y = clone(x); y[0] = 100; } ``` ```r x2 <- runif(10) copy_vec2(x2) x2 ``` ``` ## [1] 0.16259102 0.07173874 0.48382528 0.85438239 0.42743254 0.61468127 ## [7] 0.20034410 0.86976531 0.09244387 0.05780916 ``` --- ## Yet another trap (of C++) ```cpp #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] double int_div() { int x = 2, y = 3; double z = x / y; return z; } // [[Rcpp::export]] double int_div2() { int x = 2, y = 3; return (double)x / y; } ``` ```r c(int_div(), int_div2()) ``` ``` ## [1] 0.0000000 0.6666667 ``` --- class: center, middle, inverse # Rcpp in an R package --- ## Create a package (in RStudio) ```r # In a new RStudio project, run usethis::use_description(); usethis::use_namespace() dir.create("R"); usethis::use_package_doc() usethis::use_roxygen_md() ``` **Restart RStudio** and change the following options: <img src="build-doc.png" width="50%" style="display: block; margin: auto;" /> --- ## Add Rcpp to your package <br> ```r usethis::use_rcpp() ``` ```r ✔ Adding 'Rcpp' to LinkingTo field in DESCRIPTION ✔ Adding 'Rcpp' to Imports field in DESCRIPTION ✔ Creating 'src/' ✔ Adding '*.o', '*.so', '*.dll' to 'src/.gitignore' ● Include the following roxygen tags somewhere in your package #' @useDynLib testpkg, .registration = TRUE #' @importFrom Rcpp sourceCpp NULL ● Run document() ## forget that ``` <br> Then, create a new Rcpp file and save it in `src/`. .footnote[Put the two roxygen tags in the `R/testpkg-package.R`.] --- ## Build your package and learn more Use `Ctrl/Cmd + Shift + B` to document and build your package. **Learn more** with <a href="http://r-pkgs.had.co.nz/" target="_blank"> <img src="Rcpp_files/figure-html/unnamed-chunk-39-1.png" width="80%" style="display: block; margin: auto;" /> </a> --- ## Last example: your turn <br> ```r fun_r <- function(x) { n <- length(x) y <- numeric(n); y[1] <- 1 for (i in 2:n) { y[i] <- y[i - 1]^2 + x[i] } y } ``` 1. Will this R function always work? 1. Can you vectorize this function to avoid the loop and make it faster? 1. If not, recode it with Rcpp. 1. Benchmark the two versions after having checked the results. --- ## Microbenchmark of one possible solution ```r x <- rnorm(1e6) all.equal(fun_r(x), fun_cpp(x)) ``` ``` ## [1] TRUE ``` ```r microbenchmark::microbenchmark( fun_r(x), fun_cpp(x) ) ``` ``` ## Unit: milliseconds ## expr min lq mean median uq ## fun_r(x) 115.82288 117.93668 120.38656 118.73573 122.1426 ## fun_cpp(x) 10.06505 10.82437 12.16268 10.90465 14.3630 ## max neval ## 137.22758 100 ## 17.90154 100 ``` --- class: center, middle, inverse # Thanks! <br> Presentation available at https://privefl.github.io/R-presentation/Rcpp.html <br>
[privefl](https://twitter.com/privefl)
[privefl](https://github.com/privefl)
[F. Privé](https://stackoverflow.com/users/6103040/f-priv%c3%a9) .footnote[Slides created via R package [**xaringan**](https://github.com/yihui/xaringan).]