Performance of R code


R in Grenoble


Florian Privé


1 / 31

Part 1: Memory management in R


Read more with this chapter of Advanced R

2 / 31


Understanding binding basics

x <- c(1, 2, 3)
  • It's creating an object, a vector of values, c(1, 2, 3).
  • And it's binding that object to a name, x.


y <- x

3 / 31


Copy-on-modify

x <- c(1, 2, 3)
y <- x
y[3] <- 4
x
[1] 1 2 3

4 / 31
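
To actually see this copy happening, here is a minimal sketch (not on the original slide) using base R's tracemem(), which reports when an object gets duplicated (standard CRAN builds of R have this enabled):

x <- c(1, 2, 3)
y <- x          # no copy yet: both names point to the same object
tracemem(x)     # start reporting copies of the object bound to x
y[3] <- 4       # a "tracemem[...]" line is printed: the object is copied here
untracemem(x)   # stop tracing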


Copy-on-modify: what about inside functions?

f <- function(a) {
  a
}
x <- c(1, 2, 3)
z <- f(x)

f2 <- function(a) {
  a[1] <- 10
  a
}
z2 <- f2(x)
cbind(x, z2)
     x z2
[1,] 1 10
[2,] 2  2
[3,] 3  3
5 / 31


Lists

It's not just names (i.e. variables) that point to values; elements of lists do too.

l1 <- list(1, 2, 3)

l2 <- l1

6 / 31


Copy-on-modify for lists?

l2[[3]] <- 4

7 / 31
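
A sketch that is not in the original slides, assuming the {lobstr} package is installed: lobstr::ref() prints the addresses of a list and of its elements, so the sharing can be checked directly.

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4
lobstr::ref(l1, l2)  # the first two elements show the same address (shared);
                     # only the third element of l2 points to a new object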

Data frames


Data frames are lists of vectors.


d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

8 / 31
d2 <- d1
d2[, 2] <- d2[, 2] * 2 # modify one column

d3 <- d1
d3[1, ] <- d3[1, ] * 3 # modify one row

9 / 31
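
Again a sketch that is not part of the original slides, using lobstr::ref() (assumed installed) to check which columns are shared after each modification:

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

d2 <- d1
d2[, 2] <- d2[, 2] * 2   # modify one column
lobstr::ref(d1, d2)      # column x is still shared; only column y was copied

d3 <- d1
d3[1, ] <- d3[1, ] * 3   # modify one row
lobstr::ref(d1, d3)      # both columns were copied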

Part 2: Why are loops slow in R?

10 / 31

Never grow a vector


Example computing the cumulative sums of a vector:


x <- rnorm(10e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}


Why is this code bad?

11 / 31

12 / 31

Much faster code by pre-allocating


If you know the size of the result in advance, you should pre-allocate it.


current_sum <- 0
res2 <- double(length(x)) # same as rep(0, length(x))
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res2[i] <- current_sum
}
13 / 31

Much faster code by using a list


It is okay to grow a list.


current_sum <- 0
res3 <- list()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res3[[i]] <- current_sum
}
unlist(res3)
14 / 31

Much faster code by growing a vector (the right way)


Since v3.4, you can efficiently grow a vector (not with c() though).


current_sum <- 0
res4 <- c()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res4[i] <- current_sum
}
15 / 31


Much faster code by using existing efficient functions


Some base functions are coded in C or Fortran, and are very fast. Use them!


res5 <- cumsum(x)


Other examples

  • Using rowMeans(x) is much faster than apply(x, 1, mean).

  • If you want more efficient functions that apply to rows and columns of matrices, you can use package {matrixStats}.

  • When reading large text files, rather than read.table(), prefer using data.table::fread() (or bigreadr::fread2()).

  • Generally, packages that use C/Rcpp are efficient.

16 / 31
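
As a rough check of the claims above, here is a sketch that is not in the original slides (the function names are only illustrative): the five approaches wrapped in functions and timed with {microbenchmark}.

cumsum_grow_c <- function(x) {     # grow with c(): the bad version
  current_sum <- 0
  res <- c()
  for (x_i in x) {
    current_sum <- current_sum + x_i
    res <- c(res, current_sum)
  }
  res
}
cumsum_prealloc <- function(x) {   # pre-allocate the result
  current_sum <- 0
  res <- double(length(x))
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[i] <- current_sum
  }
  res
}
cumsum_list <- function(x) {       # grow a list
  current_sum <- 0
  res <- list()
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[[i]] <- current_sum
  }
  unlist(res)
}
cumsum_grow_idx <- function(x) {   # grow a vector by index (since R 3.4)
  current_sum <- 0
  res <- c()
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[i] <- current_sum
  }
  res
}
x <- rnorm(10e3)
microbenchmark::microbenchmark(
  GROW_WITH_C = cumsum_grow_c(x),
  PREALLOC    = cumsum_prealloc(x),
  LIST        = cumsum_list(x),
  GROW_BY_IDX = cumsum_grow_idx(x),
  CUMSUM      = cumsum(x),
  times = 10
)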

Loops vs sapply() vs vectorization


Three ways to compute the sum of two vectors (element-wise):


add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
add_sapply <- function(x, y) {
  sapply(seq_along(x), function(i) x[i] + y[i])
}
add_vectorized <- `+`
17 / 31


Benchmark

N <- 10e3; x <- runif(N); y <- rnorm(N)
microbenchmark::microbenchmark(
  LOOP = add_loop_prealloc(x, y),
  SAPPLY = add_sapply(x, y),
  VECTORIZED = add_vectorized(x, y)
)
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
       LOOP  576.9  590.35  790.449  604.90  647.05 15708.3   100
     SAPPLY 6176.0 6680.45 7576.089 6911.90 7889.40 18345.6   100
 VECTORIZED    6.1    7.25   14.731   10.25   19.05    50.6   100


Loops are actually faster than sapply() because they can benefit from just-in-time compilation (JIT, since v3.4).

Vectorization is by far the best.

18 / 31
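
To see the JIT effect mentioned above, one possibility (a sketch not from the slides, reusing x, y and add_loop_prealloc() defined on this slide) is to disable byte-compilation with compiler::enableJIT() and redefine the loop, so an uncompiled version can be timed against the compiled one:

old <- compiler::enableJIT(0)       # disable just-in-time byte-compilation
add_loop_nojit <- function(x, y) {  # same loop, but it will stay uncompiled
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
microbenchmark::microbenchmark(
  NO_JIT = add_loop_nojit(x, y),
  JIT    = add_loop_prealloc(x, y)  # was byte-compiled when first called
)
compiler::enableJIT(old)            # restore the previous JIT level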


Why vectorize?


I call vectorized a function that takes vectors as arguments and operates on each element of these vectors in another (compiled) language (such as C, C++ and Fortran).

Because R is an interpreted language, for each iteration res[i] <- x[i] + y[i], it has to ask:

  • what is the type of x[i] and y[i]?

  • can I add these two types?

  • what is the type of x[i] + y[i] then?

  • can I store this result in res or do I need to convert it?

These questions must be answered for each iteration, which takes time.
Some of this is alleviated by JIT compilation.

On the contrary, for vectorized functions, these questions must be answered only once, which saves a lot of time. Read more with Noam Ross’s blog post on vectorization.

19 / 31

Example of vectorization

Suppose we wish to estimate the integral $\int_0^1 x^2 \, dx$ using a Monte-Carlo method. Essentially, we throw darts at the curve and count the number of darts that fall below the curve (as in the following figure).

20 / 31

How to vectorize this code?


Naive R code implementing this Monte-Carlo algorithm:


monte_carlo <- function(nb_samp) {
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- runif(1)
    y <- runif(1)
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
monte_carlo(1e4)
[1] 0.3362
21 / 31

A better solution

monte_carlo2 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- all_x[i]
    y <- all_y[i]
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
22 / 31

A better solution

monte_carlo3 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  hits <- 0
  for (i in seq_len(nb_samp)) {
    if (test[i]) hits <- hits + 1
  }
  hits / nb_samp
}
23 / 31


An even better solution

monte_carlo4 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  mean(test)
}
monte_carlo5 <- function(nb_samp) {
  mean(runif(nb_samp) < runif(nb_samp)^2)
}
c(monte_carlo (1e6),
  monte_carlo2(1e6),
  monte_carlo3(1e6),
  monte_carlo4(1e6),
  monte_carlo5(1e6))
[1] 0.333325 0.333494 0.333562 0.333586 0.333303
24 / 31

Benchmark

microbenchmark::microbenchmark(
  monte_carlo (1e4),
  monte_carlo2(1e4),
  monte_carlo3(1e4),
  monte_carlo4(1e4),
  monte_carlo5(1e4)
)
Unit: microseconds
                expr     min       lq      mean   median       uq     max neval
  monte_carlo(10000) 27217.6 35832.90 41264.791 38630.05 44239.40 87908.6   100
 monte_carlo2(10000)  1201.4  1247.15  1551.642  1335.40  1927.65  2513.9   100
 monte_carlo3(10000)   847.1   895.60  1158.774   967.75  1234.60  8156.3   100
 monte_carlo4(10000)   588.9   622.60   783.828   659.80   788.75  5411.4   100
 monte_carlo5(10000)   570.1   589.15   782.959   641.65   930.20  4538.2   100
25 / 31

Part 3: Other strategies to make your code faster

26 / 31


Identify where your code is slow


"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered."

-- Donald Knuth.


Trying to optimize each and every part of your code ⟹ time lost + code too complex

R is great at prototyping quickly; always start with that!
If performance matters, then profile your code to see which part is taking too much time, and optimize only that part!

Learn more on how to profile your code in RStudio in this article.

27 / 31

Let us profile the code we used before


x <- rnorm(50e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}


In RStudio, 'Profile' panel → Profile Selected Line(s).

28 / 31
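
The same profiling can also be run programmatically; a minimal sketch assuming the {profvis} package is installed (profvis() wraps Rprof() and opens an interactive flame graph):

profvis::profvis({
  x <- rnorm(50e3)
  current_sum <- 0
  res <- c()
  for (x_i in x) {
    current_sum <- current_sum + x_i
    res <- c(res, current_sum)
  }
})
# expect most of the time to be attributed to the c(res, current_sum) line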


Rcpp: making R functions from C++ code

Rcpp lives between R and C++, so that you can get

  • the performance of C++,

  • the convenience of R.

Typical bottlenecks that C++ can address include:

  • Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than that in R.

  • Loops that can’t be easily vectorized because subsequent iterations depend on previous ones.

  • Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. See this chapter.


To learn more, have a look at my presentation on Rcpp.

29 / 31
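
A minimal sketch (not from the original presentation, which links to a dedicated Rcpp talk): a loop whose iterations depend on the previous ones, written once in C++ through Rcpp::cppFunction(). It requires the {Rcpp} package and a working C++ toolchain.

Rcpp::cppFunction('
NumericVector cumsum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector res(n);
  double total = 0;
  for (int i = 0; i < n; i++) {
    total += x[i];    // each iteration depends on the previous one
    res[i] = total;
  }
  return res;
}')
x <- rnorm(1e5)
all.equal(cumsum_cpp(x), cumsum(x))  # should be TRUE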

Parallelization


For parallelizing R code, I basically always use foreach and recommend doing so. See my guide to parallelism in R with foreach.


Just remember to optimize your code before trying to parallelize it.

30 / 31
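
A minimal sketch of the {foreach} pattern (not from the slides), with a {doParallel} backend and two workers; the loop body is only a placeholder for an already-optimized computation:

library(foreach)
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
res <- foreach(i = 1:4, .combine = 'c') %dopar% {
  Sys.sleep(0.5)  # stand-in for real work
  sqrt(i)
}
parallel::stopCluster(cl)
res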

Thanks!


Presentation available at bit.ly/RUGgre38


privefl

Slides created via the R package xaringan

31 / 31
