Performance of R code


R in Grenoble


Florian Privé


1 / 31

Part 1: Memory management in R


Read more with this chapter of Advanced R

2 / 31


Understanding binding basics

x <- c(1, 2, 3)
  • It's creating an object, a vector of values, c(1, 2, 3).
  • And it's binding that object to a name, x.


y <- x

3 / 31


Copy-on-modify

x <- c(1, 2, 3)
y <- x
y[3] <- 4
x
[1] 1 2 3

4 / 31
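
To actually see this copy happening, here is a minimal sketch (not on the original slide) using base R's tracemem(), which reports when an object gets duplicated (standard CRAN builds of R have this enabled):

x <- c(1, 2, 3)
y <- x          # no copy yet: both names point to the same object
tracemem(x)     # start reporting copies of the object bound to x
y[3] <- 4       # a "tracemem[...]" line is printed: the object is copied here
untracemem(x)   # stop tracing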


Copy-on-modify: what about inside functions?

f <- function(a) {
  a
}
x <- c(1, 2, 3)
z <- f(x)

f2 <- function(a) {
  a[1] <- 10
  a
}
z2 <- f2(x)
cbind(x, z2)
     x z2
[1,] 1 10
[2,] 2  2
[3,] 3  3
5 / 31


Lists

It's not just names (i.e. variables) that point to values; elements of lists do too.

l1 <- list(1, 2, 3)

l2 <- l1

6 / 31


Copy-on-modify for lists?

l2[[3]] <- 4

7 / 31
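
A sketch that is not in the original slides, assuming the {lobstr} package is installed: lobstr::ref() prints the addresses of a list and of its elements, so the sharing can be checked directly.

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4
lobstr::ref(l1, l2)  # the first two elements show the same address (shared);
                     # only the third element of l2 points to a new object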

Data frames


Data frames are lists of vectors.


d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

8 / 31
d2 <- d1
d2[, 2] <- d2[, 2] * 2 # modify one column

d3 <- d1
d3[1, ] <- d3[1, ] * 3 # modify one row

9 / 31
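
Again a sketch that is not part of the original slides, using lobstr::ref() (assumed installed) to check which columns are shared after each modification:

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

d2 <- d1
d2[, 2] <- d2[, 2] * 2   # modify one column
lobstr::ref(d1, d2)      # column x is still shared; only column y was copied

d3 <- d1
d3[1, ] <- d3[1, ] * 3   # modify one row
lobstr::ref(d1, d3)      # both columns were copied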

Part 2: Why are loops slow in R?

10 / 31

Never grow a vector


Example computing the cumulative sums of a vector:


x <- rnorm(10e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}


Why is this code bad?

11 / 31

12 / 31

Much faster code by pre-allocating


If you know the size of the result in advance, you should pre-allocate it.


current_sum <- 0
res2 <- double(length(x)) # same as rep(0, length(x))
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res2[i] <- current_sum
}
13 / 31

Much faster code by using a list


It is okay to grow a list.


current_sum <- 0
res3 <- list()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res3[[i]] <- current_sum
}
unlist(res3)
14 / 31

Much faster code by growing a vector (the right way)


Since v3.4, you can efficiently grow a vector (not with c() though).


current_sum <- 0
res4 <- c()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res4[i] <- current_sum
}
15 / 31


Much faster code by using existing efficient functions


Some base functions are coded in C or Fortran, and are very fast. Use them!


res5 <- cumsum(x)


Other examples

  • Using rowMeans(x) is much faster than apply(x, 1, mean).

  • If you want more efficient functions that apply to rows and columns of matrices, you can use package {matrixStats}.

  • When reading large text files, rather than read.table(), prefer using data.table::fread() (or bigreadr::fread2()).

  • Generally, packages that use C/Rcpp are efficient.

16 / 31
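
As a rough check of the claims above, here is a sketch that is not in the original slides (the function names are only illustrative): the five approaches wrapped in functions and timed with {microbenchmark}.

cumsum_grow_c <- function(x) {     # grow with c(): the bad version
  current_sum <- 0
  res <- c()
  for (x_i in x) {
    current_sum <- current_sum + x_i
    res <- c(res, current_sum)
  }
  res
}
cumsum_prealloc <- function(x) {   # pre-allocate the result
  current_sum <- 0
  res <- double(length(x))
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[i] <- current_sum
  }
  res
}
cumsum_list <- function(x) {       # grow a list
  current_sum <- 0
  res <- list()
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[[i]] <- current_sum
  }
  unlist(res)
}
cumsum_grow_idx <- function(x) {   # grow a vector by index (since R 3.4)
  current_sum <- 0
  res <- c()
  for (i in seq_along(x)) {
    current_sum <- current_sum + x[i]
    res[i] <- current_sum
  }
  res
}
x <- rnorm(10e3)
microbenchmark::microbenchmark(
  GROW_WITH_C = cumsum_grow_c(x),
  PREALLOC    = cumsum_prealloc(x),
  LIST        = cumsum_list(x),
  GROW_BY_IDX = cumsum_grow_idx(x),
  CUMSUM      = cumsum(x),
  times = 10
)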

Loops vs sapply() vs vectorization


Three ways to compute the sum of two vectors (element-wise):


add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
add_sapply <- function(x, y) {
  sapply(seq_along(x), function(i) x[i] + y[i])
}
add_vectorized <- `+`
17 / 31


Benchmark

N <- 10e3; x <- runif(N); y <- rnorm(N)
microbenchmark::microbenchmark(
  LOOP = add_loop_prealloc(x, y),
  SAPPLY = add_sapply(x, y),
  VECTORIZED = add_vectorized(x, y)
)
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
       LOOP  576.9  590.35  790.449  604.90  647.05 15708.3   100
     SAPPLY 6176.0 6680.45 7576.089 6911.90 7889.40 18345.6   100
 VECTORIZED    6.1    7.25   14.731   10.25   19.05    50.6   100


Loops are actually faster than sapply() because they can benefit from just-in-time compilation (JIT, since v3.4).

Vectorization is by far the best.

18 / 31
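
To see the JIT effect mentioned above, one possibility (a sketch not from the slides, reusing x, y and add_loop_prealloc() defined on this slide) is to disable byte-compilation with compiler::enableJIT() and redefine the loop, so an uncompiled version can be timed against the compiled one:

old <- compiler::enableJIT(0)       # disable just-in-time byte-compilation
add_loop_nojit <- function(x, y) {  # same loop, but it will stay uncompiled
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
microbenchmark::microbenchmark(
  NO_JIT = add_loop_nojit(x, y),
  JIT    = add_loop_prealloc(x, y)  # was byte-compiled when first called
)
compiler::enableJIT(old)            # restore the previous JIT level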


Why vectorize?


I call vectorized a function that takes vectors as arguments and operates on each element of these vectors in another (compiled) language (such as C, C++ and Fortran).

Because R is an interpreted language, for each iteration res[i] <- x[i] + y[i], it has to ask:

  • what is the type of x[i] and y[i]?

  • can I add these two types?

  • what is the type of x[i] + y[i] then?

  • can I store this result in res or do I need to convert it?

These questions must be answered for each iteration, which takes time.
Some of this is alleviated by JIT compilation.

On the contrary, for vectorized functions, these questions must be answered only once, which saves a lot of time. Read more with Noam Ross’s blog post on vectorization.

19 / 31

Example of vectorization

Suppose we wish to estimate the integral $\int_0^1 x^2 \, dx$ using a Monte-Carlo method. Essentially, we throw darts at the curve and count the number of darts that fall below the curve (as in the following figure).

20 / 31

How to vectorize this code?


Naive R code implementing this Monte-Carlo algorithm:


monte_carlo <- function(nb_samp) {
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- runif(1)
    y <- runif(1)
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
monte_carlo(1e4)
[1] 0.3362
21 / 31

A better solution

monte_carlo2 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- all_x[i]
    y <- all_y[i]
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
22 / 31

A better solution

monte_carlo3 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  hits <- 0
  for (i in seq_len(nb_samp)) {
    if (test[i]) hits <- hits + 1
  }
  hits / nb_samp
}
23 / 31


An even better solution

monte_carlo4 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  mean(test)
}
monte_carlo5 <- function(nb_samp) {
  mean(runif(nb_samp) < runif(nb_samp)^2)
}
c(monte_carlo (1e6),
  monte_carlo2(1e6),
  monte_carlo3(1e6),
  monte_carlo4(1e6),
  monte_carlo5(1e6))
[1] 0.333325 0.333494 0.333562 0.333586 0.333303
24 / 31

Benchmark

microbenchmark::microbenchmark(
  monte_carlo (1e4),
  monte_carlo2(1e4),
  monte_carlo3(1e4),
  monte_carlo4(1e4),
  monte_carlo5(1e4)
)
Unit: microseconds
                expr     min       lq      mean   median       uq     max neval
  monte_carlo(10000) 27217.6 35832.90 41264.791 38630.05 44239.40 87908.6   100
 monte_carlo2(10000)  1201.4  1247.15  1551.642  1335.40  1927.65  2513.9   100
 monte_carlo3(10000)   847.1   895.60  1158.774   967.75  1234.60  8156.3   100
 monte_carlo4(10000)   588.9   622.60   783.828   659.80   788.75  5411.4   100
 monte_carlo5(10000)   570.1   589.15   782.959   641.65   930.20  4538.2   100
25 / 31

Part 3: Other strategies to make your code faster

26 / 31


Identify where your code is slow


"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered."

-- Donald Knuth.


Trying to optimize each and every part of your code ⟹ time lost + code too complex

R is great at prototyping quickly; always start with that!
If performance matters, then profile your code to see which part is taking too much time, and optimize only that part!

Learn more on how to profile your code in RStudio in this article.

27 / 31

Let us profile the code we used before


x <- rnorm(50e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}


In RStudio, 'Profile' panel → Profile Selected Line(s).

28 / 31
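
The same profiling can also be run programmatically; a minimal sketch assuming the {profvis} package is installed (profvis() wraps Rprof() and opens an interactive flame graph):

profvis::profvis({
  x <- rnorm(50e3)
  current_sum <- 0
  res <- c()
  for (x_i in x) {
    current_sum <- current_sum + x_i
    res <- c(res, current_sum)
  }
})
# expect most of the time to be attributed to the c(res, current_sum) line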


Rcpp: making R functions from C++ code

Rcpp lives between R and C++, so that you can get

  • the performance of C++,

  • the convenience of R.

Typical bottlenecks that C++ can address include:

  • Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than that in R.

  • Loops that can’t be easily vectorized because subsequent iterations depend on previous ones.

  • Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. See this chapter.


To learn more, have a look at my presentation on Rcpp.

29 / 31
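
A minimal sketch (not from the original presentation, which links to a dedicated Rcpp talk): a loop whose iterations depend on the previous ones, written once in C++ through Rcpp::cppFunction(). It requires the {Rcpp} package and a working C++ toolchain.

Rcpp::cppFunction('
NumericVector cumsum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector res(n);
  double total = 0;
  for (int i = 0; i < n; i++) {
    total += x[i];    // each iteration depends on the previous one
    res[i] = total;
  }
  return res;
}')
x <- rnorm(1e5)
all.equal(cumsum_cpp(x), cumsum(x))  # should be TRUE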

Parallelization


For parallelizing R code, I basically always use foreach and recommend doing so. See my guide to parallelism in R with foreach.


Just remember to optimize your code before trying to parallelize it.

30 / 31
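
A minimal sketch of the {foreach} pattern (not from the slides), with a {doParallel} backend and two workers; the loop body is only a placeholder for an already-optimized computation:

library(foreach)
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
res <- foreach(i = 1:4, .combine = 'c') %dopar% {
  Sys.sleep(0.5)  # stand-in for real work
  sqrt(i)
}
parallel::stopCluster(cl)
res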

Thanks!


Presentation available at bit.ly/RUGgre38


privefl

Slides created via the R package xaringan

31 / 31
