x <- c(1, 2, 3)

y <- x

(diagram: both names, x and y, bind to the same value c(1, 2, 3))
x <- c(1, 2, 3)
y <- x
y[3] <- 4

x

[1] 1 2 3
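To see when a copy actually happens, one option (a minimal sketch of my own, not from the slides) is base R's tracemem():

x <- c(1, 2, 3)
tracemem(x)    # report whenever the value bound to x is duplicated
y <- x         # no copy yet: both names point to the same value
y[3] <- 4      # copy-on-modify: tracemem() reports a duplication here
untracemem(x)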
f <- function(a) {
  a
}
x <- c(1, 2, 3)
z <- f(x)

f2 <- function(a) {
  a[1] <- 10
  a
}
z2 <- f2(x)

cbind(x, z2)

     x z2
[1,] 1 10
[2,] 2  2
[3,] 3  3
It's not just names (i.e. variables) that point to values; elements of lists do too.

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4
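One way to verify this sharing (my own illustration, assuming the {lobstr} package is installed) is lobstr::ref(), which prints the address of each element:

lobstr::ref(l1, l2)  # the first two elements of l1 and l2 share the same addresses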
Data frames are lists of vectors.
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

d2 <- d1
d2[, 2] <- d2[, 2] * 2  # modify one column

d3 <- d1
d3[1, ] <- d3[1, ] * 3  # modify one row
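Again, a quick check with lobstr::ref() (my own sketch, not from the slides) shows that modifying one column copies only that column, whereas modifying one row copies every column:

lobstr::ref(d1, d2)  # d2 still shares the unmodified first column with d1
lobstr::ref(d1, d3)  # d3 shares no column with d1: all of them were copied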
Example computing the cumulative sums of a vector:
x <- rnorm(10e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}
Why is this code bad?
If you know the size of the result in advance, you should pre-allocate it.
current_sum <- 0
res2 <- double(length(x))  # same as rep(0, length(x))
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res2[i] <- current_sum
}
It is okay to grow a list.
current_sum <- 0
res3 <- list()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res3[[i]] <- current_sum
}
unlist(res3)
Since v3.4, you can efficiently grow a vector (not with c() though).

current_sum <- 0
res4 <- c()
for (i in seq_along(x)) {
  current_sum <- current_sum + x[i]
  res4[i] <- current_sum
}
Some base functions are coded in C or Fortran, and are very fast. Use them!

res5 <- cumsum(x)
Using rowMeans(x) is much faster than apply(x, 1, mean).
If you want more efficient functions that apply to rows and columns of matrices, you can use package {matrixStats}.
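As a quick illustration (my own sketch; the matrix size and the matrixStats::rowMeans2() call are assumptions, not from the slides):

m <- matrix(rnorm(1e4 * 100), nrow = 1e4)
microbenchmark::microbenchmark(
  APPLY    = apply(m, 1, mean),           # interpreted loop over rows
  ROWMEANS = rowMeans(m),                 # base function coded in C
  MATSTATS = matrixStats::rowMeans2(m)    # {matrixStats} equivalent
)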
When reading large text files, rather than read.table(), prefer using data.table::fread() (or bigreadr::fread2()).
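For example (a minimal sketch; "my_big_file.csv" is a hypothetical file name):

# data.table::fread(), returning a plain data frame
df <- data.table::fread("my_big_file.csv", data.table = FALSE)
# or the {bigreadr} wrapper
df2 <- bigreadr::fread2("my_big_file.csv")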
Generally, packages that use C/Rcpp are efficient.
Three ways to compute the sum of two vectors (element-wise):
add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}

add_sapply <- function(x, y) {
  sapply(seq_along(x), function(i) x[i] + y[i])
}

add_vectorized <- `+`
N <- 10e3; x <- runif(N); y <- rnorm(N)

microbenchmark::microbenchmark(
  LOOP       = add_loop_prealloc(x, y),
  SAPPLY     = add_sapply(x, y),
  VECTORIZED = add_vectorized(x, y)
)
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
       LOOP  576.9  590.35  790.449  604.90  647.05 15708.3   100
     SAPPLY 6176.0 6680.45 7576.089 6911.90 7889.40 18345.6   100
 VECTORIZED    6.1    7.25   14.731   10.25   19.05    50.6   100
Loops are actually faster than sapply() because they can benefit from just-in-time compilation (JIT, since v3.4).
Vectorization is by far the best.
I call vectorized a function that takes vectors as arguments and operates on each element of these vectors in another (compiled) language (such as C, C++, or Fortran).
As an interpreted language, R has to ask, for each iteration of res[i] <- x[i] + y[i]:

what is the type of x[i] and y[i]?
can I add these two types?
what is the type of x[i] + y[i] then?
can I store this result in res or do I need to convert it?

These questions must be answered for each iteration, which takes time. Some of this is alleviated by JIT compilation.
On the contrary, for vectorized functions, these questions must be answered only once, which saves a lot of time. Read more in Noam Ross's blog post on vectorization.
Suppose we wish to estimate the integral $\int_0^1 x^2 \, dx$ using a Monte-Carlo method. Essentially, we throw darts at the curve and count the number of darts that fall below the curve (as in the following figure).
Naive R code implementing this Monte-Carlo algorithm:
monte_carlo <- function(nb_samp) {
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- runif(1)
    y <- runif(1)
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
monte_carlo(1e4)
[1] 0.3362
monte_carlo2 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  hits <- 0
  for (i in seq_len(nb_samp)) {
    x <- all_x[i]
    y <- all_y[i]
    if (y < x^2) hits <- hits + 1
  }
  hits / nb_samp
}
monte_carlo3 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  hits <- 0
  for (i in seq_len(nb_samp)) {
    if (test[i]) hits <- hits + 1
  }
  hits / nb_samp
}
monte_carlo4 <- function(nb_samp) {
  all_x <- runif(nb_samp)
  all_y <- runif(nb_samp)
  test <- all_y < all_x^2
  mean(test)
}

monte_carlo5 <- function(nb_samp) {
  mean(runif(nb_samp) < runif(nb_samp)^2)
}
c(monte_carlo (1e6), monte_carlo2(1e6), monte_carlo3(1e6),
  monte_carlo4(1e6), monte_carlo5(1e6))
[1] 0.333325 0.333494 0.333562 0.333586 0.333303
microbenchmark::microbenchmark(
  monte_carlo (1e4),
  monte_carlo2(1e4),
  monte_carlo3(1e4),
  monte_carlo4(1e4),
  monte_carlo5(1e4)
)
Unit: microseconds
                expr     min       lq      mean   median       uq     max neval
  monte_carlo(10000) 27217.6 35832.90 41264.791 38630.05 44239.40 87908.6   100
 monte_carlo2(10000)  1201.4  1247.15  1551.642  1335.40  1927.65  2513.9   100
 monte_carlo3(10000)   847.1   895.60  1158.774   967.75  1234.60  8156.3   100
 monte_carlo4(10000)   588.9   622.60   783.828   659.80   788.75  5411.4   100
 monte_carlo5(10000)   570.1   589.15   782.959   641.65   930.20  4538.2   100
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered."
-- Donald Knuth.
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered."
-- Donald Knuth.
Trying to optimize each and every part of your code
⟹ time lost + code too complex
R is great at prototyping quickly; always start with that!
If performance matters, then profile your code to see which part is taking too much time, and optimize only that part!
Learn more about how to profile your code in RStudio in this article.
x <- rnorm(50e3)
current_sum <- 0
res <- c()
for (x_i in x) {
  current_sum <- current_sum + x_i
  res <- c(res, current_sum)
}
In RStudio, 'Profile' panel → Profile Selected Line(s).
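RStudio's profiler is based on the {profvis} package, which you can also call directly; here is a minimal sketch (my own, not from the slides) profiling the code above:

profvis::profvis({
  x <- rnorm(50e3)
  current_sum <- 0
  res <- c()
  for (x_i in x) {
    current_sum <- current_sum + x_i
    res <- c(res, current_sum)   # growing res with c() should dominate the profile
  }
})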
Rcpp lives between R and C++, so that you can get:

the performance of C++,
the convenience of R.
Typical bottlenecks that C++ can address include:
Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than that in R.
Loops that can’t be easily vectorized because subsequent iterations depend on previous ones.
Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. See this chapter.
To learn more, have a look at my presentation on Rcpp.
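As a small illustration (my own sketch, not from the slides), the cumulative-sum loop from earlier can be rewritten in C++ with Rcpp::cppFunction():

Rcpp::cppFunction('
NumericVector cumsum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector res(n);
  double current_sum = 0;
  for (int i = 0; i < n; i++) {
    current_sum += x[i];   // the loop runs in compiled code
    res[i] = current_sum;
  }
  return res;
}')

x <- rnorm(10)
all.equal(cumsum_cpp(x), cumsum(x))  # TRUE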
For parallelizing R code, I basically always use foreach and recommend doing so. See my guide to parallelism in R with foreach.
Just remember to optimize your code before trying to parallelize it.
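A minimal sketch of the foreach pattern (my own example; the two-core cluster and the toy task are assumptions, not from the slides):

library(foreach)
library(doParallel)

cl <- makeCluster(2)       # two worker processes
registerDoParallel(cl)

res <- foreach(i = 1:4, .combine = 'c') %dopar% {
  sqrt(i)                  # placeholder for a real, heavier computation
}

stopCluster(cl)
res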
Presentation available at bit.ly/RUGgre38
privefl
Slides created via the R package xaringan