Why loops are slow in R
In this post, I talk about loops in R, why they can be slow and when it is okay to use them.
Don’t grow objects
Let us generate a matrix of uniform values (max changing for every column).
gen_grow <- function(n = 1e3, max = 1:500) {
  mat <- NULL
  for (m in max) {
    mat <- cbind(mat, runif(n, max = m))
  }
  mat
}
set.seed(1)
system.time(mat1 <- gen_grow(max = 1:500))
##    user  system elapsed
##   0.333   0.189   0.523
system.time(mat2 <- gen_grow(max = 1:2000))
##    user  system elapsed
##   6.183   7.603  13.803
gen_sapply <- function(n = 1e3, max = 1:500) {
  sapply(max, function(m) runif(n, max = m))
}
set.seed(1)
system.time(mat3 <- gen_sapply(max = 1:500))
##    user  system elapsed
##   0.026   0.005   0.030
identical(mat3, mat1)
## [1] TRUE
system.time(mat4 <- gen_sapply(max = 1:2000))
##    user  system elapsed
##   0.108   0.014   0.122
identical(mat4, mat2)
## [1] TRUE
Wow, sapply() is so much faster than loops!
Don’t get this wrong: sapply() or lapply() is nothing but a loop internally, so sapply() shouldn’t be any faster than a for-loop. Here, the problem is not with the loop itself, but with what we do inside it. Indeed, in gen_grow(), at each iteration of the loop, we reallocate a new matrix with one more column, which takes time.
Imagine you want to climb a flight of stairs, but you have to climb stair 1, go back to the bottom, then climb the first 2 stairs, go back to the bottom, then climb the first 3, and so on until you reach the top. This takes way more time than just climbing all the stairs at once. This is basically what happens in gen_grow(): instead of climbing more stairs, it allocates more memory at each iteration, which also takes time.
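To see this copying in miniature, here is a toy sketch of mine (not from the original code): growing a vector by one element forces R to allocate a fresh, longer vector and copy everything already stored.
# Growing a vector one element at a time: each c() call allocates a
# new, longer vector and copies all the old elements into it, so the
# total work grows quadratically with the number of iterations.
x <- NULL
for (i in 1:5) {
  x <- c(x, i)  # allocate length i, copy the i - 1 old values, append 1
}
x
## [1] 1 2 3 4 5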
You have at least two solutions to this problem. The first solution is to pre-allocate the whole result once (if you know its size in advance) and just fill it:
gen_prealloc <- function(n = 1e3, max = 1:500) {
  mat <- matrix(0, n, length(max))
  for (i in seq_along(max)) {
    mat[, i] <- runif(n, max = max[i])
  }
  mat
}
set.seed(1)
system.time(mat5 <- gen_prealloc(max = 1:500))
##    user  system elapsed
##   0.030   0.000   0.031
identical(mat5, mat1)
## [1] TRUE
system.time(mat6 <- gen_prealloc(max = 1:2000))
##    user  system elapsed
##   0.101   0.009   0.109
identical(mat6, mat2)
## [1] TRUE
Another solution, which can be really useful if you don’t know the size of the result in advance, is to store the results in a list. A list, as opposed to a vector or a matrix, stores its elements in different places in memory (they don’t have to be contiguous), so you can add one element to the list without copying the rest of it.
gen_list <- function(n = 1e3, max = 1:500) {
  l <- list()
  for (i in seq_along(max)) {
    l[[i]] <- runif(n, max = max[i])
  }
  do.call("cbind", l)
}
set.seed(1)
system.time(mat7 <- gen_list(max = 1:500))
##    user  system elapsed
##   0.028   0.000   0.028
identical(mat7, mat1)
## [1] TRUE
system.time(mat8 <- gen_list(max = 1:2000))
##    user  system elapsed
##   0.098   0.006   0.105
identical(mat8, mat2)
## [1] TRUE
Vectorization, why?
I call vectorized a function that takes vectors as arguments and operates on each element of these vectors in another (compiled) language (such as C++ or Fortran).
So, let me repeat myself: sapply() is not a vectorized function.
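Note that runif() itself is vectorized over its max argument (the values of max are recycled across the draws), so the whole example matrix from the first section can be generated in one single call. A minimal sketch (gen_vec is my name, not from the original code):
gen_vec <- function(n = 1e3, max = 1:500) {
  # One call draws all n * length(max) values; rep(max, each = n)
  # lines the max values up with the column-major fill of matrix().
  matrix(runif(n * length(max), max = rep(max, each = n)),
         nrow = n, ncol = length(max))
}
Since the draws come in the same order as in gen_sapply(), this should reproduce the same matrix for a given seed.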
Let’s go back to vectorization: why is it so important in R? As an example, let’s compute the sum of two vectors.
add_loop_prealloc <- function(x, y) {
  res <- double(length(x))
  for (i in seq_along(x)) {
    res[i] <- x[i] + y[i]
  }
  res
}
add_sapply <- function(x, y) {
  sapply(seq_along(x), function(i) x[i] + y[i])
}
add_vectorized <- `+`
N <- 1e5; x <- runif(N); y <- rnorm(N)
compiler::enableJIT(0) ## disable just-in-time compilation
## [1] 3
microbenchmark::microbenchmark(
  LOOP = add_loop_prealloc(x, y),
  SAPPLY = add_sapply(x, y),
  VECTORIZED = add_vectorized(x, y)
)
## Unit: microseconds
##        expr        min          lq        mean      median         uq        max neval
##        LOOP 116724.093 137378.5615 156935.0679 149568.2770 165776.978 281091.814   100
##      SAPPLY 125235.495 146122.2815 174867.8669 169495.3810 192442.822 337318.547   100
##  VECTORIZED    109.899    212.5135    247.1062    240.1635    260.046    574.126   100
compiler::enableJIT(3) ## default
## [1] 0
microbenchmark::microbenchmark(
  LOOP = add_loop_prealloc(x, y),
  SAPPLY = add_sapply(x, y),
  VECTORIZED = add_vectorized(x, y)
)
## Unit: microseconds
##        expr       min         lq        mean      median          uq        max neval
##        LOOP  8762.638  10646.571  12672.2114  11206.1365  12772.3280  35368.977   100
##      SAPPLY 82463.586 116025.847 148921.6490 142601.9835 171450.0465 266249.684   100
##  VECTORIZED   123.004    219.397    320.1623    259.7455    384.3415    683.014   100
Here, the vectorized function is much faster than the two others, and the for-loop approach is faster than the sapply() equivalent when just-in-time compilation is enabled.
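As a side note, the base compiler package lets you byte-compile a single function by hand, which is roughly what the JIT does for you automatically. A minimal sketch (add_loop_compiled is my name, not from the original code):
# Byte-compile the loop-based function; even with JIT disabled, the
# compiled closure should run noticeably faster than the interpreted one.
add_loop_compiled <- compiler::cmpfun(add_loop_prealloc)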
As an interpreted language, for each iteration of res[i] <- x[i] + y[i], R has to ask:
1. what is the type of x[i] and y[i]?
2. can I add these two types?
3. what is the type of x[i] + y[i] then?
4. can I store this result in res, or do I need to convert it?
These questions must be answered for each iteration, which takes time. On the contrary, for vectorized functions, these questions must be answered only once, which saves a lot of time. Read more in Noam Ross’s blog post on vectorization.
Conclusion
In this post, I don’t say that you shouldn’t use lapply() instead of a for-loop. Indeed, it can be more concise and clearer to use lapply(), but don’t expect miracles with respect to performance. You should also take a look at the {purrr} package, which provides shortcuts, consistency, and some functions to iterate over the rows of a data frame.
Loops are slower in R than in C++ because R is an interpreted language (not compiled), even if there is now just-in-time (JIT) compilation in R (>= 3.4) that makes R loops faster (yet still not as fast). So, R loops are not that bad if you don’t use too many iterations (let’s say no more than 100,000).
Beware of what you do inside loops, because it can be super slow. Use vectorized operations if you can (search for them in available packages such as {matrixStats}). If you can’t, write your own vectorized functions with {Rcpp}. I have an introduction to {Rcpp} there.
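For reference, here is a minimal {Rcpp} sketch of such a vectorized function (this assumes you have the Rcpp package installed; add_rcpp is a toy name, since + is of course already vectorized in R):
# Compile a small C++ function from R; the C++ loop runs on native
# doubles, so the type questions above are answered once, at compile time.
Rcpp::cppFunction("
NumericVector add_rcpp(NumericVector x, NumericVector y) {
  int n = x.size();
  NumericVector res(n);
  for (int i = 0; i < n; i++) res[i] = x[i] + y[i];
  return res;
}
")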