A Split-Apply-Combine strategy to parallelize the evaluation of a function.

```
big_parallelize(
  X,
  p.FUN,
  p.combine = NULL,
  ind = cols_along(X),
  ncores = nb_cores(),
  ...
)
```

- X
An object of class FBM.

- p.FUN
The function to be applied to each subset matrix. It must take a Filebacked Big Matrix as first argument and `ind`, a vector of indices, which are used to split the data. For example, if you want to apply a function to `X[ind.row, ind.col]`, you may use `X[ind.row, ind.col[ind]]` in `p.FUN`.

- p.combine
Function to combine the results with `do.call`. This function should accept multiple arguments (`...`). For example, you can use `c`, `cbind`, `rbind`. This package also provides function `plus` to add multiple arguments together. The default is `NULL`, in which case the results are not combined and are returned as a list, each element being the result of a block.

- ind
Initial vector of subsetting indices. Default is the vector of all column indices.

- ncores
Number of cores used. In the usage above, the default is `nb_cores()`. Use `ncores = 1` to run without parallelism.

- ...
Extra arguments to be passed to `p.FUN`.
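To make the role of `p.combine` concrete, here is a minimal base-R sketch of how `do.call` combines a list of per-block results. The list `block_results` and the helper `sum_all` are hypothetical names used only for illustration; `sum_all` is a stand-in for an element-wise sum and is not claimed to be how the package's `plus` is written.

```r
# Hypothetical per-block results, e.g. one numeric vector per block.
block_results <- list(c(1, 2), c(3, 4), c(5, 6))

# `c` concatenates the blocks into a single vector.
combined_c <- do.call(c, block_results)

# `rbind` stacks the blocks as rows of a matrix.
combined_rbind <- do.call(rbind, block_results)

# Element-wise sum over all blocks (illustrative stand-in, an assumption,
# not the package's own code). It accepts multiple arguments via `...`.
sum_all <- function(...) Reduce(`+`, list(...))
combined_sum <- do.call(sum_all, block_results)
```

Any function that accepts its inputs through `...` can be used the same way, which is why `c`, `cbind` and `rbind` all qualify.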

Returns a list of `ncores` elements, each element being the result of one of the cores, computed on a block. The elements of this list are then combined with `do.call(p.combine, .)` if `p.combine` is given.

This function splits the indices into parts, then applies a given function to each part, and finally combines the results.
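The split-apply-combine pattern described here can be sketched in plain base R. This is a sequential illustration under stated assumptions, not the package's implementation: `split_apply_combine`, `nblocks` and `col_sums_sub` are hypothetical names, an ordinary matrix stands in for an FBM, and `lapply` stands in for the step that a parallel backend would distribute across cores.

```r
split_apply_combine <- function(X, FUN, combine = NULL,
                                ind = seq_len(ncol(X)), nblocks = 2, ...) {
  # 1. Split: cut `ind` into at most `nblocks` contiguous parts.
  group  <- sort(rep_len(seq_len(nblocks), length(ind)))
  blocks <- unname(split(ind, group))

  # 2. Apply: call FUN on each part (sequentially in this sketch).
  res <- lapply(blocks, function(block, ...) FUN(X, block, ...), ...)

  # 3. Combine: return the raw list, or merge it with do.call().
  if (is.null(combine)) res else do.call(combine, res)
}

# Toy 4 x 5 matrix standing in for a Filebacked Big Matrix.
X <- matrix(1:20, nrow = 4)

# Same contract as `p.FUN`: the data first, then the indices of the block.
col_sums_sub <- function(X, ind) colSums(X[, ind, drop = FALSE])

as_list <- split_apply_combine(X, col_sums_sub)               # one element per block
as_vec  <- split_apply_combine(X, col_sums_sub, combine = c)  # single vector
```

With `combine = NULL` the caller receives one list element per block, which mirrors the documented default for `p.combine`.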

```
# Not run on CRAN: parallelism is very slow there.
library(bigstatsr)

X <- big_attachExtdata()

### Computation on all the matrix
true <- big_colstats(X)

big_colstats_sub <- function(X, ind) {
  big_colstats(X, ind.col = ind)
}
# 1. the computation is split along all the columns
# 2. for each part the computation is done, using `big_colstats`
# 3. the results (data.frames) are combined via `rbind`.
test <- big_parallelize(X, p.FUN = big_colstats_sub,
                        p.combine = 'rbind', ncores = 2)
all.equal(test, true)

### Computation on a part of the matrix
n <- nrow(X)
m <- ncol(X)
rows <- sort(sample(n, n/2)) # sort to provide some locality in accesses
cols <- sort(sample(m, m/2)) # idem

true2 <- big_colstats(X, ind.row = rows, ind.col = cols)

big_colstats_sub2 <- function(X, ind, rows, cols) {
  big_colstats(X, ind.row = rows, ind.col = cols[ind])
}
# This doesn't work because, by default, the computation is spread
# along all columns. We must explicitly specify the `ind` parameter.
tryCatch(big_parallelize(X, p.FUN = big_colstats_sub2,
                         p.combine = 'rbind', ncores = 2,
                         rows = rows, cols = cols),
         error = function(e) message(e))

# This now works, using `ind = seq_along(cols)`.
test2 <- big_parallelize(X, p.FUN = big_colstats_sub2,
                         p.combine = 'rbind', ncores = 2,
                         ind = seq_along(cols),
                         rows = rows, cols = cols)
all.equal(test2, true2)
```