big_parallelize.Rd
A Split-Apply-Combine strategy to parallelize the evaluation of a function.
big_parallelize(X, p.FUN, p.combine = NULL, ind = cols_along(X), ncores = nb_cores(), ...)
X  A Filebacked Big Matrix (FBM).

p.FUN  The function to be applied to each subset matrix.
It must take a Filebacked Big Matrix as its first argument and
ind, a vector of indices used to split the data, as an argument.

p.combine  Function to combine the results with do.call(p.combine, .), e.g. 'rbind'. Default is NULL, in which case the results are not combined.
ind  Initial vector of subsetting indices. Default is the vector of all column indices. 
ncores  Number of cores used. Default is nb_cores(), as given in the usage line.
...  Extra arguments to be passed to p.FUN.
Returns a list of ncores elements, each element being the result of one of the cores, computed on a block. The elements of this list are then combined with do.call(p.combine, .) if p.combine is given.
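To make the combining step concrete, here is a base R illustration with made-up per-block results. The list parts stands in for what a p.FUN returning data frames of column statistics might give on two blocks of columns when p.combine = NULL, and do.call("rbind", parts) mirrors p.combine = 'rbind'. The helper block_stats is hypothetical; it only loosely mimics the sum/var data frame returned by big_colstats and works on an ordinary matrix, not an FBM.

```r
# Hypothetical per-block function (not part of bigstatsr): one row of
# statistics per column, loosely mimicking the output of big_colstats.
block_stats <- function(M, ind) {
  sub <- M[, ind, drop = FALSE]
  data.frame(sum = colSums(sub), var = apply(sub, 2, var))
}

M <- matrix(1:8, nrow = 2)
# Stand-in for the list returned with p.combine = NULL and two blocks.
parts <- lapply(list(1:2, 3:4), function(ind) block_stats(M, ind))
# Stand-in for what p.combine = 'rbind' does with that list.
combined <- do.call("rbind", parts)
combined$sum
#> [1]  3  7 11 15
```

Each block yields a data frame with one row per column of the block, so rbind stacks them back into one row per column of the whole selection.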
This function splits the indices into parts, applies a given function to each part, and finally combines the results.
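The strategy above can be sketched in base R. This is a minimal illustration under stated assumptions, not the bigstatsr implementation: split_apply_combine is a hypothetical name, it works on an ordinary in-memory matrix instead of an FBM, and it relies on parallel::splitIndices and parallel::mclapply, which forks processes, so more than one core only works on Unix-alikes.

```r
library(parallel)

# Hypothetical helper (not part of bigstatsr): the split-apply-combine idea
# on an ordinary in-memory matrix instead of a Filebacked Big Matrix.
split_apply_combine <- function(X, p.FUN, p.combine = NULL,
                                ind = seq_len(ncol(X)), ncores = 1, ...) {
  # 1. Split: cut `ind` into `ncores` contiguous blocks of indices.
  blocks <- lapply(splitIndices(length(ind), ncores), function(k) ind[k])
  # 2. Apply: call p.FUN(X, <block>, ...) on each block; mclapply forks
  #    worker processes, so mc.cores > 1 requires a Unix-alike.
  res <- mclapply(blocks, function(b) p.FUN(X, b, ...), mc.cores = ncores)
  # 3. Combine: merge the per-block results, or return them as a list.
  if (is.null(p.combine)) res else do.call(p.combine, res)
}

# Usage: column sums computed block by block, then concatenated with `c`.
M <- matrix(1:20, nrow = 4)
colsums_sub <- function(X, ind) colSums(X[, ind, drop = FALSE])
split_apply_combine(M, colsums_sub, p.combine = "c", ncores = 2)
#> [1] 10 26 42 58 74
```

Splitting into one contiguous block per core keeps the number of tasks equal to ncores, which matches the documented return value of a list with ncores elements.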
if (FALSE) { # CRAN is super slow with parallelism.
  X <- big_attachExtdata()

  ### Computation on the whole matrix
  true <- big_colstats(X)

  big_colstats_sub <- function(X, ind) {
    big_colstats(X, ind.col = ind)
  }
  # 1. the computation is split along all the columns
  # 2. for each part the computation is done, using `big_colstats`
  # 3. the results (data.frames) are combined via `rbind`.
  test <- big_parallelize(X, p.FUN = big_colstats_sub,
                          p.combine = 'rbind', ncores = 2)
  all.equal(test, true)

  ### Computation on a random subset of rows and columns
  n <- nrow(X)
  m <- ncol(X)
  rows <- sample(n, n/2)
  cols <- sample(m, m/2)
  true2 <- big_colstats(X, ind.row = rows, ind.col = cols)

  big_colstats_sub2 <- function(X, ind, rows, cols) {
    big_colstats(X, ind.row = rows, ind.col = cols[ind])
  }
  # This doesn't work because, by default, the computation is spread
  # along all columns. We must explicitly specify the `ind` parameter.
  tryCatch(big_parallelize(X, p.FUN = big_colstats_sub2,
                           p.combine = 'rbind', ncores = 2,
                           rows = rows, cols = cols),
           error = function(e) message(e))

  # This now works, using `ind = seq_along(cols)`.
  test2 <- big_parallelize(X, p.FUN = big_colstats_sub2,
                           p.combine = 'rbind', ncores = 2,
                           ind = seq_along(cols),
                           rows = rows, cols = cols)
  all.equal(test2, true2)
}