Split-parApply-Combine — snp

A Split-Apply-Combine strategy to parallelize the evaluation of a function on each SNP, independently.

snp_split(infos.chr, FUN, combine, ncores = 1, ...)

Arguments

infos.chr: Vector of integers specifying each SNP's chromosome.
Typically <bigSNP>$map$chromosome.
FUN: The function to be applied. It must take a FBM.code256 as first argument and ind.chr, an another argument to provide subsetting over SNPs. You can access the number of the chromosome by using attr(ind.chr, "chr").
combine: function that is used by foreach::foreach to process the tasks results as they generated. This can be specified as either a function or a non-empty character string naming the function. Specifying 'c' is useful for concatenating the results into a vector, for example. The values 'cbind' and 'rbind' can combine vectors into a matrix. The values '+' and '*' can be used to process numeric data. By default, the results are returned in a list.
ncores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
...: Extra arguments to be passed to FUN.

Value

The result of foreach::foreach.

Details

This function splits indices for each chromosome, then apply a given function to each part (chromosome) and finally combine the results.

Examples

# parallelize over chromosomes made easy
# examples of functions from this package
snp_pruning
#> function (G, infos.chr, ind.row = rows_along(G), size = 49, is.size.in.bp = FALSE, 
#>     infos.pos = NULL, thr.r2 = 0.2, exclude = NULL, nploidy = 2, 
#>     ncores = 1) 
#> {
#>     stop2("Pruning is deprecated; please use clumping (on MAF) instead..\n%s", 
#>         "See why at https://bit.ly/2uKo3MN.")
#> }
#> <bytecode: 0x000001b46088c570>
#> <environment: namespace:bigsnpr>
snp_clumping
#> function (G, infos.chr, ind.row = rows_along(G), S = NULL, thr.r2 = 0.2, 
#>     size = 100/thr.r2, infos.pos = NULL, is.size.in.bp = NULL, 
#>     exclude = NULL, ncores = 1) 
#> {
#>     check_args()
#>     if (!missing(is.size.in.bp)) 
#>         warning2("Parameter 'is.size.in.bp' is deprecated.")
#>     if (!is.null(S)) 
#>         assert_lengths(infos.chr, S)
#>     ind.noexcl <- setdiff(seq_along(infos.chr), exclude)
#>     sort(unlist(lapply(split(ind.noexcl, infos.chr[ind.noexcl]), 
#>         function(ind.chr) {
#>             clumpingChr(G, S, ind.chr, ind.row, size, infos.pos, 
#>                 thr.r2, ncores)
#>         }), use.names = FALSE))
#> }
#> <bytecode: 0x000001b466b21ec8>
#> <environment: namespace:bigsnpr>
snp_fastImpute
#> function (Gna, infos.chr, alpha = 1e-04, size = 200, p.train = 0.8, 
#>     n.cor = nrow(Gna), seed = NA, ncores = 1) 
#> {
#>     check_args(infos.chr = "assert_lengths(infos.chr, cols_along(Gna))")
#>     assert_package("xgboost")
#>     X <- Gna$copy(code = CODE_IMPUTE_LABEL)
#>     X2 <- Gna$copy(code = CODE_IMPUTE_PRED)
#>     infos.imp <- FBM_infos(Gna)
#>     ind.chrs <- split(seq_along(infos.chr), infos.chr)
#>     for (ind in ind.chrs) {
#>         imputeChr(X, X2, infos.imp, ind, alpha, size, p.train, 
#>             n.cor, seed, ncores)
#>     }
#>     infos.imp
#> }
#> <bytecode: 0x000001b4604673c8>
#> <environment: namespace:bigsnpr>