R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk. This is very similar to the format big.matrix provided by R package {bigmemory}, which is no longer used by this package (see the corresponding vignette). As inputs, package {bigstatsr} uses Filebacked Big Matrices (FBM).

LIST OF FEATURES

Note that most of the algorithms of this package don’t handle missing values.

Installation

Some use cases

Parallelization

Package {bigstatsr} uses package {foreach} for its parallelization tasks. Learn more on parallelism with {foreach} with this tutorial.

Bug report / Help

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr}, please open an issue as well or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

References

  • Privé, Florian, et al. “Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr.” Bioinformatics 34.16 (2018): 2781-2787.

  • Privé, Florian, Hugues Aschard, and Michael GB Blum. “Efficient implementation of penalized regression for genetic risk prediction.” Genetics 212.1 (2019): 65-74.