bigsnpr is an R package for the analysis of massive SNP arrays. It extends the features of package bigstatsr for the purpose of analysing genotype data.



For now, you can install this package using


Input format

For now, this package only reads bed/bim/fam files (PLINK’s preferred format) using snp_readBed. Before reading data into this package’s special format, quality control and conversion can be done with PLINK, which can be called directly from R using snp_plinkQC and snp_plinkIBDQC.

I use a class called bigSNP to represent information on massive SNP arrays. A bigSNP object has at least 3 elements:

  • genotypes: A BM.code.descriptor which describes a special big.matrix (see package bigstatsr). Rows are samples and columns are SNPs. This corresponds to the “bed” file, but each element is encoded on 8 bits rather than the 2 bits used in PLINK files, which leaves room to store more information without taking too much disk space.
  • fam: A data.frame containing some information on the individuals (read from the “.fam” file).
  • map: A data.frame giving some information on the SNPs (read from the “.bim” file).
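
To put the storage trade-off of the genotypes element in concrete terms, here is a rough sketch in base R. It assumes the standard SNP-major PLINK layout (a 3-byte header, then 2 bits per genotype with each SNP padded to a whole number of bytes) and exactly one byte per genotype for the 8-bit encoding. The helper names and the dimensions are made up for illustration and are not part of the package.

```r
# Hypothetical helpers (not part of bigsnpr): approximate storage in bytes
# for n samples and m SNPs.

# PLINK .bed, SNP-major: 3 header bytes, then 2 bits per genotype,
# with each SNP padded to a whole number of bytes.
bed_bytes <- function(n, m) 3 + as.numeric(m) * ceiling(n / 4)

# One byte (8 bits) per genotype.
byte_matrix_bytes <- function(n, m) as.numeric(n) * m

# Standard R numeric matrix: 8 bytes per genotype.
double_matrix_bytes <- function(n, m) 8 * as.numeric(n) * m

n <- 1000    # illustrative number of samples
m <- 500000  # illustrative number of SNPs

sizes <- c(
  bed    = bed_bytes(n, m),
  byte   = byte_matrix_bytes(n, m),
  double = double_matrix_bytes(n, m)
)
print(sizes / 1e6)  # sizes in megabytes
```

Under these assumptions, one byte per genotype costs about four times the size of the “bed” file, but only one eighth of a standard numeric matrix.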

Possible upcoming features

  • Support for other input formats. Note that there is room for coding allele dosages (rounded to two decimal places). See bigsnpr:::CODE_DOSAGE.
  • Fast imputation algorithm which doesn’t require reference panels.
  • Imputation probabilities and multiple imputation.
  • An interactive QC procedure (call rates, difference of missingness between cases and controls, MAF cutoff, relatedness, HWE, autosomal only, others?).
  • Proper integration of haploid species.
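
The remark above about allele dosages can be checked with simple arithmetic: dosages between 0 and 2 rounded to two decimal places take 201 distinct values, which fits within the 256 values a byte can hold. The sketch below illustrates the idea with a made-up lookup table. It is not the actual content of bigsnpr:::CODE_DOSAGE, and the function names are hypothetical.

```r
# Hypothetical 256-entry lookup table (NOT the actual bigsnpr:::CODE_DOSAGE):
# byte values 0..200 stand for dosages 0.00..2.00, the rest decode to NA.
dosages <- (0:200) / 100
code_table <- c(dosages, rep(NA_real_, 256 - length(dosages)))

# Map dosages in [0, 2] to single bytes, rounding to two decimal places.
encode_dosage <- function(x) {
  code <- round(x * 100)
  code[is.na(code)] <- 255  # any unused code can stand for "missing"
  as.raw(code)
}

# Map bytes back to dosages through the lookup table.
decode_dosage <- function(bytes) code_table[as.integer(bytes) + 1L]

x <- c(0, 0.537, 1, 1.999, 2, NA)
bytes <- encode_dosage(x)
print(decode_dosage(bytes))
```

A lookup table like this keeps the stored matrix at one byte per element while letting the same bytes be interpreted as hard genotype calls or as dosages, depending on which table is used for decoding.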

Bug report

Please open an issue if you find a bug. If you want help using bigmemory or bigstatsr, please post on Stack Overflow with the tag r-bigmemory. For advice on writing a useful report or question, see “How to make a great R reproducible example?”.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.