bigsnpr is an R package for the analysis of massive SNP arrays. It builds on the bigstatsr package, adding features specific to the analysis of genotype data.

LIST OF FEATURES

## This package is in beta testing

Bug reports are welcome.

## Installation

For now, you can install this package from GitHub using:

devtools::install_github("privefl/bigsnpr")

## Input format

For now, this package only reads bed/bim/fam files (PLINK’s preferred format), using snp_readBed. Before the data are read into this package’s own format, quality control and conversion can be done with PLINK, which can be called directly from R using snp_plinkQC and snp_plinkIBDQC.
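As a rough sketch of the reading step, the snippet below converts a bed/bim/fam fileset and attaches the result. It assumes that the package ships an example fileset under `extdata/example.bed`, and that snp_readBed accepts a `backingfile` argument and returns the path of an `.rds` file that snp_attach can load, as in recent releases. Argument names may differ in the beta version described here, so check `?snp_readBed` first.

```r
# Minimal sketch; skipped entirely when bigsnpr is not installed.
if (requireNamespace("bigsnpr", quietly = TRUE)) {
  # Assumed location of the example fileset bundled with the package.
  bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

  # Convert bed/bim/fam to the package's own format, writing the
  # backing files to a temporary location.
  rdsfile <- bigsnpr::snp_readBed(bedfile, backingfile = tempfile())

  # Load the resulting bigSNP object and look at its top-level elements.
  obj <- bigsnpr::snp_attach(rdsfile)
  print(names(obj))
  print(dim(obj$genotypes))
}
```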

I use a class called bigSNP to represent information on massive SNP arrays. A bigSNP object has at least 3 elements:

• genotypes: A BM.code.descriptor, which describes a special big.matrix (see package bigstatsr). Rows are samples and columns are SNPs. This corresponds to the “bed” file, but each element is encoded on 8 bits rather than the 2 bits used in PLINK files, which allows storing more information without taking too much disk space.
• fam: A data.frame containing some information on the individuals (read from the “.fam” file).
• map: A data.frame giving some information on the SNPs (read from the “.bim” file).
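To illustrate the difference between the two encodings, here is a minimal base-R sketch. It unpacks one byte of a PLINK “.bed” file (four genotypes at 2 bits each) and stores the same genotypes on one byte each, decoded through a 256-entry lookup table. The helper names and the lookup table are hypothetical and only mimic the idea; they are not the package’s internals.

```r
# Unpack one PLINK .bed data byte into four 2-bit genotype codes
# (the lowest-order bits hold the first sample).
unpack_bed_byte <- function(byte) {
  b <- as.integer(byte)
  vapply(0:3, function(k) bitwAnd(bitwShiftR(b, 2L * k), 3L), integer(1))
}

# PLINK 2-bit codes mapped to counts of the first allele in the .bim file:
# 0 = homozygous first allele, 1 = missing, 2 = heterozygous,
# 3 = homozygous second allele.
bed_to_count <- c(2L, NA, 1L, 0L)

packed <- as.raw(0xdc)                  # 11011100 in binary
codes2 <- unpack_bed_byte(packed)       # 0, 3, 1, 3
counts <- bed_to_count[codes2 + 1L]     # 2, 0, NA, 0

# 8-bit storage: one byte per genotype plus a 256-entry decoding table.
# In this sketch, bytes 0, 1, 2 are genotype counts and every other byte
# value decodes to NA, which leaves 253 values free for other uses.
code256 <- rep(NA_real_, 256)
code256[1:3] <- c(0, 1, 2)

stored  <- as.raw(ifelse(is.na(counts), 3L, counts))  # bytes 02 00 03 00
decoded <- code256[as.integer(stored) + 1L]           # 2, 0, NA, 0
print(decoded)
```

The 8-bit form takes four times the space of the packed form, but each genotype can be read without bit manipulation, and the spare byte values can carry extra information.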

## Possible upcoming features

• Support for other input formats. Note that the 8-bit encoding leaves room for coding allele dosages (rounded to two decimal places). See bigsnpr:::CODE_DOSAGE.
• Fast imputation algorithm which doesn’t require reference panels.
• Imputation probabilities and multiple imputation.
• An interactive QC procedure (call rates, difference of missingness between cases and controls, MAF cutoff, relatedness, HWE, autosomal only, others?).
• Proper integration of haploid species.
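On the dosage point in the first item above: dosages between 0 and 2 rounded to two decimal places take 201 distinct values, which fit in the 256 values of one byte. The sketch below shows one possible layout. It is an assumption made for illustration and is not necessarily the layout of bigsnpr:::CODE_DOSAGE.

```r
# Hypothetical layout: byte value k (0..200) stands for dosage k / 100.
# The remaining 55 byte values decode to NA in this sketch.
decode_table <- c(seq(0, 2, by = 0.01), rep(NA_real_, 55))

encode_dosage <- function(d) {
  stopifnot(all(d >= 0 & d <= 2, na.rm = TRUE))
  k <- round(d * 100)       # round to two decimal places
  k[is.na(k)] <- 255        # assumed byte for "missing" in this sketch
  as.raw(k)
}

decode_dosage <- function(bytes) {
  decode_table[as.integer(bytes) + 1L]
}

d <- c(0, 0.254, 1, 1.37, NA, 2)
bytes <- encode_dosage(d)
print(decode_dosage(bytes))   # 0.254 comes back as 0.25
```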

## Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.