Quality Control (QC) and possible conversion to bed/bim/fam files using PLINK 1.9.

snp_plinkQC(
  plink.path,
  prefix.in,
  file.type = "--bfile",
  prefix.out = paste0(prefix.in, "_QC"),
  maf = 0.01,
  geno = 0.1,
  mind = 0.1,
  hwe = 1e-50,
  autosome.only = FALSE,
  extra.options = "",
  verbose = TRUE
)

Arguments

plink.path

Path to the executable of PLINK 1.9.

prefix.in

Prefix (path without extension) of the dataset to be QCed.

file.type

Type of the dataset to be QCed. Default is "--bfile" and corresponds to bed/bim/fam files. You can also use "--file" for ped/map files, "--vcf" for a VCF file, or "--gzvcf" for a gzipped VCF. More information can be found at https://www.cog-genomics.org/plink/1.9/input.

prefix.out

Prefix (path without extension) of the bed/bim/fam dataset to be created. Default is created by appending "_QC" to prefix.in.

maf

Minimum Minor Allele Frequency (MAF) for a SNP to be kept. Default is 0.01.
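As a rough illustration of what this threshold means, the base R sketch below computes the MAF of each SNP in a small made-up genotype matrix and flags the SNPs that would pass. The matrix `G` and the helper `maf_of()` are hypothetical and are not part of bigsnpr or PLINK, which computes these frequencies itself from the input files.

```r
# Toy genotype matrix: 6 samples (rows) x 3 SNPs (columns),
# coded as the number of copies of one allele; NA = missing call.
G <- matrix(c(0, 0, 0, 0, 0, 1,    # SNP 1: counted allele is rare
              0, 1, 2, 1, 0, NA,   # SNP 2: common, one missing call
              2, 2, 2, 2, 1, 2),   # SNP 3: counted allele is the major one
            nrow = 6)

# Hypothetical helper: allele frequency among non-missing calls,
# folded so that it always refers to the less frequent allele.
maf_of <- function(g) {
  freq <- mean(g, na.rm = TRUE) / 2
  min(freq, 1 - freq)
}

maf_obs  <- apply(G, 2, maf_of)
keep_snp <- maf_obs >= 0.1   # same idea as maf = 0.1
```

Here only the second SNP would be kept; the first and third both have a minor allele frequency of 1/12.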

geno

Maximum proportion of missing values for a SNP to be kept. Default is 0.1.

mind

Maximum proportion of missing values for a sample to be kept. Default is 0.1.
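The two missingness thresholds (geno for SNPs, mind for samples) can be pictured with a small base R sketch. The genotype matrix `G` is made up, and both rates are computed on the unfiltered matrix; PLINK applies its filters in its own order, so the rates it uses for a later filter may differ once an earlier filter has removed samples or SNPs.

```r
# Toy genotype matrix: 4 samples (rows) x 5 SNPs (columns); NA = missing call.
G <- rbind(
  c( 0,  1, NA,  2,  0),
  c( 1, NA, NA,  0,  1),
  c( 0,  0, NA,  1,  2),
  c(NA,  1,  0,  2,  1)
)

snp_missing    <- colMeans(is.na(G))  # proportion of missing calls per SNP
sample_missing <- rowMeans(is.na(G))  # proportion of missing calls per sample

keep_snp    <- snp_missing    <= 0.5  # same idea as geno = 0.5
keep_sample <- sample_missing <= 0.3  # same idea as mind = 0.3
```

In this toy example the third SNP (75% missing) and the second sample (40% missing) would be removed.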

hwe

Filters out all variants whose Hardy-Weinberg equilibrium exact test p-value is below the provided threshold. Default is 1e-50.
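To give a feel for the p-value being thresholded, the sketch below tests Hardy-Weinberg proportions from genotype counts. Note that it uses a chi-square goodness-of-fit approximation, not the exact test that PLINK applies, so the p-values are only indicative. The helper `hwe_chisq_p()` and the counts are hypothetical.

```r
# Hypothetical helper: chi-square test for Hardy-Weinberg proportions,
# from the counts of the three genotypes (aa, ab, bb).
# NOTE: an approximation, not the exact test used by PLINK.
hwe_chisq_p <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)           # frequency of allele a
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(n_aa, n_ab, n_bb)
  stat <- sum((observed - expected)^2 / expected)
  pchisq(stat, df = 1, lower.tail = FALSE)   # 1 degree of freedom
}

p_ok  <- hwe_chisq_p(360, 480, 160)  # counts match HWE for p = 0.6
p_bad <- hwe_chisq_p(500,   0, 500)  # no heterozygotes at all

keep <- c(p_ok, p_bad) >= 1e-10      # same idea as hwe = 1e-10
```

The first variant is kept and the second is removed. A very small threshold such as the default 1e-50 only removes variants with extreme departures from equilibrium.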

autosome.only

Whether to exclude all unplaced and non-autosomal variants. Default is FALSE.
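The effect of this option can be sketched on a toy variant table. The data frame `bim` below only mimics the chromosome and ID columns of a .bim file and its values are made up; the sketch also assumes human data, where the autosomes are chromosomes 1 to 22.

```r
# Toy variant table mimicking two columns of a .bim file (made-up values).
bim <- data.frame(
  chr = c("1", "22", "X", "Y", "MT", "0"),   # "0" = unplaced
  id  = paste0("snp", 1:6),
  stringsAsFactors = FALSE
)

# Assumption: human data, so autosomes are chromosomes 1 to 22.
is_autosome <- bim$chr %in% as.character(1:22)
bim[is_autosome, ]
```

Only the first two variants remain; sex chromosomes, mitochondrial and unplaced variants are dropped.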

extra.options

Other options to be passed to PLINK as a string. More options can be found at https://www.cog-genomics.org/plink2/filter. If using PLINK 2.0, you could, for example, use "--king-cutoff 0.0884" to remove some related samples in the same run as the quality control.

verbose

Whether to show the PLINK log. Default is TRUE.
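To make the role of the arguments above more concrete, here is a sketch of how they could be assembled into a PLINK command line. It assumes that each argument maps onto the PLINK 1.9 flag of the same name (`--maf`, `--geno`, `--mind`, `--hwe`, `--autosome`), combined with `--make-bed` and `--out`. The helper `build_plink_cmd()` is hypothetical, the paths are placeholders, and the actual implementation of snp_plinkQC() may differ.

```r
# Hypothetical helper (not part of bigsnpr): builds the command as a string
# without running it.
build_plink_cmd <- function(plink.path, prefix.in, file.type = "--bfile",
                            prefix.out = paste0(prefix.in, "_QC"),
                            maf = 0.01, geno = 0.1, mind = 0.1, hwe = 1e-50,
                            autosome.only = FALSE, extra.options = "") {
  parts <- c(plink.path,
             file.type, prefix.in,
             "--maf", maf, "--geno", geno, "--mind", mind, "--hwe", hwe,
             if (autosome.only) "--autosome",   # NULL (dropped) when FALSE
             "--make-bed", "--out", prefix.out,
             extra.options)
  paste(parts[nzchar(parts)], collapse = " ")   # skip empty extra.options
}

cmd <- build_plink_cmd("plink", "mydata", maf = 0.05, autosome.only = TRUE)
cat(cmd, "\n")
```

Anything supplied in extra.options would simply be appended to the end of such a command, which is why it has to be a single string of valid PLINK options.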

Value

The path of the newly created bedfile.
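Since the output is a bed/bim/fam dataset, the paths of the companion files can be derived from the returned path. A minimal base R sketch, where `bedfile` is a placeholder standing in for the returned value:

```r
# Placeholder for the value returned by snp_plinkQC().
bedfile <- "path/to/mydata_QC.bed"

# Companion files that share the same prefix as the .bed file.
prefix  <- sub("\\.bed$", "", bedfile)
bimfile <- paste0(prefix, ".bim")
famfile <- paste0(prefix, ".fam")
```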

References

Chang, Christopher C, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (1): 7. doi:10.1186/s13742-015-0047-8.

Examples

if (FALSE) {

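# Example bed/bim/fam dataset shipped with bigsnpr
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
prefix  <- sub_bed(bedfile)  # path without the ".bed" extension

# Download PLINK 1.9 and get the path to its executable
plink <- download_plink()

# Run QC with stricter thresholds than the defaults
test <- snp_plinkQC(plink.path = plink,
                    prefix.in = prefix,
                    prefix.out = tempfile(),
                    file.type = "--bfile",  # the default (for ".bed")
                    maf = 0.05,
                    geno = 0.05,
                    mind = 0.05,
                    hwe = 1e-10,
                    autosome.only = TRUE)
test  # path of the newly created bedfile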
}