Identity-by-descent

Quality Control based on Identity-by-descent (IBD) computed by PLINK 1.9 using its method-of-moments.

snp_plinkIBDQC(
  plink.path,
  bedfile.in,
  bedfile.out = NULL,
  pi.hat = 0.08,
  ncores = 1,
  pruning.args = c(100, 0.2),
  do.blind.QC = TRUE,
  extra.options = "",
  verbose = TRUE
)

Arguments

plink.path: Path to the executable of PLINK 1.9.
bedfile.in: Path to the input bedfile.
bedfile.out: Path to the output bedfile. Default is created by appending "_norel" to prefix.in (bedfile.in without extension).
pi.hat: PI_HAT value threshold for individuals (first by pairs) to be excluded. Default is 0.08.
ncores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
pruning.args: A vector of 2 pruning parameters, respectively the window size (in variant count) and the pairwise $r^2$ threshold (the step size is fixed to 1). Default is c(100, 0.2).
do.blind.QC: Whether to do QC with pi.hat without visual inspection. Default is TRUE. If FALSE, return the data.frame of the corresponding ".genome" file without doing QC. One could use ggplot2::qplot(Z0, Z1, data = mydf, col = RT) for visual inspection.
extra.options: Other options to be passed to PLINK as a string (for the IBD part). More options can be found at https://www.cog-genomics.org/plink/1.9/ibd.
verbose: Whether to show PLINK log? Default is TRUE.

Value

The path of the new bedfile. If no sample is filter, no new bed/bim/fam files are created and then the path of the input bedfile is returned.

References

Chang, Christopher C, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (1): 7. doi:10.1186/s13742-015-0047-8 .

Examples

if (FALSE) {

bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
plink <- download_plink()

bedfile <- snp_plinkIBDQC(plink, bedfile,
                          bedfile.out = tempfile(fileext = ".bed"),
                          ncores = 2)

df_rel <- snp_plinkIBDQC(plink, bedfile, do.blind.QC = FALSE, ncores = 2)
str(df_rel)

library(ggplot2)
qplot(Z0, Z1, data = df_rel, col = RT)
qplot(y = PI_HAT, data = df_rel) +
  geom_hline(yintercept = 0.2, color = "blue", linetype = 2)
snp_plinkRmSamples(plink, bedfile,
                   bedfile.out = tempfile(fileext = ".bed"),
                   df.or.files = subset(df_rel, PI_HAT > 0.2))
}

Arguments

Value

References

See also

Examples