Quality control (QC) based on identity-by-descent (IBD), computed by PLINK 1.9 using its method-of-moments estimator.
snp_plinkIBDQC(
plink.path,
bedfile.in,
bedfile.out = NULL,
pi.hat = 0.08,
ncores = 1,
pruning.args = c(100, 0.2),
do.blind.QC = TRUE,
extra.options = "",
verbose = TRUE
)
plink.path: Path to the executable of PLINK 1.9.
bedfile.in: Path to the input bedfile.
bedfile.out: Path to the output bedfile. Default is created by appending "_norel" to prefix.in (bedfile.in without extension).
pi.hat: PI_HAT threshold above which individuals (identified first as pairs) are excluded. Default is 0.08.
ncores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
pruning.args: A vector of 2 pruning parameters: respectively, the window size (in variant count) and the pairwise $r^2$ threshold (the step size is fixed to 1). Default is c(100, 0.2).
do.blind.QC: Whether to do QC with pi.hat without visual inspection. Default is TRUE. If FALSE, return the data.frame of the corresponding ".genome" file without doing QC. One could use ggplot2::qplot(Z0, Z1, data = mydf, col = RT) for visual inspection.
extra.options: Other options to be passed to PLINK as a string (for the IBD part). More options can be found at https://www.cog-genomics.org/plink/1.9/ibd.
verbose: Whether to show the PLINK log. Default is TRUE.
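The two pruning parameters can be read as the arguments of a PLINK pruning command. The sketch below assumes that pruning is done with PLINK's --indep-pairwise option (window size, step size, r^2 threshold); the helper pruning_option() is hypothetical and not part of bigsnpr, whose internal command line may differ.

```r
# Hypothetical helper (not part of bigsnpr): build a PLINK pruning option
# from a pruning.args vector, with the step size fixed to 1.
pruning_option <- function(pruning.args = c(100, 0.2)) {
  stopifnot(is.numeric(pruning.args), length(pruning.args) == 2)
  window <- pruning.args[1]  # window size, in variant count
  r2     <- pruning.args[2]  # pairwise r^2 threshold
  paste("--indep-pairwise", window, 1, r2)
}

opt_default <- pruning_option()            # default parameters
opt_custom  <- pruning_option(c(50, 0.1))  # smaller window, stricter r^2
print(opt_default)
print(opt_custom)
```

A smaller r^2 threshold prunes more aggressively, leaving fewer variants for the IBD computation.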
The path of the new bedfile. If no sample is filtered, no new bed/bim/fam files are created and the path of the input bedfile is returned.
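As a rough illustration of the default naming rule and of the return value, the sketch below derives an output path from an input path and checks whether a returned path differs from the input. The helpers default_out() and was_filtered() and the path "data/example.bed" are hypothetical; they are not part of bigsnpr, and the package's internal implementation may differ (for instance in how paths are normalized).

```r
# Hypothetical helper: append "_norel" to the input path without its extension.
default_out <- function(bedfile.in) {
  prefix.in <- sub("\\.bed$", "", bedfile.in)
  paste0(prefix.in, "_norel.bed")
}

# Hypothetical helper: TRUE if the returned path is not the input path,
# which would mean that at least one sample was removed.
was_filtered <- function(bedfile.in, returned) {
  !identical(returned, bedfile.in)
}

out_path <- default_out("data/example.bed")
print(out_path)
print(was_filtered("data/example.bed", "data/example.bed"))
print(was_filtered("data/example.bed", out_path))
```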
Chang, Christopher C, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (1): 7. doi:10.1186/s13742-015-0047-8.
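if (FALSE) {
  # Example bedfile shipped with bigsnpr, and a local copy of PLINK 1.9
  bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
  plink <- download_plink()

  # Blind QC: remove related samples using the default pi.hat threshold
  bedfile <- snp_plinkIBDQC(plink, bedfile,
                            bedfile.out = tempfile(fileext = ".bed"),
                            ncores = 2)

  # No QC: get the content of the ".genome" file as a data.frame
  df_rel <- snp_plinkIBDQC(plink, bedfile, do.blind.QC = FALSE, ncores = 2)
  str(df_rel)

  # Visual inspection of the IBD estimates
  library(ggplot2)
  qplot(Z0, Z1, data = df_rel, col = RT)
  qplot(y = PI_HAT, data = df_rel) +
    geom_hline(yintercept = 0.2, color = "blue", linetype = 2)

  # Remove samples using a threshold chosen after inspection
  snp_plinkRmSamples(plink, bedfile,
                     bedfile.out = tempfile(fileext = ".bed"),
                     df.or.files = subset(df_rel, PI_HAT > 0.2))
}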
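To help read the Z0, Z1 and PI_HAT columns plotted above: in a PLINK ".genome" file, Z0, Z1 and Z2 are the estimated probabilities of sharing 0, 1 or 2 alleles IBD, and PI_HAT = Z2 + 0.5 * Z1. The sketch below uses theoretical sharing probabilities (not PLINK output) in a hypothetical data frame named expected, and assumes a strict "greater than" comparison, to show which relationships a given threshold would flag.

```r
# Theoretical IBD-sharing probabilities for a few relationships
# (illustrative values, not estimates produced by PLINK).
expected <- data.frame(
  relationship = c("duplicate / MZ twin", "parent-offspring", "full siblings",
                   "half siblings", "first cousins", "unrelated"),
  Z0 = c(0, 0, 0.25, 0.50, 0.75, 1),
  Z1 = c(0, 1, 0.50, 0.50, 0.25, 0),
  Z2 = c(1, 0, 0.25, 0,    0,    0)
)

# PI_HAT = P(IBD = 2) + 0.5 * P(IBD = 1)
expected$PI_HAT <- expected$Z2 + 0.5 * expected$Z1

# Pairs above the default threshold (0.08) and above the
# threshold used in the example above (0.2)
expected$above_0.08 <- expected$PI_HAT > 0.08
expected$above_0.2  <- expected$PI_HAT > 0.2
print(expected)
```

In real data the estimates are noisy, so points scatter around these theoretical values; this is why a visual check before choosing a threshold can be useful.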