Quality control based on identity-by-descent (IBD), computed by PLINK 1.9 using its method-of-moments estimator.
snp_plinkIBDQC(
  plink.path,
  bedfile.in,
  bedfile.out = NULL,
  pi.hat = 0.08,
  ncores = 1,
  pruning.args = c(100, 0.2),
  do.blind.QC = TRUE,
  extra.options = "",
  verbose = TRUE
)
plink.path: Path to the executable of PLINK 1.9.
bedfile.in: Path to the input bedfile.
bedfile.out: Path to the output bedfile. Default is created by appending "_norel" to prefix.in (bedfile.in without extension).
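The default naming rule above can be reproduced in base R. This is only a minimal sketch, not the package's internal code: the helper name `default_norel_path` is hypothetical, and re-adding the ".bed" extension is an assumption based on the output being described as a bedfile.

```r
# Hypothetical helper (not part of bigsnpr): derive the default output path.
default_norel_path <- function(bedfile.in) {
  # prefix.in is bedfile.in without its ".bed" extension
  prefix.in <- sub("\\.bed$", "", bedfile.in)
  # Assumption: the default output keeps the ".bed" extension
  paste0(prefix.in, "_norel.bed")
}

default_norel_path("data/example.bed")
```

Passing an explicit bedfile.out, for example a tempfile(fileext = ".bed"), avoids writing next to the input files.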
pi.hat: PI_HAT value threshold for individuals (first by pairs) to be excluded. Default is 0.08.
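To make the threshold concrete, the sketch below applies it to a toy table shaped like a PLINK ".genome" file. The values are made up for illustration, and flagging pairs at or above the threshold is an assumption about how the cutoff is applied. PLINK reports PI_HAT as P(IBD = 2) + 0.5 * P(IBD = 1), i.e. Z2 + 0.5 * Z1.

```r
# Toy IBD estimates for three pairs (made-up values, not real PLINK output).
genome <- data.frame(
  IID1 = c("A", "A", "B"),
  IID2 = c("B", "C", "C"),
  Z0   = c(0.00, 0.95, 0.50),  # P(IBD = 0)
  Z1   = c(1.00, 0.05, 0.50),  # P(IBD = 1)
  Z2   = c(0.00, 0.00, 0.00)   # P(IBD = 2)
)

# PI_HAT = P(IBD = 2) + 0.5 * P(IBD = 1)
genome$PI_HAT <- genome$Z2 + 0.5 * genome$Z1

# Assumption: pairs at or above the threshold are the ones flagged as related
pi.hat <- 0.08
related <- subset(genome, PI_HAT >= pi.hat)
related[, c("IID1", "IID2", "PI_HAT")]
```

Here the pairs A-B and B-C are flagged, while A-C falls below the default threshold.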
ncores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
pruning.args: A vector of 2 pruning parameters, respectively the window size (in variant count) and the pairwise $r^2$ threshold (the step size is fixed to 1). Default is c(100, 0.2).
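As an illustration of how these two values could translate into a PLINK call, the sketch below builds the pruning options as a string. The mapping onto PLINK 1.9's `--indep-pairwise <window size> <step size> <r^2 threshold>` flag is an assumption based on the description above, not a statement about the function's internals.

```r
pruning.args <- c(100, 0.2)

# Assumed mapping: window size, fixed step size of 1, then the r^2 threshold
prune_opts <- paste("--indep-pairwise", pruning.args[1], 1, pruning.args[2])
prune_opts
```

A larger window or a lower $r^2$ threshold prunes more aggressively, leaving fewer variants for the IBD computation.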
do.blind.QC: Whether to do QC with pi.hat without visual inspection. Default is TRUE. If FALSE, return the data.frame of the corresponding ".genome" file without doing QC. One could use ggplot2::qplot(Z0, Z1, data = mydf, col = RT) for visual inspection.
extra.options: Other options to be passed to PLINK as a string (for the IBD part). More options can be found at https://www.cog-genomics.org/plink/1.9/ibd.
verbose: Whether to show the PLINK log. Default is TRUE.
The path of the new bedfile. If no sample is filtered out, no new bed/bim/fam files are created and the path of the input bedfile is returned instead.
Chang, Christopher C, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (1): 7. doi:10.1186/s13742-015-0047-8.
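if (FALSE) {
  # Example bedfile shipped with bigsnpr, and a local copy of PLINK
  bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
  plink <- download_plink()

  # Blind QC: filter with the default pi.hat and get the resulting bedfile path
  bedfile <- snp_plinkIBDQC(plink, bedfile,
                            bedfile.out = tempfile(fileext = ".bed"),
                            ncores = 2)

  # No blind QC: get the ".genome" data frame for visual inspection
  df_rel <- snp_plinkIBDQC(plink, bedfile, do.blind.QC = FALSE, ncores = 2)
  str(df_rel)

  library(ggplot2)
  qplot(Z0, Z1, data = df_rel, col = RT)
  qplot(y = PI_HAT, data = df_rel) +
    geom_hline(yintercept = 0.2, color = "blue", linetype = 2)

  # Remove samples based on the pairs with PI_HAT above a manually chosen threshold
  snp_plinkRmSamples(plink, bedfile,
                     bedfile.out = tempfile(fileext = ".bed"),
                     df.or.files = subset(df_rel, PI_HAT > 0.2))
}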