For a bigSNP:

- snp_pruning(): LD pruning. Similar to "--indep-pairwise (size+1) 1 thr.r2" in PLINK. This function is deprecated (see this article).
- snp_clumping() (and bed_clumping()): LD clumping. If you do not provide any statistic to rank SNPs, minor allele frequencies (MAFs) are used instead, which makes clumping similar to pruning.
- snp_indLRLDR(): Get SNP indices of long-range LD regions for the human genome.
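To make the clumping idea concrete, here is a minimal base-R sketch of greedy LD clumping on a small dense matrix. It is not bigsnpr's implementation: the helper clump_sketch() is hypothetical, and it assumes 0/1/2 genotype coding, no missing values, SNPs sorted by position on a single chromosome, and a window counted in SNPs on each side. SNPs are visited from the largest to the smallest statistic; each SNP that has not been removed yet is kept, and it removes the less important SNPs in its window whose squared correlation with it exceeds thr.r2. When no statistic is given, the sketch falls back to MAFs, as described above.

```r
# Hypothetical helper for illustration: greedy LD clumping on a dense matrix.
# Assumes 0/1/2 coding, no missing values, columns sorted by position on one
# chromosome, and a window of 'size' SNPs on each side. Not bigsnpr's code.
clump_sketch <- function(G, S = NULL, thr.r2 = 0.2, size = 100 / thr.r2) {
  if (is.null(S)) {
    p <- colMeans(G) / 2            # allele frequency (diploid assumption)
    S <- pmin(p, 1 - p)             # no statistic given: rank SNPs by MAF
  }
  size <- floor(size)
  m <- ncol(G)
  status <- rep(NA, m)              # NA = undecided, TRUE = kept, FALSE = removed
  for (j in order(S, decreasing = TRUE)) {
    if (!is.na(status[j])) next     # already removed by a more important SNP
    status[j] <- TRUE               # most important remaining SNP is kept
    win <- max(1, j - size):min(m, j + size)
    win <- win[is.na(status[win])]  # undecided, i.e. less important, SNPs only
    if (length(win) == 0) next
    r <- suppressWarnings(cor(G[, j], G[, win, drop = FALSE]))
    r2 <- as.vector(r)^2
    r2[is.na(r2)] <- 0              # monomorphic SNPs have undefined r
    status[win[r2 > thr.r2]] <- FALSE
  }
  which(status)                     # indices of the SNPs that are kept
}

# Toy data: SNP 2 is 2 - SNP 1 (r = -1), SNP 3 is only weakly correlated.
snp1 <- c(0, 1, 2, 1, 0, 2, 1, 0)
snp3 <- c(0, 0, 1, 1, 0, 0, 1, 1)
G <- cbind(snp1, 2 - snp1, snp3)

print(clump_sketch(G))                     # one of SNPs 1 and 2, plus SNP 3
print(clump_sketch(G, S = c(0, 1, 0.5)))   # SNP 2 outranks SNP 1: 2 3
```

Because the first two columns are perfectly (negatively) correlated, only one of them survives; supplying S decides which one.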
```r
bed_clumping(
  obj.bed,
  ind.row = rows_along(obj.bed),
  S = NULL,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  exclude = NULL,
  ncores = 1
)

snp_clumping(
  G,
  infos.chr,
  ind.row = rows_along(G),
  S = NULL,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  infos.pos = NULL,
  is.size.in.bp = NULL,
  exclude = NULL,
  ncores = 1
)

snp_pruning(
  G,
  infos.chr,
  ind.row = rows_along(G),
  size = 49,
  is.size.in.bp = FALSE,
  infos.pos = NULL,
  thr.r2 = 0.2,
  exclude = NULL,
  nploidy = 2,
  ncores = 1
)

snp_indLRLDR(infos.chr, infos.pos, LD.regions = LD.wiki34)
```
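The default window sizes in the usage above follow from simple arithmetic. The snippet below only evaluates the defaults shown there (size = 100 / thr.r2 for clumping; size = 49 for pruning, which the description relates to a PLINK window of size + 1 SNPs); it does not call bigsnpr.

```r
# Clumping default from the usage: size = 100 / thr.r2.
# A stricter (smaller) r2 threshold gives a wider window.
thr.r2 <- c(0.5, 0.2, 0.1)
size <- 100 / thr.r2
print(data.frame(thr.r2, size))

# Pruning default (size = 49) corresponds to a PLINK
# "--indep-pairwise 50 1 thr.r2" window of size + 1 SNPs.
plink.window <- 49 + 1
print(plink.window)
```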
Argument | Description |
---|---|
obj.bed | Object of type bed (the mapping of a PLINK .bed file). |
ind.row | An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. |
S | A vector of column statistics expressing the importance of each SNP: the more important the SNP, the greater its statistic should be. If NULL (the default), MAFs are used. |
thr.r2 | Threshold on the squared correlation between two SNPs. Default is 0.2. |
size | For one SNP, the window size around this SNP within which correlations are computed. Default is 100 / thr.r2 for clumping (500 with the default thr.r2) and 49 for pruning. |
exclude | Vector of SNP indices to exclude regardless. For example, it can be used to exclude long-range LD regions (see Price et al., 2008). It can also be used for thresholding on the p-values associated with S. |
ncores | Number of cores used. The default does not use parallelism. You may use nb_cores(). |
G | A FBM.code256 (typically <bigSNP>$genotypes). |
infos.chr | Vector of integers specifying each SNP's chromosome. |
infos.pos | Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. |
is.size.in.bp | Deprecated. |
nploidy | Number of trials, parameter of the binomial distribution. Default is 2, which corresponds to diploidy. |
LD.regions | A data frame of long-range LD regions. Default is LD.wiki34. |
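As an illustration of the S and exclude arguments, the sketch below derives both from a vector of GWAS p-values. The p-values and the 0.05 cutoff are made up for the example, and ranking on -log10(p) is one common choice (an assumption here, not a requirement of the functions) that satisfies "greater means more important".

```r
# Hypothetical GWAS p-values for 6 SNPs (illustrative numbers only).
pval <- c(1e-10, 0.03, 2e-9, 0.5, 1e-3, 0.8)

# S must be larger for more important SNPs, so rank on -log10(p).
S <- -log10(pval)

# Hypothetical cutoff: exclude SNPs with p > 0.05.
exclude <- which(pval > 0.05)

print(order(S, decreasing = TRUE))  # priority order implied by S
print(exclude)

# These vectors could then be passed as, e.g.,
# snp_clumping(G, infos.chr, S = S, exclude = exclude)
```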
- snp_clumping() (and bed_clumping()): SNP indices that are kept.
- snp_indLRLDR(): SNP indices to be used as (part of) the 'exclude' parameter of snp_clumping().
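snp_indLRLDR() returns the indices of SNPs lying in known long-range LD regions. The following base-R sketch shows that kind of lookup with a hypothetical helper, ind_in_regions(), and a made-up region table; the column names Chr, Start and Stop and the inclusive bounds are assumptions for the example, not a description of LD.wiki34 or of bigsnpr's code.

```r
# Hypothetical helper: indices of SNPs falling inside any listed region.
# Assumes a region table with columns Chr, Start, Stop (inclusive bounds).
ind_in_regions <- function(infos.chr, infos.pos, regions) {
  hit <- logical(length(infos.pos))
  for (k in seq_len(nrow(regions))) {
    hit <- hit | (infos.chr == regions$Chr[k] &
                  infos.pos >= regions$Start[k] &
                  infos.pos <= regions$Stop[k])
  }
  which(hit)
}

# Made-up regions and SNP map, for illustration only.
regions <- data.frame(Chr = c(1, 2), Start = c(100, 500), Stop = c(200, 600))
chr <- c(1, 1, 1, 2, 2)
pos <- c(50, 150, 250, 550, 700)

print(ind_in_regions(chr, pos, regions))  # SNPs 2 and 4 fall inside a region
```

The resulting indices play the same role as the output of snp_indLRLDR(): they can be passed to the exclude argument of snp_clumping().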
Price AL, Weale ME, Patterson N, et al. Long-Range LD Can Confound Genome Scans in Admixed Populations. Am J Hum Genet. 2008;83(1):132-135. doi: 10.1016/j.ajhg.2008.06.005
```r
library(bigsnpr)

test <- snp_attachExtdata()
G <- test$genotypes

# clumping (prioritizing higher MAF)
ind.keep <- snp_clumping(G, infos.chr = test$map$chromosome,
                         infos.pos = test$map$physical.pos,
                         thr.r2 = 0.1)

# keep most of them -> not much LD in this simulated dataset
length(ind.keep) / ncol(G)
#> [1] 0.7919419
```