snp_pruning(): LD pruning. Similar to "plink --indep-pairwise (size+1) 1 thr.r2".
This function is deprecated.
snp_clumping() (and bed_clumping()): LD clumping. If you do not provide
any statistic to rank SNPs, minor allele frequencies (MAFs) are used,
making clumping similar to pruning.
snp_indLRLDR(): Get SNP indices of long-range LD regions for the human genome.
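To make the idea of clumping concrete, here is a minimal base R sketch of a greedy clumping procedure. It is an illustration under stated assumptions, not the package's implementation: `toy_clumping()` and `G_toy` are hypothetical names, the window is counted in number of SNPs only, and genotypes are assumed to be coded 0/1/2 with no missing values.

```r
# Toy greedy clumping in base R: an illustration only, NOT bigsnpr's implementation.
# G: numeric matrix of 0/1/2 genotypes (individuals x SNPs), no missing values.
toy_clumping <- function(G, S = NULL, thr.r2 = 0.2, size = 100 / thr.r2) {
  m <- ncol(G)
  size <- round(size)                       # window expressed in number of SNPs
  if (is.null(S)) {
    af <- colMeans(G) / 2                   # allele frequency under 0/1/2 coding
    S <- pmin(af, 1 - af)                   # fall back to minor allele frequency
  }
  keep <- rep(FALSE, m)
  removed <- rep(FALSE, m)
  for (j in order(S, decreasing = TRUE)) {  # visit the most important SNP first
    if (removed[j]) next
    keep[j] <- TRUE
    for (k in max(1, j - size):min(m, j + size)) {
      if (keep[k] || removed[k]) next       # already decided (includes k == j)
      r2 <- cor(G[, j], G[, k])^2
      if (isTRUE(r2 > thr.r2)) removed[k] <- TRUE
    }
  }
  which(keep)
}

G_toy <- cbind(
  c(0, 1, 2, 0, 1, 2),   # SNP 1
  c(0, 1, 2, 0, 1, 2),   # SNP 2: perfect copy of SNP 1 (r2 = 1)
  c(1, 0, 1, 1, 2, 1)    # SNP 3: uncorrelated with SNPs 1 and 2 (r2 = 0)
)
ind.keep.toy <- toy_clumping(G_toy, S = c(1, 3, 2))
```

With these made-up statistics, SNP 2 is visited first and kept, SNP 1 is dropped because it is in perfect LD with SNP 2, and SNP 3 is kept.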
bed_clumping(
  obj.bed,
  ind.row = rows_along(obj.bed),
  S = NULL,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  exclude = NULL,
  ncores = 1
)

snp_clumping(
  G,
  infos.chr,
  ind.row = rows_along(G),
  S = NULL,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  infos.pos = NULL,
  is.size.in.bp = NULL,
  exclude = NULL,
  ncores = 1
)

snp_pruning(
  G,
  infos.chr,
  ind.row = rows_along(G),
  size = 49,
  is.size.in.bp = FALSE,
  infos.pos = NULL,
  thr.r2 = 0.2,
  exclude = NULL,
  nploidy = 2,
  ncores = 1
)

snp_indLRLDR(infos.chr, infos.pos, LD.regions = LD.wiki34)
obj.bed: Object of type bed, which is the mapping of some bed file.
Use obj.bed <- bed(bedfile) to get this object.
ind.row: An optional vector of the row indices (individuals) that
are used. If not specified, all rows are used.
Don't use negative indices.
S: A vector of column statistics which express the importance
of each SNP (the more important the SNP, the greater the
corresponding statistic should be).
For example, if S follows the standard normal distribution, and "important"
means significantly different from 0, you must use abs(S) instead.
If not specified, MAFs are computed and used.
thr.r2: Threshold over the squared correlation between two SNPs.
size: For one SNP, window size around this SNP to compute correlations.
Default is 100 / thr.r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200).
If infos.pos is not provided (NULL, the default), this is a window in
number of SNPs; otherwise it is a window in kb (physical distance).
I recommend that you provide the positions if available.
exclude: Vector of SNP indices to exclude anyway. For example, this
can be used to exclude long-range LD regions (see Price2008). Another use
is thresholding with respect to the p-values associated with S.
ncores: Number of cores used. The default doesn't use parallelism. You may use nb_cores().
G: Genotype matrix. It shouldn't contain missing values. Also, remember to do quality control: for example, some algorithms in this package won't work if you use SNPs with 0 MAF.
infos.chr: Vector of integers specifying each SNP's chromosome.
infos.pos: Vector of integers specifying the physical position
on a chromosome (in base pairs) of each SNP.
nploidy: Number of trials, parameter of the binomial distribution.
Default is 2, which corresponds to diploidy, such as for the human genome.
LD.regions: A data.frame with columns "Chr", "Start" and "Stop".
The default, LD.wiki34, is the table of 34 long-range LD regions.
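The note on G above asks for no missing values and no SNPs with 0 MAF. As a rough sketch of such a check, the base R snippet below flags usable columns in a small made-up matrix. `G_toy`, `n.missing` and `ind.ok` are hypothetical names, genotypes are assumed to be coded 0/1/2, and the package may well offer its own tools for this step.

```r
# Toy 0/1/2 genotype matrix (4 individuals x 4 SNPs), made up for illustration
G_toy <- cbind(
  c(0, 1, 2, 1),    # polymorphic, complete
  c(0, 0, 0, 0),    # monomorphic: MAF = 0
  c(2, 2, 2, 2),    # monomorphic for the other allele: MAF = 0
  c(1, NA, 0, 2)    # polymorphic but has a missing value
)

af <- colMeans(G_toy, na.rm = TRUE) / 2   # allele frequency per SNP
maf <- pmin(af, 1 - af)                   # minor allele frequency per SNP
n.missing <- colSums(is.na(G_toy))        # number of missing genotypes per SNP

# SNPs that pass both checks
ind.ok <- which(maf > 0 & n.missing == 0)
```

Only the first SNP passes here. The SNPs with 0 MAF could then be listed in the exclude argument, while missing values would still need to be handled (for example by imputation) before running the functions on G.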
snp_clumping() (and bed_clumping()): SNP indices that are kept.
snp_indLRLDR(): SNP indices to be used as (part of) the 'exclude' parameter of snp_clumping().
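To show what "SNP indices of long-range LD regions" could mean in practice, here is a base R sketch of a region lookup over a data.frame with "Chr", "Start" and "Stop" columns. It is not the code of snp_indLRLDR(): `in_region()` and `toy_regions` are hypothetical, the coordinates are made up (they are not taken from LD.wiki34), and the sketch assumes that bounds are inclusive and expressed in the same unit as the positions.

```r
# Made-up regions, NOT the LD.wiki34 table
toy_regions <- data.frame(Chr = c(1, 2), Start = c(100, 500), Stop = c(200, 900))

# Made-up SNP map: chromosome and position of six SNPs
infos.chr <- c(1, 1, 1, 2, 2, 3)
infos.pos <- c(50, 150, 250, 400, 600, 150)

# Return the indices of SNPs falling inside any region (inclusive bounds assumed)
in_region <- function(chr, pos, regions) {
  hit <- rep(FALSE, length(chr))
  for (i in seq_len(nrow(regions))) {
    hit <- hit | (chr == regions$Chr[i] &
                    pos >= regions$Start[i] &
                    pos <= regions$Stop[i])
  }
  which(hit)
}

ind.excl <- in_region(infos.chr, infos.pos, toy_regions)
```

Here SNPs 2 and 5 fall inside a region. Indices like these can be combined with other exclusions, for example with union(), before being passed as exclude.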
Price AL, Weale ME, Patterson N, et al. Long-Range LD Can Confound Genome Scans in Admixed Populations. Am J Hum Genet. 2008;83(1):132-135. doi: 10.1016/j.ajhg.2008.06.005
test <- snp_attachExtdata()
G <- test$genotypes

# clumping (prioritizing higher MAF)
ind.keep <- snp_clumping(G, infos.chr = test$map$chromosome,
                         infos.pos = test$map$physical.pos,
                         thr.r2 = 0.1)

# keep most of them -> not much LD in this simulated dataset
length(ind.keep) / ncol(G)
#> [1] 0.7919419
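The example above ranks SNPs by MAF. When association statistics are available, the S and exclude arguments can be prepared as in the following base R sketch. The z-scores and the 0.01 threshold are made up for illustration, and `rank.order` is a hypothetical name.

```r
# Hypothetical z-scores for five SNPs (made-up values)
z <- c(-4.1, 0.3, 2.2, -0.8, 3.5)

# "Important" means far from 0 in either direction, so rank on abs(z)
S <- abs(z)
rank.order <- order(S, decreasing = TRUE)   # SNPs from most to least important

# Two-sided p-values under a standard normal null
pval <- 2 * pnorm(-abs(z))

# SNPs failing an arbitrary p-value threshold, to be excluded from clumping
ind.excl <- which(pval > 0.01)
```

Following the usage shown earlier, these vectors could then be passed as S = S and exclude = ind.excl to snp_clumping(), provided they have one entry per SNP of G.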