Binomial(n, p) scaling where n is fixed and p is estimated.
snp_scaleAlpha(alpha = -1)
snp_scaleBinom(nploidy = 2)Assumes that the average contribution (e.g. heritability)
of a SNP of frequency \(p\) is proportional to
\([2p(1-p)]^{1+\alpha}\). The center is then \(2 p\) and the scale
is \([2p(1-p)]^{-\alpha/2}\). Default is -1.
Number of trials, parameter of the binomial distribution.
Default is 2, which corresponds to diploidy, such as for the human genome.
A new function that returns a data.frame of two vectors
"center" and "scale" which are of the length of ind.col.
You will probably not use this function as is but as the
fun.scaling parameter of other functions of package bigstatsr.
This scaling is widely used for SNP arrays. Patterson N, Price AL, Reich D (2006). Population Structure and Eigenanalysis. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190 .
set.seed(1)
a <- matrix(0, 93, 170)
p <- 0.2
a[] <- rbinom(length(a), 2, p)
X <- add_code256(big_copy(a, type = "raw"), code = c(0, 1, 2, rep(NA, 253)))
X.svd <- big_SVD(X, fun.scaling = snp_scaleBinom())
str(X.svd)
#> List of 5
#> $ d : num [1:10] 22.2 21.6 21.5 21.2 20.8 ...
#> $ u : num [1:93, 1:10] 0.0732 -0.0378 -0.0762 0.0364 0.0444 ...
#> $ v : num [1:170, 1:10] 0.1075 -0.0331 0.0592 -0.0504 0.1216 ...
#> $ center: num [1:170] 0.419 0.387 0.301 0.43 0.419 ...
#> $ scale : num [1:170] 0.576 0.559 0.506 0.581 0.576 ...
#> - attr(*, "class")= chr "big_SVD"
plot(X.svd$center)
abline(h = 2 * p, col = "red")
plot(X.svd$scale)
abline(h = sqrt(2 * p * (1 - p)), col = "red")