Fast truncated SVD with initial pruning, which iteratively removes
long-range LD regions. Some variants are removed by the initial clumping,
then more and more variants are removed at each iteration. You can access the
indices of the remaining variants with `attr(*, "subset")`. If some of the
removed variants are contiguous, the regions are reported in `attr(*, "lrldr")`.
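To show how contiguous removed variants can be grouped into regions, here is a minimal base-R sketch. It finds runs of consecutive outlier variants with `rle()` and keeps only runs at least `min.size` long, in the spirit of `int.min.size`. The helper `find_runs()` and the outlier vector are hypothetical illustrations, not the package's implementation.

```r
# Hypothetical helper: report runs of consecutive TRUE values
# (e.g. outlier variants along a chromosome) of length >= min.size.
find_runs <- function(is.outlier, min.size) {
  r <- rle(is.outlier)
  stops  <- cumsum(r$lengths)          # index of the last element of each run
  starts <- stops - r$lengths + 1      # index of the first element of each run
  keep   <- r$values & r$lengths >= min.size
  data.frame(Start = starts[keep], Stop = stops[keep])
}

# Made-up example: outliers at variant indices 3-7 and 12-13
is.outlier <- rep(FALSE, 15)
is.outlier[3:7] <- TRUE
is.outlier[12:13] <- TRUE
find_runs(is.outlier, min.size = 3)
#>   Start Stop
#> 1     3    7
```

The second run (indices 12-13) is too short to be reported with `min.size = 3`.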

```
snp_autoSVD(
G,
infos.chr,
infos.pos = NULL,
ind.row = rows_along(G),
ind.col = cols_along(G),
fun.scaling = snp_scaleBinom(),
thr.r2 = 0.2,
size = 100/thr.r2,
k = 10,
roll.size = 50,
int.min.size = 20,
alpha.tukey = 0.05,
min.mac = 10,
min.maf = 0.02,
max.iter = 5,
is.size.in.bp = NULL,
ncores = 1,
verbose = TRUE
)
bed_autoSVD(
obj.bed,
ind.row = rows_along(obj.bed),
ind.col = cols_along(obj.bed),
fun.scaling = bed_scaleBinom,
thr.r2 = 0.2,
size = 100/thr.r2,
k = 10,
roll.size = 50,
int.min.size = 20,
alpha.tukey = 0.05,
min.mac = 10,
min.maf = 0.02,
max.iter = 5,
ncores = 1,
verbose = TRUE
)
```

- G
An FBM.code256 (typically `<bigSNP>$genotypes`). **You shouldn't have missing values.** Also, remember to do quality control, e.g. some algorithms in this package won't work if you use SNPs with 0 MAF.
- infos.chr
Vector of integers specifying each SNP's chromosome. Typically `<bigSNP>$map$chromosome`.
- infos.pos
Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. Typically `<bigSNP>$map$physical.pos`.
- ind.row
An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. **Don't use negative indices.**
- ind.col
An optional vector of the column indices (SNPs) that are used. If not specified, all columns are used. **Don't use negative indices.**
- fun.scaling
A function with parameters `X` (or `obj.bed`), `ind.row` and `ind.col`, that returns a data.frame with `$center` and `$scale` for the columns corresponding to `ind.col`, used to scale each of their elements as follows: $$\frac{X_{i,j} - center_j}{scale_j}.$$ Default uses binomial scaling. You can also provide your own `center` and `scale` by using `bigstatsr::as_scaling_fun()`.
- thr.r2
Threshold over the squared correlation between two variants. Default is `0.2`. Use `NA` if you want to skip the clumping step.
- size
For one SNP, window size around this SNP to compute correlations. Default is `100 / thr.r2` for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If not providing `infos.pos` (`NULL`, the default), this is a window in number of SNPs; otherwise it is a window in kb of physical distance, based on `infos.pos`. I recommend that you provide the positions if available.
- k
Number of singular vectors/values to compute. Default is `10`. **This algorithm should be used to compute a few singular vectors/values.**
- roll.size
Radius of rolling windows to smooth log-p-values. Default is `50`.
- int.min.size
Minimum number of consecutive outlier variants for them to be reported as a long-range LD region. Default is `20`.
- alpha.tukey
The type-I error rate in outlier detection (which is further corrected for multiple testing). Default is `0.05`.
- min.mac
Minimum minor allele count (MAC) for variants to be included. Default is `10`. Can actually be higher because of `min.maf`.
- min.maf
Minimum minor allele frequency (MAF) for variants to be included. Default is `0.02`. Can actually be higher because of `min.mac`.
- max.iter
Maximum number of iterations of outlier detection. Default is `5`.
- is.size.in.bp
Deprecated.

- ncores
Number of cores used. Default doesn't use parallelism. You may use `bigstatsr::nb_cores()`.
- verbose
Whether to output some information on the iterations. Default is `TRUE`.
- obj.bed
Object of type bed, which is the mapping of some bed file. Use `obj.bed <- bed(bedfile)` to get this object.
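The variant filtering (`min.mac`, `min.maf`) and the default binomial scaling (`fun.scaling`) can be illustrated on a small in-memory matrix of 0/1/2 genotype counts without missing values. The base-R sketch below is a hypothetical stand-in for what happens on an FBM: it drops variants whose minor allele count or frequency is too low, then scales each kept variant with center $2p_j$ and scale $\sqrt{2p_j(1-p_j)}$, where $p_j$ is the frequency of the counted allele. The simulated data and variable names are assumptions made for illustration.

```r
set.seed(1)
n <- 200
# Simulated 0/1/2 genotypes for 6 variants; the last one is monomorphic
freqs <- c(0.5, 0.3, 0.1, 0.05, 0.01, 0)
geno <- sapply(freqs, function(p) rbinom(n, size = 2, prob = p))

min.mac <- 10
min.maf <- 0.02

af  <- colMeans(geno) / 2        # frequency of the counted allele
maf <- pmin(af, 1 - af)          # minor allele frequency
mac <- maf * 2 * n               # minor allele count
keep <- which(mac >= min.mac & maf >= min.maf)

# Binomial scaling of the kept variants: (X_ij - 2 p_j) / sqrt(2 p_j (1 - p_j))
p   <- af[keep]
ctr <- 2 * p                     # centering vector
scl <- sqrt(2 * p * (1 - p))     # scaling vector (> 0 for kept variants)
X <- sweep(sweep(geno[, keep, drop = FALSE], 2, ctr, "-"), 2, scl, "/")

# Default clumping window, size = 100 / thr.r2, for a few thresholds
100 / c(0.2, 0.1, 0.5)
#> [1]  500 1000  200
```

With `n = 200`, `min.mac = 10` corresponds to a MAF of 0.025, so the effective MAF threshold is higher than `min.maf`. The monomorphic variant is always dropped, which also avoids dividing by a zero scale.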

A named list (an S3 class "big_SVD") of: `d`, the singular values; `u`, the left singular vectors; `v`, the right singular vectors; `niter`, the number of iterations of the algorithm; `nops`, the number of matrix-vector multiplications used; `center`, the centering vector; and `scale`, the scaling vector.

Note that to obtain the Principal Components, you must use `predict()` on the result, e.g. `PCs <- predict(obj.svd)`.
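For intuition on what the PC scores are: for an SVD $X = U D V^T$ of the scaled data, the scores are $U D$, which equals $X V$. Assuming `predict()` without new data returns these scores (the usual definition), the base-R sketch below checks the identity with `svd()` on a small random matrix standing in for the scaled genotypes. It illustrates the linear algebra only, not the package internals.

```r
set.seed(42)
# Stand-in for a scaled genotype matrix (individuals x variants)
X <- scale(matrix(rnorm(50 * 20), nrow = 50))
k <- 5
s <- svd(X, nu = k, nv = k)

# PC scores as U %*% diag(d): multiply each column of U by its singular value
scores_u <- sweep(s$u, 2, s$d[1:k], "*")
# Same scores obtained by projecting the data on the right singular vectors
scores_v <- X %*% s$v

all.equal(scores_u, scores_v, check.attributes = FALSE)
#> [1] TRUE
```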

If you don't have any information about variants, you can try using `infos.chr = rep(1, ncol(G))`, `size = ncol(G)` (if variants are not sorted) and `roll.size = 0` (if variants are not sorted).
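As a sketch of this advice, the call below runs on the package's example data while deliberately ignoring its map, as if no variant information were available. It is for illustration only; when real chromosomes and positions exist, pass them instead.

```r
library(bigsnpr)

ex <- snp_attachExtdata()
G <- ex$genotypes
# No variant information: treat all variants as one chromosome, use a
# clumping window spanning all variants, and disable the rolling smoothing
obj.svd <- snp_autoSVD(G,
                       infos.chr = rep(1, ncol(G)),
                       size = ncol(G),
                       roll.size = 0)
```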

```
ex <- snp_attachExtdata()
G <- ex$genotypes
obj.svd <- snp_autoSVD(G,
infos.chr = ex$map$chromosome,
infos.pos = ex$map$physical.position)
#> Discarding 0 variant with MAC < 10 or MAF < 0.02.
#>
#> Phase of clumping (on MAF) at r^2 > 0.2.. keep 4270 variants.
#>
#> Iteration 1:
#> Computing SVD..
#> 0 outlier variant detected..
#>
#> Converged!
str(obj.svd)
#> List of 7
#> $ d : num [1:10] 235.4 148 105.5 96.4 94.9 ...
#> $ u : num [1:517, 1:10] 0.0801 0.0798 0.0646 0.0781 0.0818 ...
#> $ v : num [1:4270, 1:10] -0.00174 0.03142 -0.01527 0.0132 0.0154 ...
#> $ niter : num 10
#> $ nops : num 170
#> $ center: num [1:4270] 0.412 0.474 0.369 0.913 0.712 ...
#> $ scale : num [1:4270] 0.572 0.601 0.549 0.704 0.677 ...
#> - attr(*, "class")= chr "big_SVD"
#> - attr(*, "subset")= int [1:4270] 2 3 4 5 6 7 8 9 10 11 ...
#> - attr(*, "lrldr")='data.frame': 0 obs. of 4 variables:
#> ..$ Chr : int(0)
#> ..$ Start: int(0)
#> ..$ Stop : int(0)
#> ..$ Iter : int(0)
```
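A short sketch of how the returned object can be used: it reruns the example above quietly, then extracts the indices of the kept variants, the reported long-range LD regions and the PC scores with `predict()`.

```r
library(bigsnpr)

ex <- snp_attachExtdata()
G <- ex$genotypes
obj.svd <- snp_autoSVD(G,
                       infos.chr = ex$map$chromosome,
                       infos.pos = ex$map$physical.position,
                       verbose = FALSE)

# Column indices (in G) of the variants kept after clumping and LD-region removal
keep <- attr(obj.svd, "subset")
# Long-range LD regions detected (a data.frame, possibly with zero rows)
lrldr <- attr(obj.svd, "lrldr")
# Principal component scores of the individuals (one column per component)
PCs <- predict(obj.svd)
dim(PCs)
```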