Fast truncated SVD with initial pruning, followed by iterative removal of long-range LD regions.

snp_autoSVD(
  G,
  infos.chr,
  infos.pos = NULL,
  ind.row = rows_along(G),
  ind.col = cols_along(G),
  fun.scaling = snp_scaleBinom(),
  thr.r2 = 0.2,
  size = 100/thr.r2,
  k = 10,
  roll.size = 50,
  int.min.size = 20,
  alpha.tukey = 0.05,
  min.mac = 10,
  max.iter = 5,
  is.size.in.bp = NULL,
  ncores = 1,
  verbose = TRUE
)

bed_autoSVD(
  obj.bed,
  ind.row = rows_along(obj.bed),
  ind.col = cols_along(obj.bed),
  fun.scaling = bed_scaleBinom,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  k = 10,
  roll.size = 50,
  int.min.size = 20,
  alpha.tukey = 0.05,
  min.mac = 10,
  max.iter = 5,
  ncores = 1,
  verbose = TRUE
)

Argument | Description |
---|---|
G | A FBM.code256 (typically the `genotypes` element of a bigSNP object, such as `ex$genotypes` in the example). |
infos.chr | Vector of integers specifying each SNP's chromosome. |
infos.pos | Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. |
ind.row | An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. |
ind.col | An optional vector of the column indices (SNPs) that are used. If not specified, all columns are used. |
fun.scaling | A function that returns a named list of per-column centering and scaling values. Default is `snp_scaleBinom()` for `snp_autoSVD()` and `bed_scaleBinom` for `bed_autoSVD()`. |
thr.r2 | Threshold over the squared correlation between two SNPs. Default is `0.2`. |
size | For one SNP, window size around this SNP to compute correlations. Default is `100 / thr.r2`. |
k | Number of singular vectors/values to compute. Default is `10`. |
roll.size | Radius of rolling windows to smooth log-p-values. Default is `50`. |
int.min.size | Minimum number of consecutive outlier SNPs in order to be reported as long-range LD region. Default is `20`. |
alpha.tukey | Default is `0.05`. |
min.mac | Minimum minor allele count (MAC) for variants to be included. Default is `10`. |
max.iter | Maximum number of iterations of outlier detection. Default is `5`. |
is.size.in.bp | Deprecated. |
ncores | Number of cores used. Default doesn't use parallelism. You may use `nb_cores()`. |
verbose | Output some information on the iterations? Default is `TRUE`. |
obj.bed | Object of type `bed`. |
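The default `size = 100/thr.r2` is an expression, not a constant, so it follows whatever `thr.r2` is passed. A minimal sketch of this behaviour in base R, using a hypothetical stand-in function `window_size()` (not part of the package) that only mirrors the two defaults from the usage:

```r
# Hypothetical stand-in that mirrors only the `thr.r2` and `size` defaults.
# R evaluates default arguments lazily, so `size` sees the supplied `thr.r2`.
window_size <- function(thr.r2 = 0.2, size = 100 / thr.r2) {
  size
}

window_size()              # 500
window_size(thr.r2 = 0.1)  # 1000
window_size(thr.r2 = 0.5)  # 200
window_size(size = 300)    # 300, an explicit `size` overrides the default
```

A stricter threshold (smaller `thr.r2`) therefore widens the default window.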

A named list (an S3 class "big_SVD") of

- `d`, the singular values,
- `u`, the left singular vectors,
- `v`, the right singular vectors,
- `niter`, the number of iterations of the algorithm,
- `nops`, the number of matrix-vector multiplications used,
- `center`, the centering vector,
- `scale`, the scaling vector.

Note that to obtain the principal components, you must call `predict()` on the result. See examples.
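To illustrate why the principal components are not simply `u`, here is a base R sketch on a small simulated genotype matrix. It assumes that the binomial scaling uses center `2 * p` and scale `sqrt(2 * p * (1 - p))`, with `p` the allele frequency, and it uses `svd()` as a stand-in for the package's truncated SVD. With a real "big_SVD" object you would call `predict()` as stated above.

```r
set.seed(1)
n <- 50   # individuals
m <- 20   # SNPs
# Simulated genotypes coded 0/1/2 (illustration only, not package data)
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)

# Assumed binomial scaling: center = 2p, scale = sqrt(2p(1 - p))
p <- colMeans(G) / 2
center <- 2 * p
scale <- sqrt(2 * p * (1 - p))
X <- sweep(sweep(G, 2, center, "-"), 2, scale, "/")

# Keep the first k singular vectors/values
k <- 3
s <- svd(X, nu = k, nv = k)
d <- s$d[1:k]

# Principal components (scores): each column of u times its singular value
PCs <- sweep(s$u, 2, d, "*")

# Equivalent: project the scaled matrix onto the right singular vectors
PCs_alt <- X %*% s$v
all.equal(PCs, PCs_alt)
```

The left singular vectors all have unit norm, so the variance carried by each component only appears once the columns are multiplied by `d`.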

If you don't have any information about SNPs, you can try using

- `infos.chr = rep(1, ncol(G))`,
- `size = ncol(G)` (if SNPs are not sorted),
- `roll.size = 0` (if SNPs are not sorted).
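Put together, the fallback settings above could be passed as follows. This is only a sketch: `fallback_args()` is a hypothetical helper (not part of the package), and the call is guarded so it only runs when bigsnpr is installed.

```r
# Hypothetical helper: collects the fallback arguments listed above.
# `G` can be anything with columns (a matrix here, an FBM.code256 in practice).
fallback_args <- function(G) {
  list(
    infos.chr = rep(1, ncol(G)),  # treat all SNPs as one chromosome
    size = ncol(G),               # window spans all SNPs (unsorted SNPs)
    roll.size = 0                 # no rolling smoothing (unsorted SNPs)
  )
}

if (requireNamespace("bigsnpr", quietly = TRUE)) {
  ex <- bigsnpr::snp_attachExtdata()
  G <- ex$genotypes
  obj.svd <- do.call(bigsnpr::snp_autoSVD, c(list(G), fallback_args(G)))
}
```

Note that `size = ncol(G)` makes every SNP a potential neighbour of every other one, so the clumping step may be much slower than with the default window.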

ex <- snp_attachExtdata()
obj.svd <- snp_autoSVD(G = ex$genotypes,
                       infos.chr = ex$map$chromosome,
                       infos.pos = ex$map$physical.position)
#>
#> Phase of clumping (on MAF) at r^2 > 0.2.. keep 4270 SNPs.
#> Discarding 0 variant with MAC < 10.
#>
#> Iteration 1:
#> Computing SVD..
#> 0 outlier variant detected..
#>
#> Converged!
str(obj.svd)
#> List of 7
#>  $ d     : num [1:10] 235.4 148 105.5 96.4 94.9 ...
#>  $ u     : num [1:517, 1:10] 0.0801 0.0798 0.0646 0.0781 0.0818 ...
#>  $ v     : num [1:4270, 1:10] -0.00174 0.03142 -0.01527 0.0132 0.0154 ...
#>  $ niter : num 10
#>  $ nops  : num 170
#>  $ center: num [1:4270] 0.412 0.474 0.369 0.913 0.712 ...
#>  $ scale : num [1:4270] 0.572 0.601 0.549 0.704 0.677 ...
#>  - attr(*, "class")= chr "big_SVD"
#>  - attr(*, "subset")= int [1:4270] 2 3 4 5 6 7 8 9 10 11 ...
#>  - attr(*, "lrldr")='data.frame': 0 obs. of 3 variables:
#>   ..$ Chr  : int(0)
#>   ..$ Start: int(0)
#>   ..$ Stop : int(0)