Polygenic Risk Scores for a grid of clumping and thresholding parameters.
Stacking over many Polygenic Risk Scores, corresponding to a grid of many different parameters for clumping and thresholding.
snp_grid_clumping(
  G,
  infos.chr,
  infos.pos,
  lpS,
  ind.row = rows_along(G),
  grid.thr.r2 = c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95),
  grid.base.size = c(50, 100, 200, 500),
  infos.imp = rep(1, ncol(G)),
  grid.thr.imp = 1,
  groups = list(cols_along(G)),
  exclude = NULL,
  ncores = 1
)

snp_grid_PRS(
  G,
  all_keep,
  betas,
  lpS,
  n_thr_lpS = 50,
  grid.lpS.thr = 0.9999 * seq_log(max(0.1, min(lpS, na.rm = TRUE)),
                                  max(lpS, na.rm = TRUE), n_thr_lpS),
  ind.row = rows_along(G),
  backingfile = tempfile(),
  type = c("float", "double"),
  ncores = 1
)

snp_grid_stacking(
  multi_PRS,
  y.train,
  alphas = c(1, 0.01, 1e-04),
  ncores = 1,
  ...
)
G: An FBM.code256 (typically <bigSNP>$genotypes). You shouldn't have missing values. Also, remember to do quality control; e.g. some algorithms in this package won't work if you use SNPs with 0 MAF.
infos.chr: Vector of integers specifying each SNP's chromosome.
infos.pos: Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP.
lpS: Numeric vector of -log10(p-value) associated with betas.
ind.row: An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. Don't use negative indices.
grid.thr.r2: Grid of thresholds over the squared correlation between two SNPs for clumping. Default is c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95).
grid.base.size: Grid for base window sizes. Sizes are then computed as base.size / thr.r2 (in kb). Default is c(50, 100, 200, 500).
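To make the window-size rule concrete, the following base-R sketch tabulates base.size / thr.r2 for every combination of the two default grids. It only illustrates the documented formula; the data frame `grid` and its column names are chosen here for illustration, and the package may build or round its internal grid differently.

```r
# Default grids, copied from the usage above.
grid.thr.r2    <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95)
grid.base.size <- c(50, 100, 200, 500)

# One row per (base.size, thr.r2) combination.
grid <- expand.grid(base.size = grid.base.size, thr.r2 = grid.thr.r2)

# Window size in kb, following the documented rule base.size / thr.r2.
grid$size <- grid$base.size / grid$thr.r2

# Smallest and largest windows implied by the default grids.
range(grid$size)
```

Stricter r2 thresholds therefore get wider windows: with the defaults, the pair (500, 0.01) gives the largest window and (50, 0.95) the smallest.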
infos.imp: Vector of imputation scores. Default is all 1 if you do not provide it.
grid.thr.imp: Grid of thresholds over infos.imp (default is 1), but you should change it (e.g. c(0.3, 0.6, 0.9, 0.95)) if providing infos.imp.
groups: List of vectors of indices to define your own categories. This could be used e.g. to derive C+T scores using two different GWAS summary statistics, or to include other information such as functional annotations. Default just makes one group with all variants.
exclude: Vector of SNP indices to exclude in any case.
ncores: Number of cores used. Default doesn't use parallelism. You may use nb_cores().
all_keep: Output of snp_grid_clumping() (indices passing clumping).
betas: Numeric vector of weights (effect sizes from GWAS) associated with each variant (column of G). If alleles are reversed, make sure to multiply corresponding effects by -1.
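Aligning effects for reversed alleles amounts to a sign flip, i.e. multiplying the affected effects by -1. A minimal base-R sketch follows; the values and the logical vector `reversed` are made up for illustration (in practice the flag would come from comparing the GWAS alleles with those of the genotype data).

```r
# Made-up GWAS effect sizes, one per variant (column of G).
betas <- c(0.12, -0.05, 0.30, 0.08)

# Hypothetical flag: TRUE where the GWAS effect allele is the other allele
# compared with the genotype data.
reversed <- c(FALSE, TRUE, FALSE, TRUE)

# Flip the sign of the effects for reversed variants only.
betas_aligned <- ifelse(reversed, -betas, betas)
```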
n_thr_lpS: Length for default grid.lpS.thr. Default is 50.
grid.lpS.thr: Sequence of thresholds to apply on lpS. Default is a grid (of length n_thr_lpS) evenly spaced on a logarithmic scale, i.e. on a log-log scale for p-values.
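The default threshold grid from the usage above can be reproduced in base R as a rough sketch. The helper `log_spaced()` is a hypothetical stand-in for seq_log(), assumed here to return values evenly spaced on a log scale between two bounds, and the `lpS` values are made up.

```r
# Hypothetical stand-in for seq_log(): n values evenly spaced on a log scale.
log_spaced <- function(from, to, n) {
  exp(seq(log(from), log(to), length.out = n))
}

# Made-up -log10(p-values), including one missing value.
lpS <- c(0.05, 0.7, 1.3, 2.2, 4.8, 9.5, NA)
n_thr_lpS <- 50

# Mirrors the default shown in the usage: the lower bound is at least 0.1,
# and the factor 0.9999 keeps the largest threshold just below max(lpS).
grid.lpS.thr <- 0.9999 * log_spaced(
  max(0.1, min(lpS, na.rm = TRUE)),
  max(lpS, na.rm = TRUE),
  n_thr_lpS
)

range(grid.lpS.thr)
```

Because lpS is already -log10(p-value), spacing the thresholds evenly on a log scale is what the text calls a log-log scale for p-values.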
backingfile: Prefix of the backingfiles in which to store the C+T scores. As we typically use a large grid, the resulting matrix can be large, so it is stored on disk. Default uses a temporary file.
type: Type of backingfile values. Either "float" (the default) or "double". Using "float" requires half the disk space.
y.train: Vector of phenotypes. If there are two levels (binary 0/1), it uses big_spLogReg() for stacking, otherwise big_spLinReg().
alphas: Vector of values for grid-search. See big_spLogReg(). Default for this function is c(1, 0.01, 0.0001).
...: Other parameters to be passed to big_spLogReg(). For example, using covar.train, you can add covariates in the model with all C+T scores. You can also use pf.covar if you do not want to penalize these covariates.
snp_grid_PRS() returns an FBM (matrix on disk) that stores the C+T scores for all parameters of the grid (and for each chromosome separately). It also stores as attributes the input parameters (including grid.lpS.thr) that are also needed in snp_grid_stacking().