Polygenic Risk Scores for a grid of clumping and thresholding parameters.

Stacking over many Polygenic Risk Scores, corresponding to a grid of many different parameters for clumping and thresholding.

## Usage

snp_grid_clumping(
G,
infos.chr,
infos.pos,
lpS,
ind.row = rows_along(G),
grid.thr.r2 = c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95),
grid.base.size = c(50, 100, 200, 500),
infos.imp = rep(1, ncol(G)),
grid.thr.imp = 1,
groups = list(cols_along(G)),
exclude = NULL,
ncores = 1
)

snp_grid_PRS(
G,
all_keep,
betas,
lpS,
n_thr_lpS = 50,
grid.lpS.thr = 0.9999 * seq_log(max(0.1, min(lpS, na.rm = TRUE)),
max(lpS, na.rm = TRUE), n_thr_lpS),
ind.row = rows_along(G),
backingfile = tempfile(),
type = c("float", "double"),
ncores = 1
)

snp_grid_stacking(
multi_PRS,
y.train,
alphas = c(1, 0.01, 1e-04),
ncores = 1,
...
)

## Arguments

| Argument | Description |
|---|---|
| G | An FBM.code256 (typically $genotypes). It should not contain missing values. Also, remember to do quality control; e.g. some algorithms in this package won't work if you use SNPs with 0 MAF. |
| infos.chr | Vector of integers specifying each SNP's chromosome. Typically $map$chromosome. |
| infos.pos | Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. Typically $map$physical.pos. |
| lpS | Numeric vector of -log10(p-value) associated with betas. |
| ind.row | An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. Don't use negative indices. |
| grid.thr.r2 | Grid of thresholds over the squared correlation between two SNPs for clumping. Default is c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95). |
| grid.base.size | Grid of base window sizes. Window sizes are then computed as base.size / thr.r2 (in kb). Default is c(50, 100, 200, 500). |
| infos.imp | Vector of imputation scores. Default is all 1 if you do not provide it. |
| grid.thr.imp | Grid of thresholds over infos.imp. Default is 1, but you should change it (e.g. to c(0.3, 0.6, 0.9, 0.95)) if you provide infos.imp. |
| groups | List of vectors of indices to define your own categories. This could be used e.g. to derive C+T scores using two different GWAS summary statistics, or to include other information such as functional annotations. Default makes a single group with all variants. |
| exclude | Vector of SNP indices to exclude anyway. |
| ncores | Number of cores used. Default is 1 (no parallelism). You may use nb_cores(). |
| all_keep | Output of snp_grid_clumping() (indices passing clumping). |
| betas | Numeric vector of weights (effect sizes from GWAS) associated with each variant (column of G). If alleles are reversed, make sure to multiply the corresponding effects by -1. |
| n_thr_lpS | Length of the default grid.lpS.thr. Default is 50. |
| grid.lpS.thr | Sequence of thresholds to apply on lpS. Default is a grid (of length n_thr_lpS) evenly spaced on a logarithmic scale, i.e. on a log-log scale for p-values. |
| backingfile | Prefix of the backingfiles in which to store the C+T scores. Because the grid is typically large, the resulting matrix can be large, so it is stored on disk. Default uses a temporary file. |
| type | Type of backingfile values. Either "float" (the default) or "double". Using "float" requires half the disk space. |
| multi_PRS | Output of snp_grid_PRS(). |
| y.train | Vector of phenotypes. If there are two levels (binary 0/1), big_spLogReg() is used for stacking, otherwise big_spLinReg(). |
| alphas | Vector of values for grid-search. See big_spLogReg(). Default for this function is c(1, 0.01, 0.0001). |
| ... | Other parameters to be passed to big_spLogReg(). For example, using covar.train, you can add covariates to the model with all C+T scores. You can also use pf.covar if you do not want to penalize these covariates. |
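
As a rough illustration of how the default grids described above expand, the following base-R sketch enumerates the clumping settings and rebuilds the default threshold grid. It does not call bigsnpr: log_spaced() is a hypothetical stand-in for seq_log(), assumed here to return values evenly spaced on a log scale, and the lpS vector is made up. Whether the package rounds the window sizes is not stated above, so they are left unrounded.

```r
# Default clumping grids, copied from the usage of snp_grid_clumping()
grid.thr.r2    <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95)
grid.base.size <- c(50, 100, 200, 500)

# One row per (thr.r2, base.size) pair; window size is base.size / thr.r2 (kb)
grid <- expand.grid(thr.r2 = grid.thr.r2, base.size = grid.base.size)
grid$size.kb <- grid$base.size / grid$thr.r2
nrow(grid)           # 7 * 4 = 28 clumping settings
range(grid$size.kb)  # from 50 / 0.95 (about 52.6 kb) to 500 / 0.01 (50000 kb)

# Hypothetical stand-in for seq_log() (assumption: evenly spaced on a log scale)
log_spaced <- function(from, to, length.out) {
  exp(seq(log(from), log(to), length.out = length.out))
}

# Toy -log10(p-values), made up for illustration (NA = variant without a p-value)
lpS <- c(0.3, 1.2, 4, NA, 8.5)
n_thr_lpS <- 50

# Same expression as the default of grid.lpS.thr in snp_grid_PRS()
grid.lpS.thr <- 0.9999 * log_spaced(
  max(0.1, min(lpS, na.rm = TRUE)), max(lpS, na.rm = TRUE), n_thr_lpS
)
length(grid.lpS.thr)  # 50
range(grid.lpS.thr)   # 0.9999 * 0.3 up to 0.9999 * 8.5
```

With the default grid.thr.imp = 1 and a single group, this gives 28 clumping settings, each of which is then combined with the 50 thresholds on lpS. The factor 0.9999 places the largest threshold just below max(lpS).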

## Value

snp_grid_PRS(): An FBM (matrix on disk) that stores the C+T scores for all parameters of the grid (and for each chromosome separately). It also stores as attributes the input parameters all_keep, betas, lpS and grid.lpS.thr that are also needed in snp_grid_stacking().
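
The three functions are meant to be chained. The sketch below shows one possible way to do so on the small example dataset shipped with bigsnpr; it is an illustration under assumptions, not a recommended analysis. The object names (obj.bigSNP, ind.train, gwas, final_mod) are hypothetical, snp_attachExtdata() and big_univLogReg() are helpers that are not described on this page, and the GWAS is run on the training individuals themselves only as a stand-in for external summary statistics. The grids are reduced to keep the run short.

```r
library(bigsnpr)

# Example data shipped with bigsnpr (assumption: the default example.bed)
obj.bigSNP <- snp_attachExtdata()
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y01 <- obj.bigSNP$fam$affection - 1  # recode case/control status from 1/2 to 0/1

set.seed(1)
ind.train <- sort(sample(nrow(G), 400))

# Stand-in summary statistics: effect sizes and -log10(p-values).
# In practice these would come from an external GWAS.
gwas  <- big_univLogReg(G, y01[ind.train], ind.train = ind.train)
betas <- gwas$estim
lpS   <- -predict(gwas)  # predict() returns log10(p-values) by default

# Step 1: clumping over a (reduced) grid of r2 thresholds and base sizes
all_keep <- snp_grid_clumping(
  G, CHR, POS, lpS = lpS, ind.row = ind.train,
  grid.thr.r2 = c(0.05, 0.5), grid.base.size = c(50, 200),
  exclude = which(is.na(lpS))
)

# Step 2: C+T scores for every clumping setting and p-value threshold,
# stored on disk as an FBM (temporary backingfile by default)
multi_PRS <- snp_grid_PRS(
  G, all_keep, betas, lpS,
  ind.row = ind.train, n_thr_lpS = 10
)
dim(multi_PRS)

# Step 3: stack all C+T scores with a penalized regression
final_mod <- snp_grid_stacking(multi_PRS, y01[ind.train])
str(final_mod, max.level = 1)
```

Because y01 has two levels, the stacking step uses big_spLogReg() here. Extra arguments such as covar.train can be passed through `...`.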