Polygenic Risk Scores for a grid of clumping and thresholding parameters.

Stacking over many Polygenic Risk Scores (PRS), corresponding to a grid of many different parameters for clumping and thresholding (C+T).

```
snp_grid_clumping(
  G,
  infos.chr,
  infos.pos,
  lpS,
  ind.row = rows_along(G),
  grid.thr.r2 = c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95),
  grid.base.size = c(50, 100, 200, 500),
  infos.imp = rep(1, ncol(G)),
  grid.thr.imp = 1,
  groups = list(cols_along(G)),
  exclude = NULL,
  ncores = 1
)

snp_grid_PRS(
  G,
  all_keep,
  betas,
  lpS,
  n_thr_lpS = 50,
  grid.lpS.thr = 0.9999 * seq_log(max(0.1, min(lpS, na.rm = TRUE)),
                                  max(lpS, na.rm = TRUE), n_thr_lpS),
  ind.row = rows_along(G),
  backingfile = tempfile(),
  type = c("float", "double"),
  ncores = 1
)

snp_grid_stacking(
  multi_PRS,
  y.train,
  alphas = c(1, 0.01, 1e-04),
  ncores = 1,
  ...
)
```
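As a quick illustration of how the two clumping grids combine, the base-R sketch below tabulates the window sizes `base.size / thr.r2` (in kb) implied by the default `grid.base.size` and `grid.thr.r2` shown in the usage. It only reproduces this arithmetic and does not call bigsnpr; the names `win.kb` and `n_clumping` are illustrative.

```r
# Default clumping grids, copied from the usage of snp_grid_clumping()
grid.thr.r2    <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95)
grid.base.size <- c(50, 100, 200, 500)

# Window size (in kb) = base.size / thr.r2, for every pair of values
win.kb <- outer(grid.base.size, grid.thr.r2, "/")
dimnames(win.kb) <- list(base.size = grid.base.size, thr.r2 = grid.thr.r2)
print(round(win.kb))

# Number of clumping settings (one imputation threshold, one group)
n_clumping <- length(grid.thr.r2) * length(grid.base.size)
n_clumping  # 28
```

Stringent `thr.r2` values thus come with the widest windows (up to 50,000 kb here), while lenient ones use windows of only about 50 kb.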

- G
An `FBM.code256` (typically `<bigSNP>$genotypes`). **You shouldn't have missing values.** Also, remember to do quality control, e.g. some algorithms in this package won't work if you use SNPs with a MAF of 0.

- infos.chr
Vector of integers specifying each SNP's chromosome. Typically `<bigSNP>$map$chromosome`.

- infos.pos
Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. Typically `<bigSNP>$map$physical.pos`.

- lpS
Numeric vector of `-log10(p-value)` associated with `betas`.

- ind.row
An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. **Don't use negative indices.**

- grid.thr.r2
Grid of thresholds over the squared correlation between two SNPs for clumping. Default is `c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95)`.

- grid.base.size
Grid of base window sizes. Window sizes are then computed as `base.size / thr.r2` (in kb). Default is `c(50, 100, 200, 500)`.

- infos.imp
Vector of imputation scores. Default is all `1` if you do not provide it.

- grid.thr.imp
Grid of thresholds over `infos.imp` (default is `1`), but you should change it (e.g. `c(0.3, 0.6, 0.9, 0.95)`) if providing `infos.imp`.

- groups
List of vectors of indices to define your own categories. This could be used e.g. to derive C+T scores using two different GWAS summary statistics, or to include other information such as functional annotations. Default just makes one group with all variants.

- exclude
Vector of SNP indices to exclude anyway.

- ncores
Number of cores used. Default doesn't use parallelism. You may use `nb_cores()`.

- all_keep
Output of `snp_grid_clumping()` (indices passing clumping).

- betas
Numeric vector of weights (effect sizes from GWAS) associated with each variant (column of `G`). If alleles are reversed, make sure to multiply the corresponding effects by `-1`.

- n_thr_lpS
Length of the default `grid.lpS.thr`. Default is `50`.

- grid.lpS.thr
Sequence of thresholds to apply on `lpS`. Default is a grid (of length `n_thr_lpS`) evenly spaced on a logarithmic scale, i.e. on a log-log scale for p-values.

- backingfile
Prefix for the backing files where the C+T scores are stored. Because the grid is typically large, the resulting matrix can be large too, so it is stored on disk. Default uses a temporary file.

- type
Type of backingfile values. Either `"float"` (the default) or `"double"`. Using `"float"` requires half the disk space.

- multi_PRS
Output of `snp_grid_PRS()`.

- y.train
Vector of phenotypes. If there are two levels (binary 0/1), it uses `big_spLogReg()` for stacking, otherwise `big_spLinReg()`.

- alphas
Vector of values for grid-search. See `big_spLogReg()`. Default for this function is `c(1, 0.01, 0.0001)`.

- ...
Other parameters to be passed to `big_spLogReg()`. For example, using `covar.train`, you can add covariates to the model along with all C+T scores. You can also use `pf.covar` if you do not want to penalize these covariates.
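The requirements on `G` (no missing values, no variant with a MAF of 0) can be checked before calling these functions. The base-R sketch below does this on a small in-memory toy matrix that stands in for `<bigSNP>$genotypes` (an assumption for illustration; it does not use bigsnpr's own QC tools), and collects monomorphic variants as candidate `exclude` indices.

```r
# Toy genotype matrix (individuals x variants), coded as allele counts 0/1/2.
# A plain R matrix is used here instead of an FBM.code256, for illustration only.
set.seed(1)
G <- matrix(sample(0:2, 20 * 6, replace = TRUE), nrow = 20)
G[, 3] <- 0L  # make variant 3 monomorphic (MAF = 0)

# Missing values must be handled (e.g. imputed) before using G
stopifnot(!anyNA(G))

# Minor allele frequency of each variant (column)
af  <- colMeans(G) / 2
maf <- pmin(af, 1 - af)

# Column indices of variants with MAF = 0, e.g. to pass as `exclude`
exclude <- which(maf == 0)
exclude
```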
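`groups` is a list of vectors of column indices. Assuming a hypothetical per-variant annotation vector `annot` (not part of this package), one way to build one group per category in base R is `split()`; the default corresponds to a single group that holds every column.

```r
# Hypothetical functional annotation for 8 variants (columns of G)
annot <- c("coding", "intergenic", "coding", "intron",
           "intergenic", "intron", "coding", "intron")

# Default behaviour: one group containing all variants, like list(cols_along(G))
groups_default <- list(seq_along(annot))

# One group of column indices per annotation category (a named list)
groups <- split(seq_along(annot), annot)
groups
```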
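For `betas`, the sign flip for reversed alleles can be sketched as follows. The data frames and column names (`a1`, `a0`, `effect_allele`, `other_allele`) are hypothetical, `a1` is assumed to be the allele whose copies are counted in `G`, and strand flips or ambiguous variants are not handled.

```r
# Hypothetical variant information matching the columns of G
info <- data.frame(a1 = c("A", "C", "G"), a0 = c("G", "T", "A"),
                   stringsAsFactors = FALSE)

# Hypothetical GWAS summary statistics for the same three variants
sumstats <- data.frame(
  effect_allele = c("A", "T", "A"),  # variants 2 and 3 use the other allele
  other_allele  = c("G", "C", "G"),
  beta          = c(0.10, 0.25, -0.05),
  stringsAsFactors = FALSE
)

same     <- sumstats$effect_allele == info$a1 & sumstats$other_allele == info$a0
reversed <- sumstats$effect_allele == info$a0 & sumstats$other_allele == info$a1
stopifnot(all(same | reversed))  # every variant matches in one orientation

# Multiply effects of reversed variants by -1 so that they refer to `a1`
betas <- ifelse(reversed, -sumstats$beta, sumstats$beta)
betas  # 0.10 -0.25 0.05
```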
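The default `grid.lpS.thr` is a sequence of `n_thr_lpS` values evenly spaced on a log scale. The sketch below reproduces the default expression from the usage in base R, with `seq_log()` replaced by a stand-in `seq_log_sketch()` that we assume behaves the same way; the toy `lpS` values are made up. The factor `0.9999` keeps the largest threshold just below `max(lpS)`, presumably so that the most significant variant still passes it.

```r
# Assumed equivalent of seq_log(): values evenly spaced on the log scale
seq_log_sketch <- function(from, to, length.out) {
  exp(seq(log(from), log(to), length.out = length.out))
}

# Toy -log10(p-values); NAs are ignored when building the grid
lpS <- -log10(c(0.5, 0.04, 1e-3, 2e-8, NA))

n_thr_lpS <- 50
grid.lpS.thr <- 0.9999 * seq_log_sketch(max(0.1, min(lpS, na.rm = TRUE)),
                                        max(lpS, na.rm = TRUE), n_thr_lpS)

range(grid.lpS.thr)
# Constant differences on the log scale, i.e. a constant ratio between thresholds
summary(diff(log(grid.lpS.thr)))
```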
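Given one set of variants passing clumping (one setting of the grid stored in `all_keep`), thresholding keeps the variants whose `lpS` exceeds each value of `grid.lpS.thr` and sums their genotypes weighted by `betas`. The toy sketch below computes one score per threshold for that single setting. It assumes a strict `>` comparison, uses made-up data, and only illustrates the C+T principle; it does not reproduce `snp_grid_PRS()`, which also works chromosome by chromosome and writes to disk.

```r
# Toy data: 5 individuals x 6 variants, with made-up effects and -log10(p)
G <- matrix(c(0, 1, 2, 1, 0, 2,
              1, 1, 0, 2, 2, 0,
              2, 0, 1, 0, 1, 1,
              0, 2, 1, 1, 0, 0,
              1, 0, 0, 2, 1, 2), nrow = 5, byrow = TRUE)
betas <- c(0.2, -0.1, 0.05, 0.3, -0.25, 0.15)
lpS   <- c(5.0, 1.2, 0.4, 8.1, 2.5, 3.3)
keep  <- c(1, 2, 4, 5, 6)  # hypothetical variants passing one clumping setting
thrs  <- c(0.5, 2, 4)      # hypothetical thresholds on the -log10(p) scale

# One column of scores per threshold: sum of betas * genotypes over
# kept variants whose lpS is above the threshold
scores <- sapply(thrs, function(thr) {
  ind <- keep[lpS[keep] > thr]
  drop(G[, ind, drop = FALSE] %*% betas[ind])
})
scores
```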

`snp_grid_PRS()`: An `FBM` (matrix on disk) that stores the C+T scores for all parameters of the grid (and for each chromosome separately). It also stores, as attributes, the input parameters `all_keep`, `betas`, `lpS` and `grid.lpS.thr`, which are also needed in `snp_grid_stacking()`.
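Because the returned `FBM` holds scores for every combination of clumping setting, `lpS` threshold and chromosome, its size grows quickly. The arithmetic below assumes one column per such combination, the default grids, 22 chromosomes and a hypothetical 10,000 individuals, counting 4 bytes per `"float"` value versus 8 bytes per `"double"`.

```r
# Grid lengths with default arguments (assumed column layout)
n_thr_r2    <- 7   # length(grid.thr.r2)
n_base_size <- 4   # length(grid.base.size)
n_thr_imp   <- 1   # length(grid.thr.imp)
n_groups    <- 1   # length(groups)
n_thr_lpS   <- 50
n_chr       <- 22  # assumption: autosomes only

n_scores <- n_thr_r2 * n_base_size * n_thr_imp * n_groups * n_thr_lpS * n_chr
n_scores  # 30800

# Approximate disk usage for a hypothetical 10,000 individuals
n_ind <- 10000
bytes <- c(float = 4, double = 8) * n_ind * n_scores
bytes / 1e9  # in GB: float 1.232, double 2.464
```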