LDpred2 — snp_ldpred2_inf • bigsnpr

LDpred2. Tutorial at https://privefl.github.io/bigsnpr/articles/LDpred2.html.

snp_ldpred2_inf(corr, df_beta, h2)

snp_ldpred2_grid(
  corr,
  df_beta,
  grid_param,
  burn_in = 50,
  num_iter = 100,
  ncores = 1,
  return_sampling_betas = FALSE,
  ind.corr = cols_along(corr)
)

snp_ldpred2_auto(
  corr,
  df_beta,
  h2_init,
  vec_p_init = 0.1,
  burn_in = 500,
  num_iter = 200,
  sparse = FALSE,
  verbose = FALSE,
  report_step = num_iter + 1L,
  allow_jump_sign = TRUE,
  shrink_corr = 1,
  use_MLE = TRUE,
  p_bounds = c(1e-05, 1),
  alpha_bounds = c(-1.5, 0.5),
  ind.corr = cols_along(corr),
  ncores = 1
)

Arguments

corr

Sparse correlation matrix as an SFBM. If corr is a dsCMatrix or a dgCMatrix, you can use as_SFBM(corr).

df_beta

A data frame with 3 columns:

$beta: effect size estimates
$beta_se: standard errors of effect size estimates
$n_eff: the GWAS sample size(s) used to estimate beta for a continuous trait, or 4 / (1 / n_control + 1 / n_case) for a binary trait. In a meta-analysis, sum the effective sample sizes of the individual studies instead of using the total numbers of cases and controls (see doi:10.1016/j.biopsych.2022.05.029). When using a mixed model, the effective sample size needs to be adjusted as well (see doi:10.1016/j.xhgg.2022.100136).
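
The n_eff formula for a binary trait can be sketched as follows. The case and control counts are made up for two hypothetical studies, and the helper n_eff_binary() is defined here for illustration only; it is not part of bigsnpr.

```r
# Effective sample size for a case-control GWAS: 4 / (1 / n_control + 1 / n_case)
# (hypothetical helper, not a bigsnpr function)
n_eff_binary <- function(n_case, n_control) {
  4 / (1 / n_control + 1 / n_case)
}

# Made-up counts for two studies combined in a meta-analysis
n_case    <- c(1000, 2500)
n_control <- c(9000, 2500)

# Per-study effective sample sizes, then their sum
n_eff_per_study <- n_eff_binary(n_case, n_control)
n_eff_meta <- sum(n_eff_per_study)

# For comparison only: plugging in the total counts gives a different value
n_eff_from_totals <- n_eff_binary(sum(n_case), sum(n_control))
```

With these toy numbers, summing the per-study values gives a smaller n_eff than the one computed from the pooled counts, which is why the two approaches should not be used interchangeably.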

h2

Heritability estimate.

grid_param

A data frame with 3 columns as a grid of hyper-parameters:

$p: proportion of causal variants
$h2: heritability (captured by the variants used)
$sparse: boolean, whether a sparse model is sought

The grid rows can be run in parallel by changing ncores.
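
One possible way to build such a grid in base R is sketched below. The heritability value, the number of p values, and the h2 multipliers are illustrative assumptions, not recommendations taken from this page.

```r
# Hypothetical heritability estimate (assumed to be obtained beforehand)
h2_est <- 0.3

# Candidate values: p spread on a log scale, h2 around the estimate
p_seq  <- signif(exp(seq(log(1e-5), log(1), length.out = 21)), 2)
h2_seq <- round(h2_est * c(0.3, 0.7, 1, 1.4), 4)

# One row per combination of p, h2, and sparse
grid_param <- expand.grid(p = p_seq, h2 = h2_seq, sparse = c(FALSE, TRUE))
dim(grid_param)
```

Each row of grid_param then corresponds to one column of the matrix returned by snp_ldpred2_grid().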

burn_in

Number of burn-in iterations.

num_iter

Number of iterations after burn-in.

ncores

Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().

return_sampling_betas

Whether to return all sampling betas (after burn-in). This is useful for assessing the uncertainty of the PRS at the individual level (see doi:10.1101/2020.11.30.403188). Default is FALSE (only returns the averaged final vectors of betas). If TRUE, only one set of parameters is allowed.

ind.corr

Indices to "subset" corr, as if this was run with corr[ind.corr, ind.corr] instead. No subsetting by default.

h2_init

Heritability estimate for initialization.

vec_p_init

Vector of initial values for p. Default is 0.1.

sparse

In LDpred2-auto, whether to also report a sparse solution by running LDpred2-grid with the estimates of p and h2 from LDpred2-auto, and sparsity enabled. Default is FALSE.

verbose

Whether to print "p // h2" estimates at each iteration. Disabled when parallelism is used.

report_step

Step to report sampling betas (after burn-in and before unscaling). Nothing is reported by default. If using num_iter = 200 and report_step = 20, then 10 vectors of sampling betas are reported (as a sparse matrix with 10 columns).
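
The arithmetic behind this example can be sketched as below. It assumes that one vector is reported every report_step iterations after burn-in, which matches the numbers given here but is not confirmed in detail by this page; n_reported() is a hypothetical helper.

```r
# Assumption for this sketch: one vector of sampling betas is reported every
# `report_step` iterations after burn-in, so the count is an integer division.
n_reported <- function(num_iter, report_step) num_iter %/% report_step

# Example from the text: 200 iterations, reporting every 20
n_example <- n_reported(num_iter = 200, report_step = 20)

# Default setting: report_step = num_iter + 1L, so nothing is reported
n_default <- n_reported(num_iter = 200, report_step = 200 + 1L)
```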

allow_jump_sign

Whether to allow effect sizes to change sign in consecutive iterations. Default is TRUE (normal sampling). You can use FALSE to force effects to go through 0 first before changing sign. Setting this parameter to FALSE could be useful to prevent instability (oscillation and ultimately divergence) of the Gibbs sampler. This would also be useful for accelerating convergence of chains with a large initial value for p.

shrink_corr

Shrinkage multiplicative coefficient to apply to off-diagonal elements of the correlation matrix. Default is 1 (unchanged). You can use e.g. 0.95 to add a bit of regularization.
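
To illustrate what this shrinkage means, the sketch below applies it to a small dense toy matrix with made-up values. It only mirrors the description above; the real corr is a sparse SFBM and this is not how the function operates internally.

```r
# Toy 3 x 3 correlation matrix (hypothetical values)
corr_toy <- matrix(c(1.0, 0.6, 0.2,
                     0.6, 1.0, 0.4,
                     0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)

shrink_corr <- 0.95

# Multiply the off-diagonal elements by shrink_corr and keep the diagonal as is
corr_shrunk <- corr_toy * shrink_corr
diag(corr_shrunk) <- diag(corr_toy)
```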

use_MLE

Whether to use maximum likelihood estimation (MLE) to estimate alpha and the variance component (since v1.11.4), or to assume that alpha is -1 and estimate the variance of (scaled) effects as h2/(m*p), as was done in earlier versions of LDpred2-auto (e.g. in v1.10.8). Default is TRUE, which should provide a better model fit, but might also be less robust.

p_bounds

Boundaries for the estimates of p (the polygenicity). Default is c(1e-5, 1). You can use the same value twice to fix p.

alpha_bounds

Boundaries for the estimates of alpha. Default is c(-1.5, 0.5). You can use the same value twice to fix alpha.

Value

snp_ldpred2_inf: A vector of effects, assuming an infinitesimal model.

snp_ldpred2_grid: A matrix of effect sizes, one vector (column) for each row of grid_param. Missing values are returned when strong divergence is detected. If using return_sampling_betas, each column corresponds to one iteration instead (after burn-in).

snp_ldpred2_auto: A list (over vec_p_init) of lists with

$beta_est: vector of effect sizes (on the allele scale); note that missing values are returned when strong divergence is detected
$beta_est_sparse (only when sparse = TRUE): sparse vector of effect sizes
$postp_est: vector of posterior probabilities of being causal
$corr_est: the "imputed" correlations between variants and phenotypes, which can be used for post-QCing variants by comparing them to with(df_beta, beta / sqrt(n_eff * beta_se^2 + beta^2))
$sample_beta: sparse matrix of sampling betas (see parameter report_step); these are not on the allele scale, so to convert them you need to multiply by with(df_beta, sqrt(n_eff * beta_se^2 + beta^2))
$path_p_est: full path of p estimates (including burn-in); useful to check convergence of the iterative algorithm
$path_h2_est: full path of h2 estimates (including burn-in); useful to check convergence of the iterative algorithm
$path_alpha_est: full path of alpha estimates (including burn-in); useful to check convergence of the iterative algorithm
$h2_est: estimate of the (SNP) heritability (also see coef_to_liab)
$p_est: estimate of p, the proportion of causal variants
$alpha_est: estimate of alpha, the parameter controlling the relationship between allele frequencies and expected effect sizes
$h2_init and $p_init: input parameters, for convenience
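
The two with(df_beta, ...) expressions used for $corr_est and $sample_beta can be sketched on toy data as follows. All values are made up, and the dense matrix sample_beta merely stands in for the sparse matrix returned by snp_ldpred2_auto().

```r
# Toy summary statistics (hypothetical values)
df_beta <- data.frame(
  beta    = c(0.05, -0.02, 0.10),
  beta_se = c(0.01,  0.01, 0.02),
  n_eff   = c(10000, 10000, 10000)
)

# Scaling factor between scaled effects and allele-scale effects
scale_factor <- with(df_beta, sqrt(n_eff * beta_se^2 + beta^2))

# Marginal variant-phenotype correlations, to compare against $corr_est
corr_marginal <- with(df_beta, beta / sqrt(n_eff * beta_se^2 + beta^2))

# Toy stand-in for $sample_beta: 3 variants (rows) x 2 reported samples (columns)
sample_beta <- matrix(c(0.04, -0.01, 0.03,
                        0.05, -0.02, 0.04), nrow = 3)

# Multiply each row (variant) by its scaling factor to get allele-scale effects
sample_beta_allele <- sweep(sample_beta, 1, scale_factor, "*")
```

Variants whose $corr_est differs strongly from corr_marginal would be candidates for closer inspection in post-QC; the threshold for "strongly" is left to the user.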

Details

For reproducibility, set.seed() can be used to ensure that two runs of LDpred2 give the exact same results (since v1.10).