LDpred2. Tutorial at https://privefl.github.io/bigsnpr/articles/LDpred2.html.
snp_ldpred2_inf(corr, df_beta, h2)
snp_ldpred2_grid(
  corr,
  df_beta,
  grid_param,
  burn_in = 50,
  num_iter = 100,
  ncores = 1,
  return_sampling_betas = FALSE,
  ind.corr = cols_along(corr)
)
snp_ldpred2_auto(
  corr,
  df_beta,
  h2_init,
  vec_p_init = 0.1,
  burn_in = 500,
  num_iter = 200,
  sparse = FALSE,
  verbose = FALSE,
  report_step = num_iter + 1L,
  allow_jump_sign = TRUE,
  shrink_corr = 1,
  use_MLE = TRUE,
  p_bounds = c(1e-05, 1),
  alpha_bounds = c(-1.5, 0.5),
  ind.corr = cols_along(corr),
  ncores = 1
)
corr: Sparse correlation matrix as an SFBM. If corr is a dsCMatrix or a dgCMatrix, you can use as_SFBM(corr).
df_beta: A data frame with 3 columns:
  $beta: effect size estimates
  $beta_se: standard errors of effect size estimates
  $n_eff: either the GWAS sample size(s) used when estimating beta for a continuous trait, or, in the case of a binary trait, 4 / (1 / n_control + 1 / n_case); in the case of a meta-analysis, you should sum the effective sample sizes of each study instead of using the total numbers of cases and controls, see doi:10.1016/j.biopsych.2022.05.029; when using a mixed model, the effective sample size needs to be adjusted as well, see doi:10.1016/j.xhgg.2022.100136.
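As a hedged illustration of the binary-trait formula and the meta-analysis advice above, the base R sketch below computes n_eff per study and sums the results. The helper name n_eff_binary and the case/control counts are made up for this example; they are not part of bigsnpr.

```r
# Hypothetical helper (not part of bigsnpr): effective sample size
# for a binary trait, 4 / (1 / n_control + 1 / n_case).
n_eff_binary <- function(n_case, n_control) {
  4 / (1 / n_control + 1 / n_case)
}

# Made-up case/control counts for two studies in a meta-analysis.
n_case    <- c(2000, 500)
n_control <- c(8000, 4500)

# Recommended above: sum the per-study effective sample sizes.
n_eff_sum <- sum(n_eff_binary(n_case, n_control))

# Discouraged above: apply the formula to the pooled totals.
n_eff_pooled <- n_eff_binary(sum(n_case), sum(n_control))
```

With these numbers the pooled formula gives a larger value than the sum of per-study values, which is why the two approaches are not interchangeable.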
h2: Heritability estimate.
grid_param: A data frame with 3 columns as a grid of hyper-parameters:
  $p: proportion of causal variants
  $h2: heritability (captured by the variants used)
  $sparse: boolean, whether a sparse model is sought
The models for the rows of this grid can be run in parallel by changing ncores.
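One possible way to build such a grid is with base R's expand.grid(), sketched below. The candidate values and the h2_est variable are illustrative assumptions, not recommendations from this page.

```r
# Assumed heritability estimate, for illustration only.
h2_est <- 0.3

# Every combination of the candidate values becomes one row,
# i.e. one model to fit.
grid_param <- expand.grid(
  p      = c(0.001, 0.01, 0.1, 1),
  h2     = h2_est * c(0.3, 0.7, 1, 1.4),
  sparse = c(FALSE, TRUE)
)

# Number of models in the grid: 4 * 4 * 2.
n_models <- nrow(grid_param)
```

Because each row is fitted as a separate model, the size of the grid directly drives the run time.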
burn_in: Number of burn-in iterations.
num_iter: Number of iterations after burn-in.
ncores: Number of cores used. The default does not use parallelism. You may use bigstatsr::nb_cores().
return_sampling_betas: Whether to return all sampling betas (after burn-in). This is useful for assessing the uncertainty of the PRS at the individual level (see doi:10.1101/2020.11.30.403188). Default is FALSE (only returns the averaged final vectors of betas). If TRUE, only one set of parameters is allowed.
ind.corr: Indices to "subset" corr, as if this was run with corr[ind.corr, ind.corr] instead. No subsetting by default.
h2_init: Heritability estimate for initialization.
vec_p_init: Vector of initial values for p. Default is 0.1.
sparse: In LDpred2-auto, whether to also report a sparse solution by running LDpred2-grid with the estimates of p and h2 from LDpred2-auto, and sparsity enabled. Default is FALSE.
verbose: Whether to print "p // h2" estimates at each iteration. Disabled when parallelism is used.
report_step: Step to report sampling betas (after burn-in and before unscaling). Nothing is reported by default. If using num_iter = 200 and report_step = 20, then 10 vectors of sampling betas are reported (as a sparse matrix with 10 columns).
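The arithmetic behind this example can be sketched in base R, assuming that one vector is reported every report_step iterations after burn-in (an assumption consistent with the 200 / 20 = 10 example above, not a statement about the internal implementation).

```r
num_iter    <- 200
report_step <- 20

# Number of reported vectors under the stated assumption.
n_reported <- num_iter %/% report_step

# The default step, num_iter + 1L, exceeds num_iter,
# so no vector is reported.
default_step       <- num_iter + 1L
n_reported_default <- num_iter %/% default_step
```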
allow_jump_sign: Whether to allow effect sizes to change sign in consecutive iterations. Default is TRUE (normal sampling). You can use FALSE to force effects to go through 0 first before changing sign. Setting this parameter to FALSE could be useful to prevent instability (oscillation and ultimately divergence) of the Gibbs sampler. This would also be useful for accelerating convergence of chains with a large initial value for p.
shrink_corr: Shrinkage multiplicative coefficient to apply to off-diagonal elements of the correlation matrix. Default is 1 (unchanged). You can use e.g. 0.95 to add a bit of regularization.
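To make the effect of this coefficient concrete, here is a minimal base R sketch on a small dense matrix with made-up values. It only illustrates "multiply the off-diagonal elements, keep the diagonal"; it is not how bigsnpr applies the shrinkage to an SFBM internally.

```r
# Toy 3 x 3 correlation matrix (made-up values).
R <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.5,
              0.2, 0.5, 1.0), nrow = 3, byrow = TRUE)

shrink_corr <- 0.95

# Scale everything, then restore the original diagonal so that
# only the off-diagonal elements are shrunk.
R_shrunk <- R * shrink_corr
diag(R_shrunk) <- diag(R)
```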
use_MLE: Whether to use maximum likelihood estimation (MLE) to estimate alpha and the variance component (since v1.11.4), or assume that alpha is -1 and estimate the variance of (scaled) effects as h2/(m*p), as it was done in earlier versions of LDpred2-auto (e.g. in v1.10.8). Default is TRUE, which should provide a better model fit, but might also be less robust.
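As a small worked example of the h2/(m*p) expression used when use_MLE = FALSE, the sketch below plugs in made-up numbers. It assumes that m denotes the number of variants, which this page does not state explicitly.

```r
h2 <- 0.3   # made-up heritability
m  <- 1e5   # assumed meaning: number of variants
p  <- 0.01  # made-up proportion of causal variants

# Variance of (scaled) effects: the heritability is spread over
# the m * p variants expected to be causal.
sigma2 <- h2 / (m * p)
```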
p_bounds: Boundaries for the estimates of p (the polygenicity). Default is c(1e-5, 1). You can use the same value twice to fix p.
alpha_bounds: Boundaries for the estimates of alpha. Default is c(-1.5, 0.5). You can use the same value twice to fix alpha.
snp_ldpred2_inf: A vector of effects, assuming an infinitesimal model.
snp_ldpred2_grid: A matrix of effect sizes, one vector (column) for each row of grid_param. Missing values are returned when strong divergence is detected. If using return_sampling_betas, each column corresponds to one iteration instead (after burn-in).
snp_ldpred2_auto: A list (over vec_p_init) of lists with
  $beta_est: vector of effect sizes (on the allele scale); note that missing values are returned when strong divergence is detected
  $beta_est_sparse (only when sparse = TRUE): sparse vector of effect sizes
  $postp_est: vector of posterior probabilities of being causal
  $corr_est: the "imputed" correlations between variants and phenotypes, which can be used for post-QCing variants by comparing those to with(df_beta, beta / sqrt(n_eff * beta_se^2 + beta^2))
  $sample_beta: sparse matrix of sampling betas (see parameter report_step), not on the allele scale, for which you need to multiply by with(df_beta, sqrt(n_eff * beta_se^2 + beta^2))
  $path_p_est: full path of p estimates (including burn-in); useful to check convergence of the iterative algorithm
  $path_h2_est: full path of h2 estimates (including burn-in); useful to check convergence of the iterative algorithm
  $path_alpha_est: full path of alpha estimates (including burn-in); useful to check convergence of the iterative algorithm
  $h2_est: estimate of the (SNP) heritability (also see coef_to_liab)
  $p_est: estimate of p, the proportion of causal variants
  $alpha_est: estimate of alpha, the parameter controlling the relationship between allele frequencies and expected effect sizes
  $h2_init and $p_init: input parameters, for convenience
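The two with(df_beta, ...) expressions quoted for $corr_est and $sample_beta can be sketched in base R as follows. All numbers are made up, and sample_beta is a small dense stand-in for the sparse matrix that snp_ldpred2_auto() returns; the variable names scale_factor and beta_hat are hypothetical.

```r
# Made-up summary statistics for 3 variants.
df_beta <- data.frame(
  beta    = c(0.02, -0.05, 0.01),
  beta_se = c(0.01, 0.01, 0.02),
  n_eff   = c(10000, 10000, 8000)
)

# Per-variant scaling factor used in both expressions.
scale_factor <- with(df_beta, sqrt(n_eff * beta_se^2 + beta^2))

# Scaled marginal effects, the quantity to compare with $corr_est.
beta_hat <- with(df_beta, beta / scale_factor)

# Dense stand-in for $sample_beta: 3 variants (rows) x 2 reported
# vectors (columns), on the scaled (non-allele) scale.
sample_beta <- matrix(c(0.015, -0.040, 0.000,
                        0.020, -0.045, 0.005), nrow = 3)

# Multiply each row by its variant's factor to get the allele scale.
sample_beta_allele <- sweep(sample_beta, 1, scale_factor, "*")
```

Variants whose imputed correlation differs markedly from beta_hat are the ones the post-QC comparison is meant to flag; the threshold for "markedly" is left to the analyst.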
For reproducibility, set.seed() can be used to ensure that two runs of LDpred2 give the exact same results (since v1.10).