Simulate phenotypes using a linear model. When a prevalence K is given, a liability threshold model converts the liabilities into a binary outcome: individuals whose liability exceeds the threshold are cases. The genetic and environmental liabilities are scaled so that the variance of the genetic liability is exactly equal to the requested heritability and the variance of the total liability is equal to 1.
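The model above can be sketched in base R as follows. This is an illustration, not the bigsnpr implementation: the function name `simu_liability`, the simulated HWE genotypes, the uniform allele frequencies, and the decorrelation of the noise from the genetic part are all assumptions made so that both variance constraints hold exactly.

```r
# Sketch of a liability threshold simulation (hypothetical helper, base R only).
# Assumed: genotypes drawn under HWE, Gaussian effects on standardized genotypes.
simu_liability <- function(n, m, h2, M, K = NULL) {
  af <- runif(m, min = 0.05, max = 0.5)
  G <- sapply(af, function(p) rbinom(n, size = 2, prob = p))  # n x m, coded 0|1|2
  set <- sample(m, size = M)                                   # causal variants
  effects <- rnorm(M)
  g <- drop(scale(G[, set, drop = FALSE]) %*% effects)
  g <- as.vector(scale(g)) * sqrt(h2)            # var(g) == h2 exactly
  e <- residuals(lm(rnorm(n) ~ g))               # noise made uncorrelated with g
  e <- as.vector(scale(e)) * sqrt(1 - h2)        # var(e) == 1 - h2, so var(g + e) == 1
  liab <- g + e
  # With a prevalence K, cases are those above the (1 - K) normal quantile.
  pheno <- if (is.null(K)) liab else as.integer(liab > qnorm(1 - K))
  list(pheno = pheno, liab = liab, g = g)
}

set.seed(1)
sim <- simu_liability(n = 2000, m = 100, h2 = 0.4, M = 10, K = 0.1)
c(var_g = var(sim$g), var_liab = var(sim$liab), prevalence = mean(sim$pheno))
```

Because the total liability has mean 0 and variance 1, the observed proportion of cases should be close to K.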

snp_simuPheno(
  G,
  h2,
  M,
  K = NULL,
  alpha = -1,
  ind.row = rows_along(G),
  ind.possible = cols_along(G),
  prob = NULL,
  effects.dist = c("gaussian", "laplace"),
  ncores = 1
)

Arguments

G

An FBM.code256 (typically <bigSNP>$genotypes).
It should not contain missing values. Also remember to perform quality control: for example, some algorithms in this package won't work with SNPs that have a minor allele frequency (MAF) of 0.
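A small base-R illustration of the quality-control advice above, on a toy matrix. It shows one concrete failure mode: standardizing a monomorphic column divides by a zero standard deviation and yields NaN. The helper `maf_of` is hypothetical; on an FBM.code256, bigsnpr's `snp_MAF()` serves the same purpose.

```r
# Hypothetical QC helper: minor allele frequency of each column coded 0|1|2.
maf_of <- function(G) {
  af <- colMeans(G) / 2
  pmin(af, 1 - af)
}

G <- cbind(c(0, 1, 2, 1), c(0, 0, 0, 0), c(2, 2, 1, 2))
anyNA(G)                     # FALSE: no missing values
maf <- maf_of(G)             # 0.5, 0, 0.125
scale(G)[, 2]                # NaN NaN NaN NaN: the constant column cannot be standardized
keep <- which(maf > 0)       # columns 1 and 3
G_qc <- G[, keep, drop = FALSE]
```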

h2

Heritability.

M

Number of causal variants.

K

Prevalence. Default is NULL, giving a continuous trait.

alpha

Assumes that the expected contribution to heritability of a SNP with allele frequency \(p\) is proportional to \([2p(1-p)]^{1+\alpha}\). Default is -1, under which all causal SNPs contribute equally on average, whatever their frequency.
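One way to realize this frequency-dependent model is to draw each effect on a standardized genotype with standard deviation \([2p(1-p)]^{(1+\alpha)/2}\), so that its expected square is proportional to \([2p(1-p)]^{1+\alpha}\). The sketch below is an assumption for illustration, not bigsnpr's code; `draw_scaled_effects` and `expected_contrib` are hypothetical names.

```r
# Expected per-SNP contribution (up to a constant) under the alpha model.
expected_contrib <- function(af, alpha) (2 * af * (1 - af))^(1 + alpha)

# Effects on standardized genotypes with E[beta_j^2] proportional to expected_contrib().
draw_scaled_effects <- function(af, alpha = -1) {
  rnorm(length(af), sd = sqrt(expected_contrib(af, alpha)))
}

af <- c(0.01, 0.1, 0.5)
expected_contrib(af, alpha = -1)   # 1 1 1: every SNP contributes equally on average
expected_contrib(af, alpha = 0)    # 0.0198 0.18 0.5: common SNPs contribute more
```

Under Hardy-Weinberg equilibrium, a genotype coded 0|1|2 has variance \(2p(1-p)\), so the same model reads \(E[\beta_j^2] \propto [2p(1-p)]^{\alpha}\) on the allele scale.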

ind.row

An optional vector of the row indices (individuals) that are used. If not specified, all rows are used.
Don't use negative indices.

ind.possible

Indices of the columns (variants) from which causal variants are sampled. Default uses all columns.

prob

Vector of probability weights for sampling causal indices. It can have 0s (discarded) and is automatically scaled to sum to 1. Default is NULL (all indices have the same probability).
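The behaviour described for prob mirrors base R's sampling without replacement with weights, shown below as an illustration (it is not necessarily how the function is implemented). Weights need not sum to 1, and entries with weight 0 are never drawn.

```r
ind.possible <- 1:10
prob <- c(0, 0, 5, 5, 1, 1, 1, 1, 0, 0)   # unnormalized weights; 0s are discarded
set.seed(42)
# Indexing through sample.int() avoids sample()'s special case for a length-one input.
set <- ind.possible[sample.int(length(ind.possible), size = 4, prob = prob)]
set
```

Variants 3 and 4 carry five times the weight of variants 5 to 8, so they are drawn more often across repetitions; variants 1, 2, 9 and 10 never appear.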

effects.dist

Distribution of effects. Either "gaussian" (the default) or "laplace".
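To compare the two options, here is a base-R sketch with a hypothetical `draw_effects` helper. Base R has no Laplace sampler, but the difference of two independent Exp(1) draws is Laplace(0, 1), with variance 2. Rescaling to unit variance is a choice made here for the comparison. Since the genetic liability is rescaled to variance h2 in any case, only the shape of the distribution should matter: Laplace effects are heavier-tailed, with a few large effects and many near zero.

```r
draw_effects <- function(M, dist = c("gaussian", "laplace")) {
  dist <- match.arg(dist)
  switch(dist,
    gaussian = rnorm(M),
    laplace  = (rexp(M) - rexp(M)) / sqrt(2))   # Laplace rescaled to unit variance
}

set.seed(7)
x <- draw_effects(1e5, "laplace")
c(var = var(x), excess_kurtosis = mean(x^4) / var(x)^2 - 3)   # near 1 and 3
```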

ncores

Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().

Value

A list with 4 elements:

  • $pheno: vector of phenotypes,

  • $set: indices of causal variants,

  • $effects: effect sizes (of scaled genotypes) corresponding to $set.

  • $allelic_effects: effect sizes, but on the allele scale (0|1|2).
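The two effect scales are related by each causal variant's genotype standard deviation: dividing an effect on the scaled genotype by that SD gives the effect per allele. The base-R sketch below uses the empirical SD on a simulated toy matrix. bigsnpr may estimate the SD differently, for example from allele frequencies. The two resulting predictors then differ only by a constant centering term.

```r
set.seed(3)
n <- 500
G <- sapply(c(0.2, 0.35, 0.5), function(p) rbinom(n, size = 2, prob = p))  # 0|1|2
effects <- c(0.5, -1, 0.25)            # effects on standardized genotypes
sds <- apply(G, 2, sd)
allelic_effects <- effects / sds       # effects per allele copy

pred_scaled  <- drop(scale(G) %*% effects)
pred_allelic <- drop(G %*% allelic_effects)
# Equal up to the constant sum(colMeans(G) * allelic_effects).
all.equal(pred_scaled, pred_allelic - mean(pred_allelic))
```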