Compute a matrix product between BGEN files and a matrix. This removes the need to first read an intermediate FBM object with snp_readBGEN() and then compute the product from it. Moreover, when reading dosages, they are no longer rounded to two decimal places.

snp_prodBGEN(
  bgenfiles,
  beta,
  list_snp_id,
  ind_row = NULL,
  bgi_dir = dirname(bgenfiles),
  read_as = c("dosage", "random"),
  block_size = 1000,
  ncores = 1
)

Arguments

bgenfiles

Character vector of paths to files with extension ".bgen". The corresponding ".bgen.bgi" index files must exist.

beta

A matrix (or a vector), with one row per variant in list_snp_id, in the order of unlist(list_snp_id).

list_snp_id

List of character vectors of SNP IDs to read, with one vector per BGEN file. Each SNP ID should be in the form "<chr>_<pos>_<a1>_<a2>" (e.g. "1_88169_C_T" or "01_88169_C_T"). If you have only one BGEN file, just wrap your vector of IDs with list(). This function assumes that these IDs uniquely identify variants.
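As an illustration, IDs in the expected "<chr>_<pos>_<a1>_<a2>" form can be built with paste(). This is a minimal sketch on a hypothetical variant table (the names variants and snp_id are illustrative only); positions are stored as integers so that e.g. 100000 is not printed as "1e+05".

```r
# Hypothetical variant table for a single BGEN file
variants <- data.frame(
  chr = c(1L, 1L, 1L),
  pos = c(88169L, 100000L, 250000L),  # integers avoid "1e+05" in paste()
  a1  = c("C", "A", "G"),
  a2  = c("T", "G", "A")
)

# Build "<chr>_<pos>_<a1>_<a2>" IDs
snp_id <- with(variants, paste(chr, pos, a1, a2, sep = "_"))

# Only one BGEN file: wrap the vector of IDs with list()
list_snp_id <- list(snp_id)

# IDs are assumed to uniquely identify variants
stopifnot(!anyDuplicated(snp_id))
```

If the chromosome is stored zero-padded in the BGEN file (as in "01_88169_C_T"), the chr column would need to be formatted accordingly.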

ind_row

An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. Don't use negative indices. You can access the sample IDs corresponding to the genotypes from the .sample file, and use e.g. match() to get indices corresponding to the ones you want.
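A minimal sketch of getting ind_row with match(), assuming a small in-memory .sample file in the Oxford format (a header line, then a line of column types, then one line per individual). The sample IDs and the choice of the ID_2 column are hypothetical.

```r
# Hypothetical content of a .sample file
sample_txt <- "ID_1 ID_2 missing sex
0 0 0 D
id1 id1 0 1
id2 id2 0 2
id3 id3 0 1
id4 id4 0 2"

# Drop the second header line (column types) to keep one row per individual
samples <- read.table(text = sample_txt, header = TRUE,
                      stringsAsFactors = FALSE)[-1, ]

# Individuals we want, matched to their (positive) row indices in the BGEN files
wanted  <- c("id3", "id1")
ind_row <- match(wanted, samples$ID_2)
stopifnot(!anyNA(ind_row))  # every requested ID must be found
```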

bgi_dir

Directory of the index files. Default is the directory containing bgenfiles, i.e. dirname(bgenfiles).
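The sketch below shows where the index files would be looked for when bgi_dir differs from the BGEN directory. It assumes the index of "chr1.bgen" is named "chr1.bgen.bgi" inside bgi_dir, consistent with the ".bgen.bgi" requirement above; all paths are hypothetical.

```r
# Hypothetical BGEN paths and a separate directory for their indices
bgenfiles <- c("/data/bgen/chr1.bgen", "/data/bgen/chr2.bgen")
bgi_dir   <- "/data/bgen/index"

# Assumed naming: "<bgi_dir>/<basename of the .bgen file>.bgi"
bgifiles <- file.path(bgi_dir, paste0(basename(bgenfiles), ".bgi"))

# Index files that would be missing on this machine
missing_bgi <- bgifiles[!file.exists(bgifiles)]
```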

read_as

How to read BGEN probabilities? Currently implemented:

  • as dosages (not rounded, unlike with snp_readBGEN()), the default,

  • as hard calls, randomly sampled based on those probabilities (similar to PLINK option '--hard-call-threshold random').
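The two modes can be illustrated on a toy probability matrix. This is a minimal sketch, not the package internals: it assumes each row holds P(0), P(1) and P(2) copies of the counted allele for one individual, and the names probs, dosage and hard_call are hypothetical.

```r
# Hypothetical genotype probabilities for 3 individuals at one variant
probs <- rbind(c(0.90, 0.10, 0.00),
               c(0.05, 0.80, 0.15),
               c(0.00, 0.02, 0.98))

# "dosage": expected allele count, P(1) + 2 * P(2)
dosage <- probs[, 2] + 2 * probs[, 3]

# "random": one hard call per individual, drawn from its probabilities
set.seed(1)
hard_call <- apply(probs, 1, function(p) sample(0:2, size = 1, prob = p))
```

Here dosage is c(0.10, 1.10, 1.98), whereas hard_call always takes integer values in {0, 1, 2} and varies with the random seed.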

block_size

Maximum size of temporary blocks (in number of variants). Default is 1000.
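Block-wise computation gives the same product while only one block of variants is held in memory at a time. A minimal sketch on an in-memory matrix, where the hypothetical prod_by_blocks() stands in for reading each block from the BGEN files:

```r
# Accumulate X %*% beta over blocks of at most block_size variants (columns)
prod_by_blocks <- function(X, beta, block_size) {
  beta <- as.matrix(beta)
  res <- matrix(0, nrow(X), ncol(beta))
  blocks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / block_size))
  for (ind in blocks) {
    # In practice, this block of genotypes would be read from the BGEN file
    res <- res + X[, ind, drop = FALSE] %*% beta[ind, , drop = FALSE]
  }
  res
}

set.seed(42)
X    <- matrix(sample(0:2, 5 * 7, replace = TRUE), nrow = 5)  # 5 individuals, 7 variants
beta <- matrix(rnorm(7 * 2), ncol = 2)
stopifnot(isTRUE(all.equal(prod_by_blocks(X, beta, block_size = 3), X %*% beta)))
```

A larger block_size means fewer, larger matrix products but more temporary memory.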

ncores

Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().

Value

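The product bgen_data[ind_row, list_snp_id] %*% beta, where bgen_data denotes the genotype matrix stored in the BGEN files (individuals in rows, variants in columns), read as specified by read_as.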
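With several BGEN files, this product over the concatenated variants equals the sum of the per-file contributions. A minimal dense sketch, assuming the rows of beta follow unlist(list_snp_id) (here 3 variants from a hypothetical first file, then 2 from a second); X1 and X2 stand in for the dosages of each file:

```r
# Hypothetical dense dosages, one matrix per BGEN file (individuals x variants)
set.seed(7)
X1 <- matrix(runif(4 * 3, 0, 2), nrow = 4)  # file 1: 3 variants
X2 <- matrix(runif(4 * 2, 0, 2), nrow = 4)  # file 2: 2 variants
beta    <- matrix(rnorm(5 * 2), ncol = 2)   # 5 = 3 + 2 variants, 2 columns
ind_row <- c(4, 2)

# Product on the concatenated variants ...
full <- cbind(X1, X2)[ind_row, , drop = FALSE] %*% beta
# ... equals the sum of the per-file contributions
per_file <- X1[ind_row, , drop = FALSE] %*% beta[1:3, , drop = FALSE] +
            X2[ind_row, , drop = FALSE] %*% beta[4:5, , drop = FALSE]

stopifnot(isTRUE(all.equal(full, per_file)))
```

The result has one row per element of ind_row and one column per column of beta.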

See also