Compute the PCA of a reference dataset and project a target dataset onto it.

bed_projectPCA(
  obj.bed.ref,
  obj.bed.new,
  k = 10,
  ind.row.new = rows_along(obj.bed.new),
  ind.row.ref = rows_along(obj.bed.ref),
  ind.col.ref = cols_along(obj.bed.ref),
  strand_flip = TRUE,
  join_by_pos = TRUE,
  match.min.prop = 0.5,
  build.new = "hg19",
  build.ref = "hg19",
  liftOver = NULL,
  ...,
  verbose = TRUE,
  ncores = 1
)
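As a rough illustration of the call above, the following sketch uses the same bed file as both reference and target, with disjoint sets of individuals. It assumes that bigsnpr is installed and that the example file extdata/example.bed ships with it; the split into ind.new and ind.ref is purely illustrative and not a recommended design.

```r
if (requireNamespace("bigsnpr", quietly = TRUE)) {
  library(bigsnpr)

  # Example PLINK bed file assumed to be shipped with bigsnpr
  bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
  obj.bed <- bed(bedfile)

  # Illustrative split: every 5th individual is "target", the rest "reference"
  ind.new <- seq(1, nrow(obj.bed), by = 5)
  ind.ref <- setdiff(rows_along(obj.bed), ind.new)

  proj <- bed_projectPCA(
    obj.bed.ref = obj.bed,
    obj.bed.new = obj.bed,
    k = 5,
    ind.row.new = ind.new,
    ind.row.ref = ind.ref,
    verbose = FALSE,
    ncores = 1
  )

  # Top-level structure of the returned list
  str(proj, max.level = 1)
}
```

Both datasets here share the same build, so the liftOver argument is left at its default.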

Arguments

obj.bed.ref

Object of type bed, which is the mapping of the bed file of the reference data. Use obj.bed <- bed(bedfile) to get this object.

obj.bed.new

Object of type bed, which is the mapping of the bed file of the target data. Use obj.bed <- bed(bedfile) to get this object.

k

Number of principal components to compute and project.

ind.row.new

Rows to be used in the target data. Default uses them all.

ind.row.ref

Rows to be used in the reference data. Default uses them all.

ind.col.ref

Columns that may be used in the reference data. By default, all the columns in common with the target data are used.

strand_flip

Whether to try flipping strands (default is TRUE). If so, variants with ambiguous alleles A/T and C/G are removed.

join_by_pos

Whether to join by chromosome and position (default), or instead by rsid.

match.min.prop

Minimum proportion of variants in the smaller dataset that must be matched; otherwise, the function stops with an error. Default is 0.5 (50%).

build.new

Genome build of the target data. Default is hg19.

build.ref

Genome build of the reference data. Default is hg19.

liftOver

Path to liftOver executable. Binaries can be downloaded at https://bit.ly/2KvHugi for Mac and at https://bit.ly/2TbSaEI for Linux.

...

Arguments passed on to bed_autoSVD

fun.scaling

A function that returns a named list of mean and sd for every column, used to scale each element as follows: $$\frac{X_{i,j} - \mathrm{mean}_j}{\mathrm{sd}_j}.$$ Default is snp_scaleBinom().
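To make the scaling formula concrete, here is a minimal base-R sketch on a made-up genotype matrix coded 0/1/2. The helper scale_binom_sketch() is hypothetical; it uses a binomial model (mean 2p and sd sqrt(2p(1 - p)) for allele frequency p), which is my understanding of what snp_scaleBinom() computes, and it is not the package's implementation.

```r
# Toy genotype matrix (made-up data): 4 individuals x 3 variants, coded 0/1/2
X <- matrix(c(0, 1, 2, 1,
              2, 2, 1, 0,
              0, 0, 1, 1), nrow = 4)

# Hypothetical helper: per-column mean and sd under a binomial model
scale_binom_sketch <- function(X) {
  p <- colMeans(X) / 2                 # estimated allele frequency per variant
  list(mean = 2 * p, sd = sqrt(2 * p * (1 - p)))
}

sc <- scale_binom_sketch(X)

# Apply (X[i, j] - mean[j]) / sd[j] column by column
X_scaled <- sweep(sweep(X, 2, sc$mean, "-"), 2, sc$sd, "/")
```

After this scaling, every column of X_scaled has mean zero, since the binomial mean 2p equals the column mean by construction.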

roll.size

Radius of rolling windows to smooth log-p-values. Default is 50.

int.min.size

Minimum number of consecutive outlier SNPs required to be reported as a long-range LD region. Default is 20.

thr.r2

Threshold over the squared correlation between two SNPs. Default is 0.2. Use NA if you want to skip the clumping step.

alpha.tukey

Type-I error rate used in outlier detection (further corrected for multiple testing). Default is 0.1.

min.mac

Minimum minor allele count (MAC) for variants to be included. Default is 10.

max.iter

Maximum number of iterations of outlier detection. Default is 5.

size

For one SNP, window size around this SNP used to compute correlations. Default is 100 / thr.r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If infos.pos is not provided (NULL, the default), this is a window in number of SNPs; otherwise, it is a window in kb (physical distance). I recommend that you provide the positions if available.
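The default window size stated above is simple arithmetic; this short sketch reproduces the three values listed for the thresholds 0.2, 0.1 and 0.5. The variable names are illustrative only.

```r
# Default clumping window as described above: 100 / thr.r2
thr.r2 <- c(0.2, 0.1, 0.5)
size <- 100 / thr.r2

# Pair each threshold with its default window size
data.frame(thr.r2 = thr.r2, size = size)
```

A stricter (lower) threshold on the squared correlation therefore goes with a wider default window.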

verbose

Whether to output some information on the iterations. Default is TRUE.

ncores

Number of cores used. Default is 1 (no parallelism). You may use nb_cores().

Value

A list of 3 elements:

  • $obj.svd.ref: big_SVD object computed from the reference data.

  • $simple_proj: simple projection of the new data into the space of the reference PCA.

  • $OADP_proj: Online Augmentation, Decomposition, and Procrustes (OADP) projection of the new data into the space of the reference PCA.
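To illustrate what a simple projection into a reference PCA space can look like, here is a base-R sketch on made-up data. It assumes that "simple projection" means scaling the new data with the reference means and standard deviations and multiplying by the reference loadings; all names (X_ref, X_new, simple_proj_sketch) are hypothetical, and the OADP projection is not reproduced here.

```r
set.seed(1)

# Made-up data: 50 reference and 5 new individuals, 200 variables
X_ref <- matrix(rnorm(50 * 200), nrow = 50)
X_new <- matrix(rnorm(5 * 200), nrow = 5)
k <- 3

# Scale both datasets with the reference means and standard deviations
mu <- colMeans(X_ref)
s  <- apply(X_ref, 2, sd)
Z_ref <- sweep(sweep(X_ref, 2, mu, "-"), 2, s, "/")
Z_new <- sweep(sweep(X_new, 2, mu, "-"), 2, s, "/")

# Truncated SVD of the scaled reference data: Z_ref ~ U D V^T
sv <- svd(Z_ref, nu = k, nv = k)
V <- sv$v                          # variable loadings (200 x k)

# Reference PC scores and simple projection of the new individuals
scores_ref <- Z_ref %*% V          # equals U %*% diag(d[1:k])
simple_proj_sketch <- Z_new %*% V
```

Plotting simple_proj_sketch on top of scores_ref is one way to compare new individuals with the reference PCs, under the assumptions stated above.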