Compute \(X.row^T X.row\) for a Filebacked Big Matrix X
after applying a particular scaling to it.
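Written out, and assuming the scaling returned by fun.scaling is applied to the rows in ind.row and the columns in ind.col, the computed quantity is the following. This restates the scaling formula given for fun.scaling; it is not taken from the package sources.

$$\tilde{X}_{i,j} = \frac{X_{i,j} - center_j}{scale_j}, \qquad K_{j,k} = \sum_{i \in \text{ind.row}} \tilde{X}_{i,j}\,\tilde{X}_{i,k}, \qquad j, k \in \text{ind.col}.$$

So \(K = \tilde{X}^T \tilde{X}\) is a square, symmetric matrix with one row and one column per element of ind.col.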
big_crossprodSelf(
X,
fun.scaling = big_scale(center = FALSE, scale = FALSE),
ind.row = rows_along(X),
ind.col = cols_along(X),
block.size = block_size(nrow(X)),
backingfile = tempfile(tmpdir = getOption("FBM.dir"))
)
# S4 method for FBM,missing
crossprod(x, y)
X: An object of class FBM.
fun.scaling: A function with parameters X, ind.row and ind.col that returns a data.frame with $center and $scale for the columns corresponding to ind.col, used to scale each of their elements as follows:
$$\frac{X_{i,j} - center_j}{scale_j}.$$
The default applies no scaling. You can also provide your own center and scale by using as_scaling_fun().
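As an illustration of the contract described for fun.scaling, here is a minimal base R sketch of a hand-written scaling function. The name my_scaling is hypothetical and is not part of bigstatsr. It returns column means and standard deviations of the selected rows and columns. It is exercised on a base R matrix only. Because FBMs support `[` indexing, the same function should in principle also accept an FBM, but that is an assumption not checked here.

```r
# Hypothetical scaling function following the fun.scaling contract:
# parameters X, ind.row and ind.col; returns a data.frame with
# $center and $scale, one row per column in ind.col.
my_scaling <- function(X, ind.row = seq_len(nrow(X)),
                       ind.col = seq_len(ncol(X))) {
  sub <- X[ind.row, ind.col, drop = FALSE]
  data.frame(
    center = colMeans(sub),
    scale  = apply(sub, 2, sd)
  )
}

set.seed(1)
mat <- matrix(rnorm(13 * 17), 13, 17)
ind <- 1:6
sc <- my_scaling(mat, ind.row = ind)

# Apply (X[i, j] - center[j]) / scale[j] column-wise, then take the crossproduct
scaled <- sweep(sweep(mat[ind, , drop = FALSE], 2, sc$center, "-"),
                2, sc$scale, "/")
K <- crossprod(scaled)
all.equal(K, crossprod(scale(mat[ind, ])))
#> [1] TRUE
```

Column means and standard deviations reproduce what base R's scale() does by default. This is why the final comparison holds.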
ind.row: An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.
ind.col: An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
block.size: Maximum number of columns read at once. Default uses block_size().
backingfile: Path to the file storing the FBM data on disk. An extension ".bk" will be automatically added. Default stores in the temporary directory, which you can change using global option "FBM.dir".
x: A 'double' FBM.
y: Missing (the crossproduct of x with itself is computed).
big_crossprodSelf() returns a temporary FBM with two attributes, center and scale, the numeric vectors used to scale the columns. The crossprod() method for an FBM returns a standard R matrix.
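The center and scale attributes record the scaling that was applied, so the unscaled crossproduct can be recovered from the result. As a small illustration (a base R sketch, not bigstatsr code; the names mat, center and scl are hypothetical), suppose center holds the column means of the n rows used and D is the diagonal matrix of scale. Then the scaled crossproduct K satisfies
$$X^T X = D K D + n \, center \, center^T,$$
because \((X - \mathbf{1}\, center^T)^T (X - \mathbf{1}\, center^T) = X^T X - n \, center \, center^T\) whenever \(\mathbf{1}^T X = n \, center^T\).

```r
# Base R sketch: store center/scale as attributes of the scaled crossproduct,
# mirroring the "center" and "scale" attributes described above.
set.seed(2)
mat <- matrix(rnorm(13 * 17), 13, 17)
n <- nrow(mat)
center <- colMeans(mat)
scl <- apply(mat, 2, sd)
K <- structure(crossprod(scale(mat, center = center, scale = scl)),
               center = center, scale = scl)

# Undo the scaling: t(X) %*% X = D K D + n * center %*% t(center),
# valid here because 'center' holds the column means of the rows used.
D <- diag(attr(K, "scale"))
raw <- D %*% K %*% D + n * tcrossprod(attr(K, "center"))
all.equal(raw, crossprod(mat))
#> [1] TRUE
```

The identity only holds when the centers are the column means. With no centering (center equal to zero) the correction term vanishes and only the rescaling by D remains.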
Large matrix computations are performed block-wise and are not parallelized, so that the size of these blocks does not have to be reduced. Instead, you can use an optimized BLAS library such as MKL or OpenBLAS to accelerate these block matrix computations. You can control the number of cores used by these optimized matrix libraries with bigparallelr::set_blas_ncores().
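To make the block-wise strategy concrete, here is a minimal base R sketch of computing t(X) %*% X from column blocks of at most block.size columns. The helper crossprod_blockwise is hypothetical and is not the bigstatsr implementation. It only shows that each block product is an ordinary crossprod() call, which is the step an optimized BLAS would accelerate. The sketch runs on an in-memory matrix. With an FBM, each column block would instead be read from the backing file.

```r
# Hypothetical block-wise crossproduct t(X) %*% X, assembled from
# column blocks of at most block.size columns.
crossprod_blockwise <- function(X, block.size = 5) {
  p <- ncol(X)
  K <- matrix(0, p, p)
  starts <- seq(1, p, by = block.size)
  for (s1 in starts) {
    ind1 <- s1:min(s1 + block.size - 1, p)
    X1 <- X[, ind1, drop = FALSE]
    # Only blocks on or above the diagonal are computed
    for (s2 in starts[starts >= s1]) {
      ind2 <- s2:min(s2 + block.size - 1, p)
      # Each block product is a plain BLAS call via crossprod()
      block <- crossprod(X1, X[, ind2, drop = FALSE])
      K[ind1, ind2] <- block
      K[ind2, ind1] <- t(block)  # fill the symmetric counterpart
    }
  }
  K
}

set.seed(3)
mat <- matrix(rnorm(13 * 17), 13, 17)
all.equal(crossprod_blockwise(mat, block.size = 5), crossprod(mat))
#> [1] TRUE
```

Because the result is symmetric, only the upper block triangle is computed and then mirrored. A larger block.size means fewer, larger BLAS calls, which is why reducing it for the sake of parallelism would be counterproductive.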
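library(bigstatsr)
# A 13 x 17 FBM filled with standard normal values
X <- FBM(13, 17, init = rnorm(221))
# Reference result, computed in memory
true <- crossprod(X[])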
# No scaling
K1 <- crossprod(X)
class(K1)
#> [1] "matrix" "array"
all.equal(K1, true)
#> [1] TRUE
K2 <- big_crossprodSelf(X)
class(K2)
#> [1] "FBM"
#> attr(,"package")
#> [1] "bigstatsr"
K2$backingfile
#> [1] "C:\\Users\\au639593\\AppData\\Local\\Temp\\RtmpSeLJ92\\file577015ec246d.bk"
all.equal(K2[], true)
#> [1] TRUE
# big_crossprodSelf() provides some scaling and subsetting
# Example using only half of the data:
n <- nrow(X)
ind <- sort(sample(n, n %/% 2))
K3 <- big_crossprodSelf(X, fun.scaling = big_scale(), ind.row = ind)
true2 <- crossprod(scale(X[ind, ]))
all.equal(K3[], true2)
#> [1] TRUE