Standard univariate statistics

Standard univariate statistics for columns of a Filebacked Big Matrix. For now, the sum and var are implemented (the mean and sd can easily be deduced, see examples).

big_colstats(X, ind.row = rows_along(X), ind.col = cols_along(X), ncores = 1)

Arguments

X: An object of class FBM.
ind.row: An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.
ind.col: An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
ncores: Number of cores used. Default doesn't use parallelism. You may use nb_cores.

Value

Data.frame of two numeric vectors sum and var with the corresponding column statistics.

Examples

set.seed(1)

X <- big_attachExtdata()

# Check the results
str(test <- big_colstats(X))
#> 'data.frame':	4542 obs. of  2 variables:
#>  $ sum: num  680 821 789 843 562 666 902 537 536 553 ...
#>  $ var: num  0.46 0.324 0.393 0.311 0.518 ...

# Only with the first 100 rows
ind <- 1:100
str(test2 <- big_colstats(X, ind.row = ind))
#> 'data.frame':	4542 obs. of  2 variables:
#>  $ sum: num  138 157 115 181 92 106 167 124 120 94 ...
#>  $ var: num  0.46 0.369 0.513 0.176 0.579 ...
plot(test$sum, test2$sum)
abline(lm(test2$sum ~ test$sum), col = "red", lwd = 2)


X.ind <- X[ind, ]
all.equal(test2$sum, colSums(X.ind))
#> [1] TRUE
all.equal(test2$var, apply(X.ind, 2, var))
#> [1] TRUE

# deduce mean and sd
# note that the are also implemented in big_scale()
means <- test2$sum / length(ind) # if using all rows,
                                 # divide by nrow(X) instead
all.equal(means, colMeans(X.ind))
#> [1] TRUE
sds <- sqrt(test2$var)
all.equal(sds, apply(X.ind, 2, sd))
#> [1] TRUE

Arguments

Value

See also

Examples