Some scaling functions for a Filebacked Big Matrix, to be used as the fun.scaling parameter of some functions of this package.
big_scale(center = TRUE, scale = TRUE)
center: A logical value indicating whether to return means or 0s.
scale: A logical value indicating whether to return standard deviations or 1s. You cannot use scale without using center: with center = FALSE, 1s are returned.
A new function that returns a data.frame of two vectors, "center" and "scale", each of the same length as ind.col.
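To illustrate what the returned function computes, here is a minimal base-R sketch that works on an ordinary in-memory matrix instead of a Filebacked Big Matrix. It is not the package's implementation. The name scale_factory is hypothetical, and using the sample standard deviation (as returned by sd()) is an assumption of this sketch.

```r
# Hypothetical base-R sketch of a big_scale()-like function factory.
# Works on an ordinary matrix, not on a Filebacked Big Matrix.
scale_factory <- function(center = TRUE, scale = TRUE) {
  # The returned closure remembers the values of `center` and `scale`
  function(X, ind.row = seq_len(nrow(X)), ind.col = seq_len(ncol(X))) {
    X.sub <- X[ind.row, ind.col, drop = FALSE]
    m <- length(ind.col)
    if (center) {
      means <- colMeans(X.sub)
      # Assumption: sample standard deviation (n - 1 denominator)
      sds <- if (scale) apply(X.sub, 2, stats::sd) else rep(1, m)
    } else {
      # Without centering, scaling is not used: return 0s and 1s
      means <- rep(0, m)
      sds <- rep(1, m)
    }
    data.frame(center = means, scale = sds)
  }
}

set.seed(1)
X <- matrix(rnorm(200, mean = 2), nrow = 20, ncol = 10)
str(scale_factory()(X, ind.col = 1:3))
```

As with big_scale(), the returned data.frame has one row per selected column.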
One could think of less common scalings, such as the "y-aware" scaling, which uses the inverse of the betas of column-wise linear regressions as scaling. See this post for details. It would be easy to implement using big_colstats to get column means and big_univLinReg to get betas (and then invert them).
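A minimal base-R sketch of such a y-aware scaling, assuming an ordinary matrix X and a numeric response y. The factory name y_aware_scale is hypothetical. Here colMeans() and the closed-form slope cov(x, y) / var(x) of the simple regression y ~ x stand in for big_colstats and big_univLinReg; the sketch returns the same center/scale data.frame shape as big_scale, with scale set to 1 / beta.

```r
# Hypothetical base-R sketch of a "y-aware" scaling factory.
y_aware_scale <- function(y) {
  function(X, ind.row = seq_len(nrow(X)), ind.col = seq_len(ncol(X))) {
    X.sub <- X[ind.row, ind.col, drop = FALSE]
    y.sub <- y[ind.row]
    means <- colMeans(X.sub)
    # Slope of the simple linear regression y ~ x_j: cov(x_j, y) / var(x_j)
    betas <- apply(X.sub, 2, function(x) stats::cov(x, y.sub) / stats::var(x))
    # Dividing by 1 / beta_j is the same as multiplying by beta_j,
    # so each scaled column ends up in the units of y
    data.frame(center = means, scale = 1 / betas)
  }
}

set.seed(2)
X <- matrix(rnorm(300), nrow = 100, ncol = 3)
y <- drop(X %*% c(2, -1, 0.5)) + rnorm(100)
str(y_aware_scale(y)(X))
```

Note that a column whose slope is close to 0 gets a very large scale, so it is shrunk towards 0 once scaled.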
X <- big_attachExtdata()
# No scaling
big_noscale <- big_scale(center = FALSE, scale = FALSE)
class(big_noscale) # big_scale returns a new function
#> [1] "function"
str(big_noscale(X))
#> 'data.frame': 4542 obs. of 2 variables:
#> $ center: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ scale : num 1 1 1 1 1 1 1 1 1 1 ...
big_noscale2 <- big_scale(center = FALSE)
str(big_noscale2(X)) # you can't scale without centering
#> 'data.frame': 4542 obs. of 2 variables:
#> $ center: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ scale : num 1 1 1 1 1 1 1 1 1 1 ...
# Centering
big_center <- big_scale(scale = FALSE)
str(big_center(X))
#> 'data.frame': 4542 obs. of 2 variables:
#> $ center: num 1.32 1.59 1.53 1.63 1.09 ...
#> $ scale : num 1 1 1 1 1 1 1 1 1 1 ...
# + scaling
str(big_scale()(X))
#> 'data.frame': 4542 obs. of 2 variables:
#> $ center: num 1.32 1.59 1.53 1.63 1.09 ...
#> $ scale : num 0.679 0.569 0.627 0.558 0.719 ...
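The center/scale data.frame is meant to be applied column-wise. Assuming the convention (x - center) / scale, which is an assumption of this sketch, the following base-R code shows the effect of the default big_scale() output on an ordinary matrix. The helper apply_scaling is hypothetical, and the center/scale values are computed directly with colMeans() and sd() rather than by big_scale().

```r
# Hypothetical helper: computes (X[, j] - center[j]) / scale[j] for each column j
apply_scaling <- function(X, sc) {
  X <- sweep(X, 2, sc$center, "-")
  sweep(X, 2, sc$scale, "/")
}

set.seed(3)
M <- matrix(rnorm(50, mean = 5, sd = 2), nrow = 10, ncol = 5)
sc <- data.frame(center = colMeans(M), scale = apply(M, 2, sd))
Z <- apply_scaling(M, sc)
round(colMeans(Z), 10)  # all 0
apply(Z, 2, sd)         # all 1
```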