vignettes/operations-with-scaling.Rmd
operations-with-scaling.Rmd
The main of problem that we will try to solve in this document is
computing the problem between a very large matrix \(\boldsymbol{X}\) (of size \(n \times m\)) and a smaller one \(\boldsymbol{M}\). \(\boldsymbol{\widetilde{X}}\) is a scaled
version of \(\boldsymbol{X}\),
i.e. defined such that \(\widetilde{X}_{i,j} =
\frac{X_{i,j}-\mu_j}{s_j}\). Usually, \(\boldsymbol{\mu}\) is the vector of column
means (that you can get using the \(\texttt{R}\) command
colMeans(X)
) and \(\boldsymbol{s}\) is the vector of column
standard deviations (that you can get using the \(\texttt{R}\) command
apply(X, 2, sd)
). For now, we will make no such assumptions
on \(\boldsymbol{\mu}\) and \(\boldsymbol{s}\).
Let us denote \(\boldsymbol{A}= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}\) and \(\boldsymbol{B}= \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}\). Then \(\begin{bmatrix} a_{11} - \mu_1 & a_{12} - \mu_2 \\ a_{21} - \mu_1 & a_{22} - \mu_2 \\ a_{31} - \mu_1 & a_{32} - \mu_2 \end{bmatrix} = \boldsymbol{A}- \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix} = \boldsymbol{A}- \boldsymbol{1}_3 \cdot \boldsymbol{\mu}^T\) and \(\begin{bmatrix} b_{11} d_1 & b_{12} d_2 \\ b_{21} d_1 & b_{22} d_2 \\ b_{31} d_1 & b_{32} d_2 \end{bmatrix} = \boldsymbol{B}\cdot \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} = \boldsymbol{B}\cdot \boldsymbol{\Delta}\).
Let us generalize: \(\boldsymbol{\widetilde{X}} = (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s}\) where \(\boldsymbol{\Delta_s} = \begin{bmatrix} 1/s_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1/s_{m} \end{bmatrix}\).
So, \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\). If we denote \(\boldsymbol{M_s} = \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{X}\cdot \boldsymbol{M_s} - \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T \cdot \boldsymbol{M_s}\), which doesn’t involve any explicit scaling of \(\boldsymbol{X}\).
Moreover, in the particular case where \(\boldsymbol{\mu}^T = \boldsymbol{1}_n^T \cdot \boldsymbol{X}\) is the vector of column means of \(\boldsymbol{X}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{C_n} \cdot (\boldsymbol{X}\cdot \boldsymbol{M_s})\) where \(\boldsymbol{C_n} = \boldsymbol{I_n} - \frac{1}{n} \boldsymbol{1}_n \cdot \boldsymbol{1}_n^T\) is the centering matrix.
So, we have shown that we can compute a product between a scaled version of a large matrix and another matrix without having to explicitly scale this large matrix. This trick could be applied to other problems and you can find other tricks in the biglasso paper.
Let use the big_apply
function to implement the main
problem we have treated in this document.
X <- big_attachExtdata()
# simulate some data
M <- matrix(0, ncol(X), 10)
M[] <- rnorm(length(M))
### Treating mu as any vector
# get mu and s as the column means and column standard deviations
stats_scale <- big_scale(center = TRUE, scale = TRUE)(X)
# compute M_s
M_s <- M / stats_scale$scale
# compute the product, without scaling X
XM_s <- big_prodMat(X, M_s)
# adjust the result
final <- sweep(XM_s, 2, crossprod(stats_scale$center, M_s), '-')
# verification
true <- scale(X[]) %*% M
all.equal(final, true)
## [1] TRUE
### Knowing that mu is the column means of X
final2 <- sweep(XM_s, 2, colMeans(XM_s), '-')
# verification
all.equal(final2, final)
## [1] TRUE