Main problem
The main of problem that we will try to solve in this document is computing the problem between a very large matrix \(\boldsymbol{X}\) (of size \(n \times m\)) and a smaller one \(\boldsymbol{M}\). \(\boldsymbol{\widetilde{X}}\) is a scaled version of \(\boldsymbol{X}\), i.e. defined such that \(\widetilde{X}_{i,j} = \frac{X_{i,j}-\mu_j}{s_j}\). Usually, \(\boldsymbol{\mu}\) is the vector of column means (that you can get using the \(\texttt{R}\) command colMeans(X)
) and \(\boldsymbol{s}\) is the vector of column standard deviations (that you can get using the \(\texttt{R}\) command apply(X, 2, sd)
). For now, we will make no such assumptions on \(\boldsymbol{\mu}\) and \(\boldsymbol{s}\).
Rewriting the main problem
Let us denote \(\boldsymbol{A}= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}\) and \(\boldsymbol{B}= \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}\). Then \(\begin{bmatrix} a_{11} - \mu_1 & a_{12} - \mu_2 \\ a_{21} - \mu_1 & a_{22} - \mu_2 \\ a_{31} - \mu_1 & a_{32} - \mu_2 \end{bmatrix} = \boldsymbol{A}- \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix} = \boldsymbol{A}- \boldsymbol{1}_3 \cdot \boldsymbol{\mu}^T\) and \(\begin{bmatrix} b_{11} d_1 & b_{12} d_2 \\ b_{21} d_1 & b_{22} d_2 \\ b_{31} d_1 & b_{32} d_2 \end{bmatrix} = \boldsymbol{B}\cdot \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} = \boldsymbol{B}\cdot \boldsymbol{\Delta}\).
Let us generalize: \(\boldsymbol{\widetilde{X}} = (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s}\) where \(\boldsymbol{\Delta_s} = \begin{bmatrix} 1/s_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1/s_{m} \end{bmatrix}\).
So, \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\). If we denote \(\boldsymbol{M_s} = \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{X}\cdot \boldsymbol{M_s} - \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T \cdot \boldsymbol{M_s}\), which doesn’t involve any explicit scaling of \(\boldsymbol{X}\).
Moreover, in the particular case where \(\boldsymbol{\mu}^T = \boldsymbol{1}_n^T \cdot \boldsymbol{X}\) is the vector of column means of \(\boldsymbol{X}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{C_n} \cdot (\boldsymbol{X}\cdot \boldsymbol{M_s})\) where \(\boldsymbol{C_n} = \boldsymbol{I_n} - \frac{1}{n} \boldsymbol{1}_n \cdot \boldsymbol{1}_n^T\) is the centering matrix.
Example
Let use the big_apply
function to implement the main problem we have treated in this document.
### Treating mu as any vector
# get mu and s as the column means and column standard deviations
stats_scale <- big_scale(center = TRUE, scale = TRUE)(X)
# compute M_s
M_s <- M / stats_scale$scale
# compute the product, without scaling X
XM_s <- big_prodMat(X, M_s)
# adjust the result
final <- sweep(XM_s, 2, crossprod(stats_scale$center, M_s), '-')
# verification
true <- scale(X[]) %*% M
all.equal(final, true)
## [1] TRUE
### Knowing that mu is the column means of X
final2 <- sweep(XM_s, 2, colMeans(XM_s), '-')
# verification
all.equal(final2, final)
## [1] TRUE