Main problem

The main of problem that we will try to solve in this document is computing the problem between a very large matrix \(\boldsymbol{X}\) (of size \(n \times m\)) and a smaller one \(\boldsymbol{M}\). \(\boldsymbol{\widetilde{X}}\) is a scaled version of \(\boldsymbol{X}\), i.e. defined such that \(\widetilde{X}_{i,j} = \frac{X_{i,j}-\mu_j}{s_j}\). Usually, \(\boldsymbol{\mu}\) is the vector of column means (that you can get using the \(\texttt{R}\) command colMeans(X)) and \(\boldsymbol{s}\) is the vector of column standard deviations (that you can get using the \(\texttt{R}\) command apply(X, 2, sd)). For now, we will make no such assumptions on \(\boldsymbol{\mu}\) and \(\boldsymbol{s}\).

Rewriting the main problem

Let us denote \(\boldsymbol{A}= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}\) and \(\boldsymbol{B}= \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}\). Then \(\begin{bmatrix} a_{11} - \mu_1 & a_{12} - \mu_2 \\ a_{21} - \mu_1 & a_{22} - \mu_2 \\ a_{31} - \mu_1 & a_{32} - \mu_2 \end{bmatrix} = \boldsymbol{A}- \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix} = \boldsymbol{A}- \boldsymbol{1}_3 \cdot \boldsymbol{\mu}^T\) and \(\begin{bmatrix} b_{11} d_1 & b_{12} d_2 \\ b_{21} d_1 & b_{22} d_2 \\ b_{31} d_1 & b_{32} d_2 \end{bmatrix} = \boldsymbol{B}\cdot \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} = \boldsymbol{B}\cdot \boldsymbol{\Delta}\).

Let us generalize: \(\boldsymbol{\widetilde{X}} = (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s}\) where \(\boldsymbol{\Delta_s} = \begin{bmatrix} 1/s_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1/s_{m} \end{bmatrix}\).

So, \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= (\boldsymbol{X}- \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T) \cdot \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\). If we denote \(\boldsymbol{M_s} = \boldsymbol{\Delta_s} \cdot \boldsymbol{M}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{X}\cdot \boldsymbol{M_s} - \boldsymbol{1}_n \cdot \boldsymbol{\mu}^T \cdot \boldsymbol{M_s}\), which doesn’t involve any explicit scaling of \(\boldsymbol{X}\).

Moreover, in the particular case where \(\boldsymbol{\mu}^T = \boldsymbol{1}_n^T \cdot \boldsymbol{X}\) is the vector of column means of \(\boldsymbol{X}\), then \(\boldsymbol{\widetilde{X}} \cdot \boldsymbol{M}= \boldsymbol{C_n} \cdot (\boldsymbol{X}\cdot \boldsymbol{M_s})\) where \(\boldsymbol{C_n} = \boldsymbol{I_n} - \frac{1}{n} \boldsymbol{1}_n \cdot \boldsymbol{1}_n^T\) is the centering matrix.

Conclusion

So, we have shown that we can compute a product between a scaled version of a large matrix and another matrix without having to explicitly scale this large matrix. This trick could be applied to other problems and you can find other tricks in the biglasso paper.

Example

Let use the big_apply function to implement the main problem we have treated in this document.

## [1] TRUE
## [1] TRUE