class: title-slide center middle inverse

# Ultra fast penalized regressions<br>with package {bigstatsr}

<br>

## e-Rum 2020

<br>

### Florian Privé (@privefl)

#### postdoc in *Predictive Human Genetics*

---

## {bigstatsr} uses memory-mapping

<img src="memory-solution.svg" width="85%" style="display: block; margin: auto;" />

.footnote[`FBM` is very similar to `filebacked.big.matrix` from package {bigmemory}.]

---

## Penalized linear regression

<br>

with **lasso** ( `\(\alpha=1\)` ) or **elastic-net** regularization ( `\(0 < \alpha < 1\)` )

`$$L(\lambda, \alpha) = \underbrace{ \|y - X \beta\|_2^2 }_\text{Loss function} + \underbrace{ \lambda \left( \alpha \|\beta\|_1 + (1-\alpha) \frac{\|\beta\|_2^2}{2} \right) }_\text{Penalisation}$$`

<br>

Two hyper-parameters in this model:

- `\(\lambda\)`

- `\(\alpha\)`

---

## Science and Implementation

### behind the penalized regression framework of {bigstatsr}

<br>

- Mostly implemented in **C++**

- Use **strong rules** to discard variables a priori

- Use **early-stopping** to avoid fitting costly models

- Process the hyper-parameter **grid in parallel** <br>(memory-mapping makes it easy and efficient)

.footnote[Strong rules: DOI: [10.1111/j.1467-9868.2011.01004.x](https://doi.org/10.1111/j.1467-9868.2011.01004.x)]

---

## Predicting common diseases from genetics

15K `\(\times\)` 280K (30 GB) in **a few minutes**

<img src="density-scores.svg" width="85%" style="display: block; margin: auto;" />

---

## Predicting height from genetics

350K `\(\times\)` 560K (1.4 TB) in **one day**

<img src="https://privefl.github.io/blog/images/UKB-final-pred.png" width="85%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# package {bigstatsr}
# makes it possible
# to fit penalized regressions
# on 100s of GB of data

---

## Scientific publications

<br>

<a href="https://doi.org/10.1093/bioinformatics/bty185" target="_blank">
<img src="bty185.png" width="70%" style="display: block; margin: auto;" />
</a>

<br>

- {bigstatsr}: to be used by any field of research

- {bigsnpr}: algorithms specific to my field of research, Human Genetics

<br>

<a href="https://doi.org/10.1534/genetics.119.302019" target="_blank">
<img src="paper2-2.PNG" width="70%" style="display: block; margin: auto;" />
</a>

---

## Contributions are welcome!

<img src="cat-help.jpg" width="75%" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# Thanks!

<br/><br/>

#### Go check the package website and the vignette!

<!-- Package's website: https://privefl.github.io/bigstatsr/ -->

<br/>
[privefl](https://twitter.com/privefl)
[privefl](https://github.com/privefl)
[F. Privé](https://stackoverflow.com/users/6103040/f-priv%c3%a9)

.footnote[Slides created using package [**xaringan**](https://github.com/yihui/xaringan).]
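
---

## Appendix: the elastic-net objective in base R

A minimal sketch of the objective `\(L(\lambda, \alpha)\)` from the "Penalized linear regression" slide, minimized by naive cyclic coordinate descent. This is an illustration only, not the {bigstatsr} implementation (no strong rules, no early-stopping, no parallelism, no memory-mapping). The helper names `soft_threshold()`, `enet_objective()` and `enet_cd()` and the simulated data are assumptions made for this example.

```r
# Soft-thresholding operator: shrinks z towards 0 by t.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Objective as written on the slide: squared-error loss + elastic-net penalty.
enet_objective <- function(X, y, beta, lambda, alpha) {
  sum((y - X %*% beta)^2) +
    lambda * (alpha * sum(abs(beta)) + (1 - alpha) * sum(beta^2) / 2)
}

# Naive cyclic coordinate descent (hypothetical helper, for illustration).
enet_cd <- function(X, y, lambda, alpha, n_iter = 200) {
  beta   <- rep(0, ncol(X))
  resid  <- y                  # residuals y - X %*% beta, starting from beta = 0
  col_ss <- colSums(X^2)       # squared norm of each column
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      # inner product of column j with the residual that excludes variable j
      rho <- sum(X[, j] * resid) + col_ss[j] * beta[j]
      # closed-form minimizer of the objective in coordinate j
      new_bj <- soft_threshold(2 * rho, lambda * alpha) /
        (2 * col_ss[j] + lambda * (1 - alpha))
      # keep residuals in sync with the updated coefficient
      resid   <- resid - X[, j] * (new_bj - beta[j])
      beta[j] <- new_bj
    }
  }
  beta
}

# Simulated data (assumption): 100 observations, 5 variables, 2 of them causal.
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- 2 * X[, 1] - X[, 3] + rnorm(100)

beta_lasso <- enet_cd(X, y, lambda = 50, alpha = 1)   # lasso
beta_enet  <- enet_cd(X, y, lambda = 50, alpha = 0.5) # elastic-net
beta_ols   <- enet_cd(X, y, lambda = 0,  alpha = 1)   # no penalty
```

With `lambda = 0` the penalty vanishes and the result should match ordinary least squares. In {bigstatsr}, penalized linear regression on an `FBM` is provided by `big_spLinReg()`; see the package documentation for its exact arguments and scaling of the objective.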