Changelog

Check that parameters ind.row/ind.col are not NULL.

Remove {dplyr} dependency for internal function any_near0().

Fix conversion from NA_real_ to FBM type integer on new Macs.

Error when variables with a zero scaling are used in e.g. big_randomSVD() and big_crossprodSelf() (#52).

Add parameter backingfile to big_crossprodSelf() and big_cor() (#170).

Make sure not to use two levels of parallelism in big_univLogReg() (#137).

Check out-of-bounds ind.col in big_prodMat() (#154).

Add global option FBM.dir (that defaults to tempdir() as before). This can be used to change the default directory used to create FBMs when calling either FBM(), FBM.code256(), as_FBM(), big_copy(), or big_transpose(). Note that, if not using the temporary directory anymore, you must clean up the files you do not want to keep.
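A minimal sketch of how this option might be used, assuming {bigstatsr} is installed. The directory name my_fbms is hypothetical; any writable directory should work.

```r
library(bigstatsr)

# Hypothetical directory for backing files (an assumption for this example)
fbm_dir <- file.path(tempdir(), "my_fbms")
dir.create(fbm_dir, showWarnings = FALSE)

# Change the default directory, keeping the old value to restore it later
old_opts <- options(FBM.dir = fbm_dir)

X <- FBM(10, 10)                  # the backing file is created in fbm_dir
bk_dir <- dirname(X$backingfile)  # directory that holds the backing file

# Restore the previous option and remove the file we do not want to keep
options(old_opts)
unlink(X$backingfile)
```

Restoring the option afterwards keeps the change local to this piece of code.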

Enable ARMA_64BIT_WORD.

New strategy for $add_columns().

Add convenience function as_scaling_fun() to create your own fun.scaling parameters.
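As an illustration, the sketch below builds a scaling function from precomputed column means and standard deviations. It assumes that as_scaling_fun() takes center.col and scale.col arguments and that big_colstats() returns columns sum and var; check the package documentation before relying on these names.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(20, 10, init = rnorm(200))

# Precompute column means and standard deviations
stats <- big_colstats(X)
my_scaling <- as_scaling_fun(
  center.col = stats$sum / nrow(X),
  scale.col  = sqrt(stats$var)
)

# Use it wherever a fun.scaling parameter is expected
svd <- big_SVD(X, fun.scaling = my_scaling, k = 3)
```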

Now automatically discard covariates with no variation in pcor() (with a warning).

pcor() now returns NAs (instead of 0s) for singular systems.

Recode some parallel algorithms with OpenMP. For now, functions big_prodVec(), big_cprodVec(), big_colstats() and big_univLinReg() have been recoded.

Now detects and errors if there is not enough disk space to create an FBM.

Fix pcor() for singular systems, e.g. when x has all the same values.

Fix summary() and plot() for old (< v1.3) big_sp_list models.

Add function pcor() to compute partial correlations.
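A short sketch of a call, assuming the signature pcor(x, y, z) where z holds the covariates to adjust for. The iris data is used only for illustration.

```r
library(bigstatsr)

# Partial correlation between sepal length and sepal width,
# adjusting for the two petal measurements
res <- pcor(iris[[1]], iris[[2]], as.matrix(iris[3:4]))
res
```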

Add two options in big_spLinReg() and big_spLogReg(): power_scale for using a different scaling for LASSO, and power_adaptive for using adaptive LASSO (where larger marginal effects are penalized less). See documentation for details.
big_(c)prodVec() and big_(c)prodMat() (re)gain a ncores parameter. Note that for big_(c)prodMat(), it might be beneficial to use the BLAS parallelism (with bigparallelr::set_blas_ncores()) instead of this parameter, especially when the matrix A is large-ish.

Function big_colstats() can now be run in parallel (added parameter ncores).

It is now possible to use C++ FBM accessors without linking to {RcppArmadillo}.

Functions big_(c)prodMat() and big_(t)crossprodSelf() now use much less memory, and may be faster.
Add covar_from_df() to convert a data frame with factors/characters to a numeric matrix using one-hot encoding.
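For instance, a data frame with a factor column could be converted as sketched below. The iris data is used only for illustration.

```r
library(bigstatsr)

# iris has four numeric columns and one factor (Species);
# the factor is expanded using one-hot encoding
mat <- covar_from_df(iris)
dim(mat)
```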

Remove some ‘Suggests’ dependencies.

Add a new column $all_conv to output of summary() for big_spLinReg() and big_spLogReg() to check whether all models have stopped because of “no more improvement”. Also add a new parameter sort to summary().
Now warn (enabled by default) if some models may not have reached a minimum when using big_spLinReg() and big_spLogReg().

Fix the warning “In .self$nrow * .self$ncol : NAs produced by integer overflow”.

Make two different memory-mappings: one that is read-only (using $address) and one where it is possible to write (using $address_rw). This makes it possible to use file permissions to prevent modifying data.
Also add a new field $is_read_only to be used to prevent modifying data (at least with <-) even when you have write permissions to it. Functions creating an FBM now gain a parameter is_read_only.
Make vector accessors (e.g. X[1:10]) faster.

Move some code to new packages {bigassertr} and {bigparallelr}.
big_randomSVD() gains arguments related to matrix-vector multiplication.
assert_noNA() is faster.

Add big_increment().
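A minimal sketch, assuming big_increment() takes an FBM of type double and a matrix of the same dimensions to add to it in place:

```r
library(bigstatsr)

X <- FBM(10, 10, init = 0)

# Each call adds the matrix to the FBM in place
big_increment(X, matrix(1, 10, 10))
big_increment(X, matrix(1, 10, 10))

X[1:3, 1:3]
```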

In plot.big_SVD():

Can now plot many PCA scores (more than two) at once.
Use coord_fixed() when plotting PCA scores because it is good practice.
Use log-scale in scree plot to better see small differences in singular values.
Reexport cowplot::plot_grid() to merge multiple ggplots.

AUCBoot() is now 6-7 times faster.

Add parameters center and scale to products.

Fix a bug in big_univLogReg() for variables with no variation. IRLS was not converging, so glm() was used instead. The problem is that glm() drops dimensions that cause singularities, so that the Z-score of the first covariate (or intercept) was used instead of a missing value.

Use mio instead of boost for memory-mapping.
Add a parameter base.row to predict.big_sp_list() and automatically detect if needed (as well as for covar.row).
It is now possible to subset a big_sp_list without losing attributes, so that one can access one model (corresponding to one alpha) even if it is not the ‘best’.
Add parameters pf.X and pf.covar in big_sp***Reg() to provide different penalization for each variable (possibly no penalization at all).

Add %*%, crossprod and tcrossprod operations for ‘double’ FBMs.

Now also returns the number of non-zero variables ($nb_active) and the number of candidate variables ($nb_candidate) for each step of the regularization paths of big_spLinReg() and big_spLogReg().

Parameters warn and return.all of big_spLinReg() and big_spLogReg() are deprecated; now always return the maximum information. Now provide two methods (summary and plot) to get a quick assessment of the fitted models.

Check for missing values in input vectors (indices and targets) and matrices (covariables).
AUC() is now stricter: it accepts only 0s and 1s for target.

$bm() and $bm.desc() have been added in order to get an FBM as a filebacked.big.matrix. This enables using {bigmemory} functions.

Type float added.

big_write added.

big_read now has a filter argument to filter rows, and argument nrow has been removed because it is now determined when reading the first block of data.
Removed the save argument from FBM (and others); now, you must use FBM(...)$save() instead of FBM(..., save = TRUE).
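The new pattern might look like the sketch below. It assumes that $save() returns the FBM and that the path of the .rds file is available as $rds.

```r
library(bigstatsr)

# Create the FBM and save its .rds file in one step
X <- FBM(10, 10, init = 1)$save()

# The FBM can later be reattached from the .rds file
X2 <- big_attach(X$rds)
X2[1:3, 1:3]
```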

You can now fill an FBM using a data frame. Note that factors will be used as integers.
Package {bigreadr} has been developed and is now used by big_read.

There have been some changes regarding how conversion between types is checked. Before, you would get a warning for any possible loss of precision (without actually checking it). Now, any loss of precision due to conversion between types is reported as a warning, and only in this case. If you want to disable this feature, you can use options(bigstatsr.downcast.warning = FALSE), or you can use without_downcast_warning() to disable this warning for one call.
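The behaviour described above can be sketched as follows, using an integer FBM as an assumed example:

```r
library(bigstatsr)

X <- FBM(2, 2, type = "integer", init = 0L)

# No loss of precision: 2 can be stored exactly as an integer
X[1, 1] <- 2

# Loss of precision: 2.5 cannot be stored as an integer, so a warning is expected;
# tryCatch() captures its message and stops the assignment
w <- tryCatch({ X[1, 2] <- 2.5; NULL }, warning = function(w) conditionMessage(w))

# Disable the warning for this call only
without_downcast_warning(X[2, 1] <- 3.5)
```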

Change big_read so that it is faster (the corresponding vignette has been updated).

It is now possible to add a “base predictor” for big_spLinReg and big_spLogReg.
No longer store the whole regularization path (as a sparse matrix) in big_spLinReg and big_spLogReg because it caused major slowdowns.
Directly average the K predictions in predict.big_sp_best_list.
Only use the “PSOCK” type of cluster because “FORK” can leave zombies behind. You can change this with options(bigstatsr.cluster.type = "FORK").

Fix a bug in big_spLinReg related to the computation of summaries.
Now provides function plus to be used as the combine argument in big_apply and big_parallelize instead of '+'.
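For example, row sums can be accumulated over blocks of columns as sketched below. The argument names a.FUN, a.combine and block.size are those of the current big_apply() interface and are assumed here.

```r
library(bigstatsr)

X <- FBM(10, 20, init = 1)

# Compute row sums block by block (5 columns at a time),
# then add the partial results together with 'plus'
row_sums <- big_apply(
  X,
  a.FUN = function(X, ind) rowSums(X[, ind, drop = FALSE]),
  a.combine = 'plus',
  block.size = 5
)
```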

Before, this package used only the “PSOCK” type of cluster, which has some significant overhead. Now, it uses the “FORK” type on non-Windows systems. You can change this with options(bigstatsr.cluster.type = "PSOCK"). Note that version 0.4.0 uses only “PSOCK” again.

You can now provide multiple $\alpha$ values (as a numeric vector) in big_spLinReg and big_spLogReg. One will be chosen by grid-search.

Fixed a bug in big_prodMat when using a dimension of 1 or 0.

Package {bigstatsr} is published in Bioinformatics.

No scaling is used by default for big_crossprod, big_tcrossprod, big_SVD and big_randomSVD (before, there was no default at all).

Integrate Cross-Model Selection and Averaging (CMSA) directly in big_spLinReg and big_spLogReg, a procedure that automatically chooses the value of the $\lambda$ hyper-parameter.
Speed up big_spLinReg and big_spLogReg (issue #12)

Speed up AUC computations

No longer use the big.matrix format of package bigmemory

bigstatsr 1.6.2

bigstatsr 1.6.1 (2024-09-09)

bigstatsr 1.6.0

bigstatsr 1.5.14

bigstatsr 1.5.13

bigstatsr 1.5.11

bigstatsr 1.5.10

bigstatsr 1.5.9

bigstatsr 1.5.8

bigstatsr 1.5.7

bigstatsr 1.5.6 (2022-02-03)

bigstatsr 1.5.4

bigstatsr 1.5.3

bigstatsr 1.5.0 (2021-03-29)

bigstatsr 1.4.0

bigstatsr 1.3.3

bigstatsr 1.3.2

bigstatsr 1.3.1 (2020-11-06)

bigstatsr 1.3.0

bigstatsr 1.2.2 (2020-03-09)

bigstatsr 1.2.1

bigstatsr 1.2.0

bigstatsr 1.1.4 (2020-02-01)

bigstatsr 1.1.3

bigstatsr 1.1.1

bigstatsr 1.1.0

bigstatsr 1.0.0

bigstatsr 0.9.10

bigstatsr 0.9.9

bigstatsr 0.9.6

bigstatsr 0.9.5

bigstatsr 0.9.3

bigstatsr 0.9.0

bigstatsr 0.8.4

bigstatsr 0.8.3

bigstatsr 0.8.0

bigstatsr 0.7.3

bigstatsr 0.7.1

bigstatsr 0.7.0

bigstatsr 0.6.2 (2018-08-17)

bigstatsr 0.6.1

bigstatsr 0.6.0

bigstatsr 0.5.0

bigstatsr 0.4.1

bigstatsr 0.4.0

bigstatsr 0.3.4

bigstatsr 0.3.3

bigstatsr 0.3.2

bigstatsr 0.3.1

bigstatsr 0.3.0

bigstatsr 0.2.6

bigstatsr 0.2.4 (2018-01-31)

bigstatsr 0.2.3 (2017-11-30)

bigstatsr 0.2.0