LOF: Identifying Density-Based Local Outliers.
LOF(
U,
seq_k = c(4, 10, 30),
combine = max,
robMaha = FALSE,
log = TRUE,
ncores = 1
)
U: A matrix from which to detect outliers (one observation per row), e.g. PC scores.
seq_k: Sequence of numbers of nearest neighbors (k) to use. If several values of k are provided, the LOF statistics computed for each k are combined (see combine). Default is c(4, 10, 30), combined with max.
combine: Function used to combine the results obtained for the different values of k. Default is max.
robMaha: Whether to use a robust Mahalanobis distance instead of the standard Euclidean distance. Default is FALSE, meaning that the Euclidean distance is used.
log: Whether to return the logarithm of the LOF values. Default is TRUE.
ncores: Number of cores to use. Default is 1.
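To illustrate what the robMaha option changes, the sketch below relies on a standard identity: the Mahalanobis distance with respect to a covariance matrix S equals the Euclidean distance after "whitening" the rows with the Cholesky factor of S. This is a hedged illustration, not the bigutilsr code. The robust estimator (MASS::cov.rob with method = "mcd") is an assumption chosen only as an example, and the helper whiten() is hypothetical.

```r
# Sketch: Mahalanobis distance w.r.t. S == Euclidean distance after whitening.
# Assumption: S is estimated with MASS::cov.rob (MCD) purely as an example of a
# robust covariance; LOF(robMaha = TRUE) may use a different estimator.

# Hypothetical helper: rows y = x %*% solve(R), where S = t(R) %*% R,
# so that sum((y1 - y2)^2) = (x1 - x2) %*% solve(S) %*% t(x1 - x2).
whiten <- function(U, S) {
  R <- chol(S)      # upper-triangular Cholesky factor of S
  U %*% solve(R)
}

set.seed(42)
# 200 correlated 3-D observations (upper-triangular mixing matrix)
U <- matrix(rnorm(600), ncol = 3) %*% matrix(c(2, 0, 0, 1, 1, 0, 0, 0.5, 1), 3)
S <- MASS::cov.rob(U, method = "mcd")$cov
Y <- whiten(U, S)

# Euclidean distance between rows 1 and 2 of the whitened data ...
d_white <- sqrt(sum((Y[1, ] - Y[2, ])^2))
# ... equals the Mahalanobis distance between rows 1 and 2 of U
# (stats::mahalanobis() returns the squared distance, hence sqrt()).
d_maha <- sqrt(mahalanobis(U[1, ], center = U[2, ], cov = S))
```

Once the data are whitened this way, any Euclidean nearest-neighbor method (such as LOF) effectively works with Mahalanobis distances.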
Breunig, Markus M., et al. "LOF: identifying density-based local outliers." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000.
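The statistic defined by Breunig et al. can be computed directly from its definition: the k-distance of a point, the reachability distance max(k-distance(o), d(p, o)), the local reachability density (lrd, the inverse of the mean reachability distance to the k nearest neighbors), and finally LOF(p), the mean lrd of the neighbors divided by lrd(p). The base-R sketch below is for illustration only and is not the bigutilsr implementation. The helpers lof_sketch() and lof_multi_sketch() are hypothetical, and it assumes Euclidean distances, no duplicated rows, and small data (it builds the full distance matrix). The second helper mirrors the documented defaults of seq_k, combine and log.

```r
# Hypothetical helper: LOF for a single k, straight from the definition.
# Assumptions: Euclidean distance, no duplicated rows, 1 <= k < nrow(U),
# ties between neighbor distances broken arbitrarily by order().
lof_sketch <- function(U, k) {
  U <- as.matrix(U)
  n <- nrow(U)
  stopifnot(k >= 1, k < n)
  D <- as.matrix(dist(U))   # pairwise Euclidean distances
  diag(D) <- Inf            # a point is not its own neighbor
  # n x k matrix: row i holds the indices of the k nearest neighbors of i
  knn <- do.call(rbind, lapply(seq_len(n), function(i) order(D[i, ])[seq_len(k)]))
  # k-distance of each point: distance to its k-th nearest neighbor
  kdist <- D[cbind(seq_len(n), knn[, k])]
  nb <- as.vector(knn)                 # neighbor indices (column-major)
  rows <- rep(seq_len(n), times = k)   # point index matching each entry of nb
  # reachability distance: reach_k(p, o) = max(k-distance(o), d(p, o))
  reach <- matrix(pmax(kdist[nb], D[cbind(rows, nb)]), nrow = n)
  lrd <- 1 / rowMeans(reach)           # local reachability density
  # LOF: mean lrd of the neighbors divided by the point's own lrd
  rowMeans(matrix(lrd[nb], nrow = n)) / lrd
}

# Hypothetical helper: compute LOF for each k in seq_k, combine per row
# (default: max), then optionally take the logarithm.
lof_multi_sketch <- function(U, seq_k = c(4, 10, 30), combine = max, log = TRUE) {
  # n x length(seq_k) matrix, one column per k
  all_lof <- vapply(seq_k, function(k) lof_sketch(U, k), numeric(nrow(U)))
  res <- apply(all_lof, 1, combine)
  # base::log() is spelled out because the argument `log` shadows the name
  if (log) base::log(res) else res
}

set.seed(1)
U <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))  # 100 inliers + 1 far point
llof <- lof_multi_sketch(U)
which.max(llof)  # expected: 101, the far point
```

Inliers get LOF values close to 1 (log-LOF close to 0), while points in sparser regions than their neighbors get larger values; combining several k with max flags a point if it looks outlying at any of the neighborhood sizes.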
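library(bigutilsr)  # provides LOF(), tukey_mc_up() and hist_out()
X <- readRDS(system.file("testdata", "three-pops.rds", package = "bigutilsr"))
svd <- RSpectra::svds(scale(X), k = 10)  # 10 leading singular vectors of the scaled data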
llof <- LOF(svd$u)
hist(llof, breaks = nclass.scottRob)
tukey_mc_up(llof)
#> [1] 0.8646255
llof_maha <- LOF(svd$u, robMaha = TRUE)
hist(llof_maha, breaks = nclass.scottRob)
tukey_mc_up(llof_maha)
#> [1] 0.7319994
lof <- LOF(svd$u, log = FALSE)
hist(lof, breaks = nclass.scottRob)
str(hist_out(lof))
#> List of 2
#> $ x : num [1:513] 1.04 1.15 1.08 1 1.01 ...
#> $ lim: num [1:2] -Inf 1.63
str(hist_out(lof, nboot = 100))
#> List of 3
#> $ x : num [1:513] 1.04 1.15 1.08 1 1.01 ...
#> $ lim : num [1:2] -Inf 1.63
#> $ all_lim: num [1:2, 1:100] -Inf 1.63 -Inf 1.63 -Inf ...
str(hist_out(lof, nboot = 100, breaks = "FD"))
#> List of 3
#> $ x : num [1:513] 1.04 1.15 1.08 1 1.01 ...
#> $ lim : num [1:2] -Inf 1.63
#> $ all_lim: num [1:2, 1:100] -Inf 1.63 -Inf 1.63 -Inf ...