Compute the geometric median, i.e. the point that minimizes the sum of Euclidean distances to the observations (the rows of U).

geometric_median(U, tol = 1e-10, maxiter = 1000, by_grp = NULL)

Arguments

U

A numeric matrix whose rows are the observations (e.g. PC scores).

tol

Convergence criterion. Default is 1e-10.

maxiter

Maximum number of iterations. Default is 1000.

by_grp

Optional vector used to split the rows of U into groups, so that the geometric median is computed separately for each group. Default is NULL (no grouping).
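
To make the roles of tol and maxiter concrete, here is a minimal sketch of Weiszfeld's algorithm, a common iterative scheme for the geometric median. It is an illustration only: the helper name weiszfeld_sketch, the starting point (column means) and the stopping rule (largest coordinate change below tol) are assumptions, and may differ from what geometric_median() does internally.

```r
# Hypothetical helper, for illustration only (not part of bigutilsr).
weiszfeld_sketch <- function(U, tol = 1e-10, maxiter = 1000) {
  u_old <- colMeans(U)  # assumed starting point
  for (k in seq_len(maxiter)) {
    # Euclidean distance from each row of U to the current estimate
    d <- sqrt(rowSums(sweep(U, 2, u_old, "-")^2))
    d <- pmax(d, .Machine$double.eps)  # guard against division by zero
    # Weighted average of the rows, with weights 1 / distance
    u_new <- colSums(U / d) / sum(1 / d)
    # Assumed stopping rule: largest coordinate change below tol
    if (max(abs(u_new - u_old)) < tol) break
    u_old <- u_new
  }
  u_new
}

set.seed(1)
U_demo <- matrix(rnorm(200), ncol = 2)
U_demo[1:5, ] <- U_demo[1:5, ] + 50  # shift a few rows far away

med_demo <- weiszfeld_sketch(U_demo)

# Objective: sum of Euclidean distances from p to the rows of U_demo
sum_dist <- function(p) sum(sqrt(rowSums(sweep(U_demo, 2, p, "-")^2)))

# Compare the objective at the sketch's result and at the column means
c(median = sum_dist(med_demo), mean = sum_dist(colMeans(U_demo)))
```

In this sketch, a smaller tol demands a smaller change between successive estimates before stopping, while maxiter caps the number of updates if that threshold is never reached.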

Value

The geometric median of all rows of U, as a vector of length ncol(U). If by_grp is provided, a matrix whose rows are the geometric medians of the groups.
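
The two return shapes can be illustrated without the package. The sketch below uses a stand-in solver (general-purpose optim() applied to the sum-of-distances objective), which is not claimed to be the method used by geometric_median(). The names geo_med_optim, U_demo and grp are hypothetical. It only shows the shapes: one vector for the whole matrix, and one row per group when the rows are split.

```r
# Stand-in solver: minimizes the sum of Euclidean distances with optim().
# Hypothetical helper; not how bigutilsr computes the median.
geo_med_optim <- function(U) {
  obj <- function(p) sum(sqrt(rowSums(sweep(U, 2, p, "-")^2)))
  optim(colMeans(U), obj)$par
}

set.seed(42)
U_demo <- matrix(rnorm(120), ncol = 2)   # 60 rows, 2 columns
grp <- rep(c("a", "b", "c"), each = 20)  # one group label per row

# Without grouping: a vector with one entry per column
med_all <- geo_med_optim(U_demo)

# With grouping: one median per group, stacked as rows of a matrix
idx <- split(seq_len(nrow(U_demo)), grp)
med_grp <- do.call(rbind, lapply(idx, function(i) {
  geo_med_optim(U_demo[i, , drop = FALSE])
}))

length(med_all)  # 2, i.e. ncol(U_demo)
dim(med_grp)     # 3 2, i.e. number of groups by ncol(U_demo)
```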

Examples

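library(bigutilsr)

# Test data shipped with bigutilsr
X <- readRDS(system.file("testdata", "three-pops.rds", package = "bigutilsr"))
# Population label for each row (three groups of 143, 167 and 207 rows)
pop <- rep(1:3, c(143, 167, 207))

# Truncated SVD with 5 components; svds() comes from RSpectra
# (load RSpectra as well if svds() is not found)
svd <- svds(scale(X), k = 5)
# PC scores: left singular vectors scaled by the singular values
U <- sweep(svd$u, 2, svd$d, '*')
# Plot the first two PCs, colored by population
plot(U, col = pop, pch = 20)

# Geometric median of all rows (large blue point)
med_all <- geometric_median(U)
points(t(med_all), pch = 20, col = "blue", cex = 4)

# Geometric median within each population (smaller blue points)
med_pop <- geometric_median(U, by_grp = pop)
points(med_pop, pch = 20, col = "blue", cex = 2)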