Online Coresets for Parametric and Non-Parametric Bregman Clustering
Source
Transactions on Machine Learning Research
Date Issued
2022-01-01
Author(s)
Volume
2022-July
Abstract
We present algorithms that create coresets in an online setting for clustering problems based on a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the gap between expected and empirical loss Bachem et al. (2017a), and take update time O(d) for every incoming point where d is the dimension of the point. Our first algorithm gives online coresets of size Õ(poly(k, d, ϵ, µ)) for k-clusterings according to any µ-similar Bregman divergence. We further extend this algorithm to show the existence of non-parametric coresets, where the coreset size is independent of k, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets also function as coresets for non-parametric versions of the Bregman clustering like DP-Means. While these coresets provide additive error guarantees, they are significantly smaller for high dimensional data than the (relative-error) coresets obtained in Bachem et al. (2015) for DP-Means— for the input of size n our coresets grow as O(log n) while being independent of d as opposed to O(d<sup>d</sup>) for points in R<sup>d</sup> Bachem et al. (2015). We also present experiments to compare the performance of our algorithms with other sampling techniques.
