Compressed empirical measures

Staff - Faculty of Informatics

Date: 23 March 2022 / 16:30 - 17:30

USI Lugano Campus EST, room C1.05, Via la Santa 1, 6962 Viganello

Steffen Grünewälder, Lancaster University, UK

I present an approach for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces. In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Such an approximation gives under certain conditions rise to a coreset of data points. A key quantity that controls how large such a coreset must be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. I will provide high probability lower bounds on the size of such a ball under various conditions.

Steffen is a Lecturer (Assistant Prof.) at the department of Mathematics and Statistics at Lancaster University (UK).
His research interests are at the moment threefold. Firstly, he is interested in non-parametric statistics under minimal assumptions. For example, in non-linear regression a typical assumption is that the mean of the data lies in a reproducing kernel Hilbert space (RKHS) \mathcal{H}. This is a strong assumption that can be weakened with the help of interpolation spaces that define a scale of function spaces which interpolate between \mathcal{H} and L^2. A weaker assumption is now to assume that the mean lies in some intermediate space between \mathcal{H} and L^2, and it is known in which of these interpolation spaces it lies. This assumption is significantly weaker since the intermediate spaces are significantly larger than \mathcal{H}.  This last assumption can often be relaxed further with the help of so called adaptive estimators that do not assume knowledge of the interpolation space in which the mean lies, but achieve rates of convergence similar to estimators which assume knowledge of the interpolation space. In (Page & Grünewälder, JMLR, 2019 and Page & Grünewälder, Bernoulli, 2021) these ideas are explored in the context of a norm-constraint kernel estimator.  (Grünewälder, AIStats, 2018) uses the empirical measure to define plug-in estimators for conditional expectations & probabilities and analyses their rates of convergence. This is again a non-parameteric approach relying on VC- and empirical process theory. The estimators are easy to compute (in O(n)) and achieve strong statistical guarantees.
Secondly, he is interested in the link between non-parametric statistics and optimisation to develop sound non-parametric methods that scale to large-scale datasets. Classical RKHS estimators do not scale well with data (run-time costs are often between O(n^2) and O(n^3)).  Recently, there has been significant progress to improve the scaling of RKHS methods using simple partitioning ideas (divide and conquer algorithms).  (Grünewälder,JMLR, 2018) explores another approach to large-scale regression using conditional gradient algorithms that achieve results en-par with state-of-the-art divide and conquer algorithms.
Thirdly, Steffen is interested in online decision making in the form of the multi-armed bandit problem. In particular, in forms of the multi-armed bandit problem that are well adapted to practical problems. In (Grünewälder & Khaleghi, JMLR, 2019) the restless-bandit problem is explored from a mixing point of view. Here, pay-offs are dependent over time. This form of the bandit problem is significantly harder to control than the standard setting, but it is also significantly better suited for modelling real-world problems. Other recent works explore how bandit algorithms can be applied to the stochastic knapsack problem (Pike-Burke & Grünewälder, AIStats, 2017) and  the bandit problem where the feedback is delayed and anonymous (C. Pike-Burke, S. Agrawal, C. Szepesvári & S. Grünewälder, ICML 2018).

Host: Prof. Paul Schneider