Statistical Learning & Inference Seminars

The seminars take place fortnightly on Tuesdays at 11am or 1pm.

Term 3, 23-24

Date, Time and Room	Speaker	Title
30/04, 11am, MS.02	Oliver Feng (Bath)	Optimal convex M-estimation via score matching
Abstract:	In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal. This is joint work with Yu-Chun Kao, Min Xu and Richard Samworth.
14/05, 11am, MS.01	Gilles Stupfler (University of Angers)	Some new perspectives on extremal regression
Abstract:	The objective of extremal regression is to estimate and infer quantities describing the tail of a conditional distribution. Examples of such quantities include quantiles and expectiles, and the regression version of the Expected Shortfall. Traditional regression estimators at the tails typically suffer from instability and inconsistency due to data sparseness, especially when the underlying conditional distributions are heavy-tailed. Existing approaches to extremal regression in the heavy-tailed case fall into two main categories: linear quantile regression approaches and, at the other extreme, nonparametric approaches. They are also typically restricted to i.i.d. data-generating processes. Here I will give an overview of a recent series of papers that discuss extremal regression methods in location-scale regression models (containing linear regression quantile models) and nonparametric regression models. Some key novel results include a general toolbox for extreme value estimation in the presence of random errors and joint asymptotic normality results for nonparametric extreme conditional quantile estimators constructed upon strongly mixing data. Joint work with A. Daouia, S. Girard, M. Oesting and A. Usseglio-Carleve.
04/06, 11am, MB0.07	Rebecca Lewis (Oxford)	High-dimensional logistic regression with separated data
Abstract:	In a logistic regression model with separated data, the log-likelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient always produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but are an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. In a high-dimensional regime, we consider the implications of enforcing a natural constraint on the vector of logistic-transformed probabilities. We derive a consistent estimator of the unknown logistic regression parameter that exists even when the data are separable.
18/06, 1pm, MB0.07	Jenny Wadsworth (Lancaster)	Geometric approaches to statistics of multivariate extremes
Abstract:	A geometric representation for multivariate extremes, based on the shapes of scaled sample clouds in light-tailed margins and their so-called limit sets, has recently been shown to connect several existing extremal dependence concepts. However, these results are purely probabilistic, and the geometric approach itself has not been fully exploited for statistical inference. We outline a method for parametric estimation of the limit set shape, which includes a useful non-/semi-parametric estimate as a pre-processing step. More fundamentally, our approach provides a new class of asymptotically motivated statistical models for the tails of multivariate distributions, and such models can accommodate any combination of simultaneous or non-simultaneous extremes through appropriate parametric forms for the limit set shape. In this talk we will also present ongoing work moving towards semiparametric methodology for greater flexibility. Extrapolation further into the tail of the distribution is possible via simulation from the fitted model, and probability estimates are possible in regions where other frameworks struggle. Joint work with Ryan Campbell.
25/06, 11am, MB0.07	Nicola Gnecco (UCL)	Extremal Random Forests
Abstract:	Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized cases. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
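
Note on the extrapolation step in the last abstract: the following is a minimal sketch, not the speaker's implementation, of the standard peaks-over-threshold formula that turns fitted generalized Pareto parameters into an extreme quantile, Q(tau) = u + (sigma/xi) * [((1 - tau0)/(1 - tau))^xi - 1]. It assumes the threshold u is the quantile at an intermediate level tau0 and that the scale sigma and shape xi have already been estimated; in ERF these quantities would depend on the predictor vector, whereas here they are plain numbers. The function name and all argument names are hypothetical.

```python
import math


def gpd_extreme_quantile(u, sigma, xi, tau0, tau):
    """Extrapolate the tau-quantile from a GPD fitted to exceedances over u.

    u     : threshold, assumed to be the tau0-quantile of the response
    sigma : GPD scale of the exceedances over u (must be positive)
    xi    : GPD shape of the exceedances over u
    tau0  : intermediate probability level of the threshold
    tau   : target probability level, with tau0 <= tau < 1
    """
    if not (0.0 < tau0 <= tau < 1.0):
        raise ValueError("need 0 < tau0 <= tau < 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-12:
        # Limit xi -> 0: the GPD reduces to an exponential tail.
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)


# Illustrative values only (assumptions, not taken from any talk or data set).
q_extreme = gpd_extreme_quantile(u=10.0, sigma=2.0, xi=0.25, tau0=0.8, tau=0.999)
```

When the data are exactly generalized Pareto, this formula recovers the true quantile at any level above tau0, which gives a simple sanity check for an implementation.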