Statistical Learning & Inference Seminars
The seminars usually take place on Tuesdays at 11am.
Term 1, 24-25
Date, Time and Room | Speaker | Title
01/10, 11am, MB2.22 | Dr. Cornelius Fritz (Trinity College Dublin) | A Regression Framework for Studying Relationships among Attributes under Network Interference
Abstract:
To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021.
08/10, 11am, MB2.22 | | Low-rank models for dynamic multiplex graphs and multivariate time series
Abstract:
This talk discusses low-rank models for two different types of data structures: dynamic multiplex graphs and panels of multivariate time series. The first part of the talk will present a doubly unfolded adjacency spectral embedding (DUASE) method for networks evolving over time, with different edge types, commonly known as multiplex networks. Statistical properties of DUASE will be discussed, and links with commonly used statistical models for clustering graphs will be presented. The second part of the talk will cover the case of a panel of multivariate time series where there is co-movement between the panel components, modelled via a vector autoregressive process. A Network Informed Restricted Vector Auto-Regressive (NIRVAR) process is proposed, with an algorithm that gives a low-dimensional latent embedding of each component of the panel. Clustering in this latent space is then used to recover the non-zero entries of the VAR coefficient matrix. The proposed model outperforms alternative approaches in terms of prediction and inference in simulation studies and in real-data applications in finance and transportation systems.
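The unfolding step behind a doubly unfolded spectral embedding can be sketched in a few lines: stack every layer-and-time adjacency matrix side by side and take a truncated SVD of the result. This is only a toy illustration of the unfolding idea, not the speakers' implementation; the simulated network and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, layers, times, d = 30, 2, 4, 3

# Toy dynamic multiplex network: one n x n adjacency matrix per (layer, time).
A = rng.binomial(1, 0.2, size=(layers, times, n, n))

# Doubly unfold: rows indexed by node, columns by (layer, time, node),
# giving an n x (layers * times * n) matrix.
unfolded = A.transpose(2, 0, 1, 3).reshape(n, layers * times * n)

# A rank-d truncated SVD yields a shared d-dimensional embedding per node.
U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
embedding = U[:, :d] * np.sqrt(s[:d])
```

Clustering `embedding` (e.g. with k-means) would then play the same role as the latent-space clustering step described for NIRVAR.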
15/10, 11am, MB2.22 | Dr. Tom Rainforth (Oxford) | Modern Bayesian Experimental Design
Abstract:
Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this talk, I will outline how recent advances have transformed our ability to overcome these challenges and thus utilize BED effectively, before discussing some key areas for future development in the field. Related review paper: https://arxiv.org/abs/2302.14545
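To make the computational challenge concrete, here is a minimal nested Monte Carlo estimator of the expected information gain (EIG) for a toy linear-Gaussian experiment. This is a standard textbook estimator, not a method from the talk, and every name and number in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(y, theta, design, sigma=1.0):
    # Gaussian likelihood: y ~ N(design * theta, sigma^2).
    return -0.5 * ((y - design * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def eig_nmc(design, n_outer=2000, n_inner=200):
    # Nested Monte Carlo EIG: E[log p(y | theta, d)] - E[log p(y | d)],
    # with the marginal p(y | d) approximated from fresh prior draws.
    theta = rng.standard_normal(n_outer)                 # prior draws
    y = design * theta + rng.standard_normal(n_outer)    # simulated outcomes
    inner = rng.standard_normal((n_inner, 1))            # draws for the marginal
    log_marg = np.logaddexp.reduce(log_lik(y, inner, design), axis=0) - np.log(n_inner)
    return float(np.mean(log_lik(y, theta, design) - log_marg))

eig_informative = eig_nmc(3.0)   # strong design: outcome carries much information
eig_weak = eig_nmc(0.1)          # weak design: outcome is mostly noise
```

The estimator's cost grows multiplicatively in the inner and outer sample sizes, which is exactly the kind of burden the recent advances surveyed in the talk aim to remove.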
22/10, 11am, MB2.22 | Dr. Robin Mitra (UCL) | Using saturated count models for user-friendly synthesis of categorical data
Abstract:
Synthetic data methods are being increasingly used to protect data confidentiality. Large sparse categorical data sets pose significant challenges for synthesis, making many traditional methods unsuitable. We explore using saturated count models for synthesis. These are appealing as they allow large categorical data sets to be synthesized quickly and conveniently, as well as permitting risk and utility metrics to be satisfied a priori, that is, prior to synthetic data generation. Most well-known count models for synthesizing categorical data at the tabular level tend to utilise either Poisson or Poisson-mixture distributions. However, the latter are always over-dispersed, with a variance that is an increasing function of the mean. As a result, relatively more noise is applied to larger counts than smaller counts. But this is contrary to the objective of data synthesis, where larger counts are typically lower risk than smaller counts, and therefore require less perturbation. We thus additionally explore the benefits of using the discretized gamma family distribution (DGAF) for synthesis within the saturated model framework. The DGAF provides the synthesizer with control of the variance-mean relationship, allowing smaller counts to be over-dispersed and larger counts to be under-dispersed, which in turn produces synthetic data with greater utility. The benefits of the DGAF are illustrated empirically using a database which can be viewed as a good substitute for the English School Census.
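The simplest saturated-model synthesizer is easy to state: treat each cell count of the contingency table as the mean of a count distribution and sample once per cell. The sketch below uses a plain Poisson for brevity; the talk's point is precisely that the discretized gamma family gives finer control over the variance-mean relationship than this. The table is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented contingency table: counts per cell of a categorical data set.
observed = np.array([[120,   5,  0],
                     [ 34, 210,  2],
                     [  1,   8, 76]])

# Saturated Poisson synthesis: each synthetic count is drawn from
# Poisson(observed count), so no model fitting is needed and the expected
# synthetic table equals the observed one. Note the variance equals the
# mean, so large cells receive more absolute noise than small ones --
# the behaviour the DGAF is designed to reverse.
synthetic = rng.poisson(observed)
```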
29/10, 11am, MB2.22 | Dr. Song Liu (Bristol) | High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching
Abstract:
This paper addresses differential inference in time-varying parametric probabilistic models, like graphical models with changing structures. Instead of estimating a high-dimensional model at each time and inferring changes later, we directly learn the differential parameter, i.e., the time derivative of the parameter. The main idea is treating the time score function of an exponential family model as a linear model of the differential parameter for direct estimation. We use time score matching to estimate parameter derivatives. We prove the consistency of a regularized score matching objective and demonstrate the finite-sample normality of a debiased estimator in high-dimensional settings. Our methodology effectively infers differential structures in high-dimensional graphical models, verified on simulated and real-world datasets.
05/11, 11am, MB2.22 | | Detecting Changes in Production Frontier
Abstract:
In this talk, we first give a brief review of the nonparametric estimation problem for the production frontier function, which concerns the maximum possible output at given input levels and the efficiency of firms. We then look at how (potentially) multiple changes over time in the production frontier can be detected. By assuming that the frontier always shifts upwards over time, which is plausible thanks to advances in technology, we can detect changes in the frontier at a near-optimal rate under regularity conditions, irrespective of the dimensionality of the input. This can be achieved by modifying and utilising the well-known Free Disposal Hull (FDH) or Data Envelopment Analysis (DEA) algorithms in different ways, depending on whether the shift is global or local. Finally, we also discuss how confidence intervals can be constructed in this setup.
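The FDH estimator mentioned above is simple enough to state in full: the frontier estimate at input level x0 is the largest observed output among firms using no more input than x0. Below is a one-input, one-output sketch with simulated firms (true frontier sqrt(x); all names and numbers invented).

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated firms: inputs in (0, 1), outputs strictly below the frontier sqrt(x).
x = rng.uniform(0.0, 1.0, 200)
y = np.sqrt(x) * rng.uniform(0.5, 1.0, 200)

def fdh(x_obs, y_obs, x0):
    # Free Disposal Hull: max observed output over firms with input <= x0.
    mask = x_obs <= x0
    return y_obs[mask].max() if mask.any() else 0.0

est = fdh(x, y, 0.81)  # true frontier value here is sqrt(0.81) = 0.9
```

Detecting an upward shift, as in the talk, could then compare FDH estimates computed from observations before and after a candidate change point.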
12/11, 11am, MB2.22 | Dr. Ed Cohen (Imperial College London) | Analysing spatial point patterns on the surface of 3D shapes
Abstract:
Statistical methodology for analysing spatial point patterns has traditionally focused on Euclidean data and planar surfaces. However, with recent advances in 3D biological imaging technologies targeting protein molecules on a cell’s plasma membrane, spatial point patterns are now being observed on complex shapes and manifolds whose geometry must be respected for principled inference. Consequently, there is now a demand for tools that can analyse these data for important scientific studies in cell biology and microbiology. Motivated by studying the spatial distribution of LPS proteins on the surface of E. coli, we extend the fundamental functional summary statistics for the analysis of point patterns to general convex bounded shapes and demonstrate how they can be used to test for complete spatial randomness. We then develop their multi-type extensions, together with a test for independence of the component marginal processes. To support these methods, we introduce a plug-in estimator for the intensity of a spatial point process on a manifold. We conclude with a discussion on how these methods can readily be extended to a class of non-convex shapes.
References:
S. Ward, E. A. K. Cohen, N. M. Adams. Testing for complete spatial randomness on 3-dimensional bounded convex shapes. Spatial Statistics, Vol. 41, 2021.
S. Ward, H. S. Battey and E. A. K. Cohen. Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold. Biometrika, Vol. 110, 2023.
S. Ward, E. A. K. Cohen and N. M. Adams. Functional summary statistics and testing for independence in marked point patterns on the surface of three-dimensional convex shapes. arXiv:2410.01063, 2024.
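As a flavour of what a functional summary statistic looks like on a curved surface, here is a toy empirical K-function for points on the unit sphere, compared with its closed form under complete spatial randomness (the area 2*pi*(1 - cos r) of a geodesic cap). This sketch covers the sphere only, not the general convex shapes of the talk, and the simulated pattern is invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Uniform points on the unit sphere: normalised Gaussian vectors.
pts = rng.standard_normal((n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# Pairwise geodesic (great-circle) distances.
d = np.arccos(np.clip(pts @ pts.T, -1.0, 1.0))

def k_empirical(r):
    # Mean number of other points within geodesic distance r,
    # scaled by the intensity lambda = n / (surface area 4*pi).
    counts = (d < r).sum() - n          # drop the n self-pairs
    return counts / n / (n / (4.0 * np.pi))

k_hat = k_empirical(0.5)
k_csr = 2.0 * np.pi * (1.0 - np.cos(0.5))  # CSR value: geodesic cap area
```

A test for complete spatial randomness compares `k_hat` against the CSR curve over a range of r, with a simulation envelope in place of the crude tolerance used here.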
19/11, 11am, MB2.22 | Prof. Thomas Nichols (Oxford) | Scalable Longitudinal Models for Neuroimaging Data
Abstract:
Neuroimaging has mainly depended on cross-sectional data to study the brain through the lifespan, but such studies can only attribute variation to intersubject differences in age. Only longitudinal neuroimaging studies can draw inferences about age-induced changes in the brain, which are crucial for studies of brain development and aging. I will review a range of methods my group has developed for the analysis of longitudinal neuroimaging data, work started when I was Warwick faculty 10+ years ago and continued recently in collaboration with current Warwick Stats faculty. First I'll review a very fast and practical approach using marginal models and robust standard errors, for which we have created a user-friendly implementation (SwE). Next, while we'd ideally use standard linear mixed effects (LME) implementations (e.g. lme4/nlme), they aren't practical with brain data as they can only fit one voxel at a time and thus cannot exploit vectorised computation. We have developed a highly optimised LME implementation that exploits vectorised computation so that all voxels are simultaneously updated at each iteration (BigLMM). Finally, for longitudinal binary images, e.g. lesion masks of white matter hyperintensities, we propose a relative-risk regression to support users' preference for relative risk (RR) units over odds ratios. We use a GEE approach with a log link, identity variance function and unknown dispersion parameter, together with a penalty to avoid infinite parameter estimates (joint work with Ioannis Kosmidis). This suite of work is a small indication of the rich methodological opportunities in the growing body of longitudinal neuroimaging studies.
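The marginal-model-plus-robust-errors idea in the first part can be sketched generically: fit ordinary least squares to the pooled longitudinal data, then compute a cluster-robust sandwich covariance from per-subject score contributions. This is the textbook sandwich estimator on invented data, not the SwE toolbox itself.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented longitudinal data: n subjects, t visits each; subject-level
# random intercepts induce within-subject correlation.
n, t = 50, 4
age = np.tile(np.arange(t, dtype=float), n)
subj = np.repeat(np.arange(n), t)
y = (2.0 + 0.5 * age
     + np.repeat(rng.standard_normal(n), t)    # random intercepts
     + 0.3 * rng.standard_normal(n * t))       # visit-level noise

X = np.column_stack([np.ones(n * t), age])
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # marginal (OLS) fit
resid = y - X @ beta

# Sandwich covariance bread @ meat @ bread: the meat sums outer products
# of per-subject scores, so within-subject correlation is accounted for
# without specifying a random-effects model.
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for s in range(n):
    g = X[subj == s].T @ resid[subj == s]
    meat += np.outer(g, g)
robust_cov = bread @ meat @ bread
robust_se = np.sqrt(np.diag(robust_cov))
```

The appeal for imaging is that the fit is a single closed-form solve, so the same computation can be vectorised across all voxels at once.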
26/11, 11am, MB2.22 | Prof. Qiwei Yao (LSE) | Identification and Estimation for Matrix Time Series CP-factor Models
Abstract:
We investigate the identification and the estimation for matrix time series CP-factor models. Unlike the generalized eigen analysis-based method of Chang et al. (2023) which requires the two factor loading matrices to be full-ranked, the newly proposed estimation can handle rank-deficient factor loading matrices. The estimation procedure consists of the spectral decomposition of several matrices and a matrix joint diagonalization algorithm, resulting in low computational cost. The theoretical guarantee established without the stationarity assumption shows that the proposed estimation exhibits a faster convergence rate than that of Chang et al. (2023). In fact the new estimator is free from the adverse impact of any eigen-gaps, unlike most eigenanalysis-based methods. Furthermore, in terms of the error rates of the estimation, the proposed procedure is equivalent to handling a vector time series of dimension max(p, q) instead of p×q, where (p, q) are the dimensions of the matrix time series concerned. We have achieved this without assuming the “near orthogonality” of the loadings under various incoherence conditions often imposed in the CP-decomposition literature. Illustration with both simulated and real matrix time series data shows the usefulness of the proposed approach.
Joint work with Jianyuan Chang, Yue Du and Guanglin Huang.
03/12, 11am, MB2.22 | Prof. Karthik Bharath (Nottingham) | Rolled Gaussian process models for curves on manifolds
Abstract:
Curves on manifolds arise as data in various applications, but practical probabilistic models for their analysis are presently unavailable. One strategy is to flatten or linearise the manifold $M$ and then exploit the flattened space for modelling. But, in the absence of global coordinates on $M$, how the flattening is done is crucial because it may induce severe distortions. A local flattening strategy based on rolling $M$ without slipping along a Euclidean curve, which is compatible with the intrinsic geometry of $M$, will be discussed. The strategy allows for the prescription of a Gaussian process-type model on $M$. Theoretical and computational challenges in estimation of and inference for the model parameters, and their relationship to Frechet means on $M$, using discretely observed curves will be discussed, aided by an application in robot learning.
12/12 (Thursday), 11am, MB2.22 | Dr. Minwoo Chae (Pohang University of Science and Technology) |
Abstract:
Term 3, 23-24
Date, Time and Room | Speaker | Title
30/04, 11am, MS.02 | Oliver Feng (Bath) |
Abstract:
In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal. This is joint work with Yu-Chun Kao, Min Xu and Richard Samworth.
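The Cauchy example can be illustrated directly: with Cauchy errors, an M-estimator built on a fixed Huber loss (a stand-in for the data-driven convex loss of the talk, which is learned rather than fixed) remains stable where least squares is erratic. Everything below is an invented toy, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

# Simple linear model through the origin with Cauchy errors.
n = 500
x = rng.standard_normal(n)
y = 1.5 * x + rng.standard_cauchy(n)

def huber(r, delta=1.0):
    # Convex loss: quadratic near zero, linear in the tails.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

# Convex M-estimation of the slope under the Huber loss.
beta_huber = minimize(lambda b: huber(y - b[0] * x).sum(), x0=[0.0]).x[0]

# Least squares, by contrast, inherits the errors' lack of finite variance.
beta_ls = float(x @ y / (x @ x))
```

The talk's contribution is to learn the shape of this convex loss from the data so that the resulting estimator is asymptotically optimal among all convex M-estimators.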
14/05, 11am, MS.01 | Gilles Stupfler (University of Angers) | Some new perspectives on extremal regression
Abstract:
The objective of extremal regression is to estimate and infer quantities describing the tail of a conditional distribution. Examples of such quantities include quantiles and expectiles, and the regression version of the Expected Shortfall. Traditional regression estimators typically suffer from instability and inconsistency in the tails due to data sparseness, especially when the underlying conditional distributions are heavy-tailed. Existing approaches to extremal regression in the heavy-tailed case fall into two main categories: linear quantile regression approaches and, at the other end of the spectrum, nonparametric approaches. They are also typically restricted to i.i.d. data-generating processes. I will give an overview of a recent series of papers that discuss extremal regression methods in location-scale regression models (which contain linear quantile regression models) and nonparametric regression models. Some key novel results include a general toolbox for extreme value estimation in the presence of random errors and joint asymptotic normality results for nonparametric extreme conditional quantile estimators constructed from strongly mixing data. Joint work with A. Daouia, S. Girard, M. Oesting and A. Usseglio-Carleve.
04/06, 11am, MB0.07 | Rebecca Lewis (Oxford) | High-dimensional logistic regression with separated data
Abstract:
In a logistic regression model with separated data, the log-likelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient always produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. In a high-dimensional regime, we consider the implications of enforcing a natural constraint on the vector of logistic-transformed probabilities. We derive a consistent estimator of the unknown logistic regression parameter that exists even when the data are separable.
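The non-existence of the MLE under separation is easy to see numerically: for perfectly separated data the log-likelihood increases without bound as the coefficient grows, so no finite maximiser exists. A minimal one-covariate check (data invented):

```python
import numpy as np

# Perfectly separated data: y = 1 exactly when x > 0.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def loglik(beta):
    # Numerically stable Bernoulli log-likelihood for logit(p) = beta * x:
    # log p = -log1p(exp(-z)) and log(1 - p) = -log1p(exp(z)).
    z = beta * x
    return -np.log1p(np.exp(-z[y == 1])).sum() - np.log1p(np.exp(z[y == 0])).sum()

# The likelihood keeps improving as beta grows, approaching 0 from below
# but never attaining it: the "MLE" runs off to infinity.
lls = [loglik(b) for b in (1.0, 10.0, 100.0)]
```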
18/06, 1pm, MB0.07 | Jenny Wadsworth (Lancaster) |
Abstract:
A geometric representation for multivariate extremes, based on the shapes of scaled sample clouds in light-tailed margins and their so-called limit sets, has recently been shown to connect several existing extremal dependence concepts. However, these results are purely probabilistic, and the geometric approach itself has not been fully exploited for statistical inference. We outline a method for parametric estimation of the limit set shape, which includes a useful non-/semi-parametric estimate as a pre-processing step. More fundamentally, our approach provides a new class of asymptotically motivated statistical models for the tails of multivariate distributions, and such models can accommodate any combination of simultaneous or non-simultaneous extremes through appropriate parametric forms for the limit set shape. In this talk we will also present ongoing work moving towards semiparametric methodology for greater flexibility. Extrapolation further into the tail of the distribution is possible via simulation from the fitted model, and probability estimates are possible in regions where other frameworks struggle. Joint work with Ryan Campbell.
25/06, 11am, MB0.07 | Nicola Gnecco (UCL) | Extremal Random Forests
Abstract:
Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only a few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
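The extrapolation step that ERF builds on can be shown in its unweighted form: fit a generalized Pareto distribution to exceedances over a high threshold and read off quantiles beyond the observed data (ERF would replace the equal weights by forest-derived local weights). Toy data, not the paper's experiments:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)

# Heavy-tailed sample (Lomax / Pareto-type tail with index 3).
y = rng.pareto(3.0, 5000)

# Peaks-over-threshold: GPD fit to exceedances of the 90% quantile.
u = np.quantile(y, 0.9)
excess = y[y > u] - u
shape, loc, scale = genpareto.fit(excess, floc=0.0)

# Extrapolated 99.9% quantile: threshold plus the GPD quantile of the
# conditional tail (0.1% / 10% of the tail mass lies beyond it).
p = 0.999
q_hat = u + genpareto.ppf((p - 0.9) / (1.0 - 0.9), shape, loc=0.0, scale=scale)
```

Making `shape` and `scale` functions of a predictor vector, via forest-weighted local likelihood, is exactly the step the talk contributes.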