Statistical Learning & Inference Seminars

The seminars take place every Tuesday at 11am.

Term 2, 24-25

Date, Time and Room

Speaker

Title

07/01, 11am, MB0.07

Dr. Francesca Panero (Sapienza University)

Modelling sparse latent space networks with Bayesian nonparametrics

Abstract:

The graphex is a statistical framework to model random graphs. It is particularly flexible, in that it allows us to describe dense and sparse networks, different degree distributions (power-law included) and positive clustering. After introducing the general graphex framework and its asymptotic properties, I will explain how we use it to generate networks embedded in a latent space. I will present the inference algorithm and show on real data how the model helps us explain the structures underlying commuting patterns.

14/01, 11am, MB2.22

Dr. Stratis Limnios (Alan Turing Institute)

Large Graph Generation Problematics and Frameworks

Abstract:

Synthetic data generation has gained significant traction in machine learning and statistics, with applications spanning image and prose generation, data augmentation, privacy protection, and bias mitigation. Synthetic graphs, which model complex data sets as networks, are particularly valuable for tasks like social interaction modelling, chemical compound design, and transaction forecasting. Traditional random graph generative models, such as Erdős-Rényi graphs and stochastic block models, offer foundational frameworks but often fail to capture the complex structures and dependencies observed in real-world networks.

In recent years, deep learning-based approaches have emerged to address these limitations, enabling the generation of more intricate and realistic synthetic graphs. Autoregressive models like GraphRNN and GRAN employ sequential node and edge generation processes, incorporating graph-based attention mechanisms to model long-term dependencies. Autoencoder-based approaches, such as GraphVAE, and adversarial models have also demonstrated potential in improving graph generation quality. Moreover, capitalising on the success of denoising diffusion models in image generation, researchers have adapted these techniques to graph learning. Models like GeoDiff have shown promise in molecular generation and chemical compound design, while more versatile approaches such as DiGress—a discrete denoising diffusion graph model—achieve high-quality graph generation with node and edge attributes.

In this seminar we will present three frameworks developed to enhance scalability and adapt existing frameworks to handle large graphs with attributes and temporal features. The first is CTWalk, a GAN-based temporal graph generator exploring attributed temporal random walks. The second is L2G2G, a graph auto-encoder (GAE) framework that uses a divide-and-conquer strategy to learn local node embeddings while considerably improving the scalability of the underlying GAE. The third is SaGess, a denoising diffusion probabilistic model that leverages the capabilities and performance of DiGress to learn and generate single large graphs.

21/01, 11am, MB2.22

Prof. Ioanna Manolopoulou (UCL)

Combining confounded and unconfounded data in heterogeneous treatment effect modelling

Abstract:

Building statistical models using non-randomly sampled data is a well-known challenge in statistics, and is especially challenging when any part of the statistical model is not fully identifiable. In causal inference, and in particular in the estimation of heterogeneous treatment effects, this arises when observational data are used which may be affected by unobserved confounding. One approach to correct for such confounding is to combine data with and without unobserved confounding. However, when the unconfounded data are not representative of the whole population, the effect of de-confounding will be poor for subsets of the population that fall outside the range of these data. Depending on the structure of the model and the nature of the prior distributions used within a Bayesian model, this will be addressed by borrowing information from other parts of the space. In this work, we highlight the importance of building models that can account for uncertainty due to unobserved confounding in regions where no de-confounding is possible. To this end, we embed a combination of data with and without unobserved confounding into Bayesian Causal Forests (BCF), and make use of a data-dependent tempered likelihood to harness as much reliable information from the unconfounded data as possible, without leading to over-confidence in regions of poor identifiability. We implement our methods on a set of simulated and real data examples.

28/01, 11am, MB2.22

Dr. Gaetano Romano (Lancaster University)

An Introduction to Fast Online Changepoint Detection in the Exponential Family of Models

Abstract:
Online changepoint detection algorithms that are based on likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time n, it involves considering O(n) possible locations for the change. Recently, the FOCuS algorithm was introduced for detecting changes in mean in Gaussian data; it decreases the per-iteration cost to O(log(n)). This is possible by using pruning ideas, which reduce the set of changepoint locations that need to be considered at time n to approximately log(n).
In an extension (ex-focus), we showed that if one wishes to perform the likelihood ratio test for a different exponential family model, then the same pruning rule can be used, again reducing the set of locations to approximately log(n). This is achieved by exploiting a mathematical link between changepoints and convex hulls of random walks. More surprisingly, in a further extension, we found that this link translates even to higher dimensions (md-focus), allowing for fast optimization in the multidimensional case through well-established computational geometry algorithms. Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant per-iteration cost on average for sequences of up to 5 dimensions, without any approximations.
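
As an illustration of the baseline the abstract describes, the following is a minimal sketch of the naive online likelihood-ratio detector, which scans all candidate change locations at each time step. It is not FOCuS and includes no pruning. It assumes unit-variance Gaussian data with a known pre-change mean of 0; the function name and the threshold value are hypothetical choices made for this example.

```python
import numpy as np

def naive_online_lr_detector(xs, threshold):
    """Return the first time n (1-indexed) at which the Gaussian
    change-in-mean likelihood-ratio statistic exceeds `threshold`,
    or None if it never does.

    Assumptions (for illustration only): unit variance, pre-change mean 0.
    For a candidate change after observation tau, the statistic is
    (S_n - S_tau)^2 / (2 * (n - tau)), where S_t = x_1 + ... + x_t.
    """
    cumsum = [0.0]  # cumsum[t] = x_1 + ... + x_t
    for n, x in enumerate(xs, start=1):
        cumsum.append(cumsum[-1] + x)
        s = np.asarray(cumsum)
        taus = np.arange(n)  # all n candidate locations: O(n) work per step
        stats = (s[n] - s[taus]) ** 2 / (2.0 * (n - taus))
        if stats.max() > threshold:
            return n
    return None

# Noise-free toy sequence: mean 0 for 50 points, then mean 1.
signal = [0.0] * 50 + [1.0] * 30
detection_time = naive_online_lr_detector(signal, threshold=5.0)
```

Each step evaluates every earlier location, which is the O(n) per-iteration cost that the pruning ideas in the abstract are designed to avoid.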

04/02, 11am, MB2.22

Prof. Chris Nemeth (Lancaster University)

TBC

Abstract: TBC

Monday, 10/02, 11am, MB2.22

TBC  
Abstract:    

18/02, 11am, MB2.22

TBC  
Abstract:    

25/02, 11am, MB2.22

TBC  
Abstract:    
04/03, 11am, F25a (Milburn House)

Patrick Rubin-Delanchy  
Abstract:    
11/03, 11am, MB2.22

Katarzyna Reluga
Abstract:    

Term 1, 24-25

Date, Time and Room

Speaker

Title

01/10, 11am, MB2.22

Dr. Cornelius Fritz (Trinity College Dublin)

A Regression Framework for Studying Relationships among Attributes under Network Interference

Abstract:

To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021.

08/10, 11am, MB2.22

Dr. Francesco Sanna Passino (Imperial College London)

Low-rank models for dynamic multiplex graphs and multivariate time series

Abstract:

This talk discusses low-rank models for two different types of data structures: dynamic multiplex graphs and panels of multivariate time series. The first part of the talk will present a doubly unfolded adjacency spectral embedding (DUASE) method for networks evolving over time, with different edge types, commonly known as multiplex networks. Statistical properties of DUASE will be discussed, and links with commonly used statistical models for clustering graphs will be presented. The second part of the talk will cover the case of a panel of multivariate time series where there is co-movement between the panel components, modelled via a vector autoregressive process. A Network Informed Restricted Vector Auto-Regressive (NIRVAR) process is proposed, with an algorithm that gives a low dimensional latent embedding of each component of the panel. Clustering in this latent space is then used to recover the non-zero entries of the VAR coefficient matrix. The proposed model outperforms alternative approaches in terms of prediction and inference in simulation studies and real-data examples in applications in finance and transportation systems.

15/10, 11am, MB2.22

Dr. Tom Rainforth (Oxford)

Modern Bayesian Experimental Design

Abstract:

Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this talk, I will outline how recent advances have transformed our ability to overcome these challenges and thus utilize BED effectively, before discussing some key areas for future development in the field. Related review paper: https://arxiv.org/abs/2302.14545

22/10, 11am, MB2.22

Dr. Robin Mitra (UCL)

Using saturated count models for user-friendly synthesis of categorical data

Abstract:

Synthetic data methods are increasingly being used to protect data confidentiality. Large sparse categorical data sets pose significant challenges for synthesis, which make many traditional methods unsuitable. We explore using saturated count models for synthesis. These are appealing as they allow large categorical data sets to be synthesized quickly and conveniently, as well as permitting risk and utility metrics to be satisfied a priori, that is, prior to synthetic data generation. Most well-known count models for synthesizing categorical data at the tabular level tend to utilise either Poisson or Poisson-mixture distributions. However, the latter are always over-dispersed, with a variance that is an increasing function of the mean. As a result, relatively more noise is applied to larger counts than smaller counts. But this is contrary to the objective of data synthesis, where larger counts are typically lower risk than smaller counts, and therefore require less perturbation. We thus additionally explore the benefits of using the discretized gamma family distribution (DGAF) for synthesis within the saturated model framework. The DGAF provides the synthesizer with control of the variance-mean relationship, allowing smaller counts to be over-dispersed and larger counts to be under-dispersed, which in turn produces synthetic data with greater utility. The benefits of the DGAF are illustrated empirically using a database which can be viewed as a good substitute for the English School Census.

29/10, 11am, MB2.22

Dr. Song Liu (Bristol)

High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching

Abstract:

This paper addresses differential inference in time-varying parametric probabilistic models, like graphical models with changing structures. Instead of estimating a high-dimensional model at each time and inferring changes later, we directly learn the differential parameter, i.e., the time derivative of the parameter. The main idea is treating the time score function of an exponential family model as a linear model of the differential parameter for direct estimation. We use time score matching to estimate parameter derivatives. We prove the consistency of a regularized score matching objective and demonstrate the finite-sample normality of a debiased estimator in high-dimensional settings. Our methodology effectively infers differential structures in high-dimensional graphical models, verified on simulated and real-world datasets.

05/11, 11am, MB2.22

Dr. Yining Chen (LSE)

Detecting Changes in Production Frontier

Abstract:

In this talk, we first give a brief review of the nonparametric estimation problem for the production frontier function, which concerns the maximum possible output given input levels and the efficiency of the firms. We then look at how (potentially) multiple changes over time in the production frontier can be detected. By assuming that the frontier always shifts upwards over time, which is plausible thanks to advances in technology, we can detect changes in the frontier at the near-optimal rate under regularity conditions, irrespective of the dimensionality of the input. This can be achieved by modifying and utilising the well-known Free Disposal Hull (FDH) or Data Envelopment Analysis (DEA) algorithm in different ways, depending on whether the shift is global or local. Finally, we also discuss how confidence intervals can be constructed in this setup.
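
For readers unfamiliar with the Free Disposal Hull estimator mentioned above, here is a minimal sketch of the standard FDH frontier estimate: the largest observed output among firms whose inputs are componentwise no larger than the query point. It illustrates only the basic estimator, not the change-detection procedure of the talk. The function name and the convention of returning 0.0 when no firm qualifies are assumptions made for this example.

```python
import numpy as np

def fdh_frontier(inputs, outputs, x):
    """Free Disposal Hull estimate of the production frontier at input x.

    inputs  : n firms by d input levels (a flat list is treated as d = 1)
    outputs : n observed output levels
    x       : query input level(s), length d

    Returns the maximum output among firms using no more of every input
    than x. Returning 0.0 when no such firm exists is a convention
    assumed here for illustration.
    """
    outputs = np.asarray(outputs, dtype=float)
    inputs = np.asarray(inputs, dtype=float).reshape(len(outputs), -1)
    x = np.asarray(x, dtype=float).reshape(-1)
    dominated = np.all(inputs <= x, axis=1)  # firms with inputs <= x componentwise
    if not dominated.any():
        return 0.0
    return float(outputs[dominated].max())

# Three firms with a single input each.
firm_inputs = [[1.0], [2.0], [3.0]]
firm_outputs = [1.0, 3.0, 2.0]
frontier_at_2_5 = fdh_frontier(firm_inputs, firm_outputs, [2.5])
```

The resulting estimate is a non-decreasing step function of the input, which matches the free disposability assumption that using more input never reduces the attainable output.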

12/11, 11am, MB2.22

Dr. Ed Cohen (Imperial College London)

Analysing spatial point patterns on the surface of 3D shapes

Abstract:

Statistical methodology for analysing spatial point patterns has traditionally focused on Euclidean data and planar surfaces. However, with recent advances in 3D biological imaging technologies targeting protein molecules on a cell’s plasma membrane, spatial point patterns are now being observed on complex shapes and manifolds whose geometry must be respected for principled inference. Consequently, there is now a demand for tools that can analyse these data for important scientific studies in cellular biology and microbiology. Motivated by studying the spatial distribution of LPS proteins on the surface of E. coli, we develop the fundamental functional summary statistics for the analysis of point patterns on general convex bounded shapes and demonstrate how they can be used to test for complete spatial randomness. We then develop their multi-type extensions, together with a test for independence of the component marginal processes. To support these methods, we introduce a plug-in estimator for the intensity of a spatial point process on a manifold. We conclude with a discussion on how these methods can readily be extended to a class of non-convex shapes.

References:

S. Ward, E. A. K. Cohen and N. M. Adams. Testing for complete spatial randomness on 3-dimensional bounded convex shapes. Spatial Statistics, Vol. 41, 2021.

S. Ward, H. S. Battey and E. A. K. Cohen. Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold. Biometrika, Vol. 110, 2023.

S. Ward, E. A. K. Cohen and N. M. Adams. Functional summary statistics and testing for independence in marked point patterns on the surface of three-dimensional convex shapes. arXiv:2410.01063, 2024.

19/11, 11am, MB2.22

Prof. Thomas Nichols (Oxford)

Scalable Longitudinal Models for Neuroimaging Data

Abstract:

Neuroimaging has mainly depended on cross-sectional data to study the brain through the lifespan, but these studies can only attribute variation to intersubject differences in age. Only longitudinal neuroimaging studies can support inference on age-induced changes in the brain, crucial for studies of brain development and aging. I will review a range of methods my group has developed for the analysis of longitudinal neuroimaging data, work started when I was Warwick faculty 10+ years ago and continued recently through collaborations with existing Warwick Stats faculty. First I'll review a very fast and practical approach using marginal models and robust standard errors, for which we have created a user-friendly implementation (SwE). Next, while we'd ideally use standard linear mixed effects (LME) implementations (e.g. lme4/nlme), they aren't practical with brain data as they can only fit data one voxel at a time and thus cannot exploit vectorised computation. We have developed a highly optimised LME implementation that exploits vectorised computation so that all voxels are simultaneously updated at each iteration (BigLMM). Finally, for longitudinal binary images, e.g. lesion masks of white matter hyperintensities, we propose a relative-risk regression to support users' preference for relative risk (RR) units instead of odds ratios. We use a GEE approach with a log link, an identity variance function and an unknown dispersion parameter, along with a penalty to avoid infinite parameter estimates (joint work with Ioannis Kosmidis). This suite of work is a small indication of the rich methodological opportunities for the growing body of longitudinal neuroimaging studies.

26/11, 11am, MB2.22

Prof. Qiwei Yao (LSE)

Identification and Estimation for Matrix Time Series CP-factor Models

Abstract:

We investigate the identification and the estimation for matrix time series CP-factor models. Unlike the generalized eigenanalysis-based method of Chang et al. (2023), which requires the two factor loading matrices to be of full rank, the newly proposed estimation can handle rank-deficient factor loading matrices. The estimation procedure consists of the spectral decomposition of several matrices and a matrix joint diagonalization algorithm, resulting in low computational cost. The theoretical guarantee, established without the stationarity assumption, shows that the proposed estimation exhibits a faster convergence rate than that of Chang et al. (2023). In fact the new estimator is free from the adverse impact of any eigen-gaps, unlike most eigenanalysis-based methods. Furthermore, in terms of the error rates of the estimation, the proposed procedure is equivalent to handling a vector time series of dimension max(p, q) instead of p × q, where (p, q) are the dimensions of the matrix time series concerned. We have achieved this without assuming the “near orthogonality” of the loadings under various incoherence conditions often imposed in the CP-decomposition literature. Illustration with both simulated and real matrix time series data shows the usefulness of the proposed approach.
Joint work with Jinyuan Chang, Yue Du and Guanglin Huang.

03/12, 11am, MB2.22

Prof. Karthik Bharath (Nottingham)

Rolled Gaussian process models for curves on manifolds

Abstract:

Curves on manifolds arise as data in various applications, but practical probabilistic models for their analysis are presently unavailable. One strategy is to flatten or linearise the manifold $M$, then exploit the flattened space for modelling. But, in the absence of global coordinates on $M$, how the flattening is done is crucial because it may induce severe distortions. A local flattening strategy based on rolling $M$ without slipping along a Euclidean curve, which is compatible with the intrinsic geometry of $M$, will be discussed. The strategy allows for the prescription of a Gaussian process-type model on $M$. Theoretical and computational challenges in estimation of and inference for parameters of the model, and their relationship to Fréchet means on $M$, using discretely observed curves will be discussed, aided by an application in robot learning.

12/12 (Thursday), 11am, MB2.22

Dr. Minwoo Chae (Pohang University of Science and Technology)

Nonparametric estimation of a factorizable density using diffusion models

Abstract:

In recent years, diffusion-based deep generative models have achieved remarkable success in various applications. In this talk, we present statistical theories for diffusion models within the framework of nonparametric structured density estimation. To address the curse of dimensionality in nonparametric density estimation, we assume that the underlying density function factorizes into several low-dimensional components. Such factorizable densities are common in important examples, such as Bayesian networks and Markov random fields. We prove that an implicit density estimator constructed from diffusion models achieves the minimax optimal convergence rate with respect to total variation. Technically, we design a novel network architecture, which includes convolutional neural networks as a special case, to construct a minimax optimal estimator.

Term 3, 23-24

Date, Time and Room

Speaker

Title
30/04, 11am, MS.02

Oliver Feng (Bath)

Optimal convex M-estimation via score matching

Abstract:

In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that our procedure attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal. This is joint work with Yu-Chun Kao, Min Xu and Richard Samworth.

14/05, 11am, MS.01

Gilles Stupfler (University of Angers)

Some new perspectives on extremal regression

Abstract:

The objective of extremal regression is to estimate and infer quantities describing the tail of a conditional distribution. Examples of such quantities include quantiles and expectiles, and the regression version of the Expected Shortfall. Traditional regression estimators at the tails typically suffer from instability and inconsistency due to data sparseness, especially when the underlying conditional distributions are heavy-tailed. Existing approaches to extremal regression in the heavy-tailed case fall into two main categories: linear quantile regression approaches and, at the other extreme, nonparametric approaches. They are also typically restricted to i.i.d. data-generating processes. I will give an overview of a recent series of papers that discuss extremal regression methods in location-scale regression models (containing linear regression quantile models) and nonparametric regression models. Some key novel results include a general toolbox for extreme value estimation in the presence of random errors and joint asymptotic normality results for nonparametric extreme conditional quantile estimators constructed upon strongly mixing data. Joint work with A. Daouia, S. Girard, M. Oesting and A. Usseglio-Carleve.
04/06, 11am, MB0.07

Rebecca Lewis (Oxford)

High-dimensional logistic regression with separated data

Abstract:

In a logistic regression model with separated data, the log-likelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient always produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. In a high-dimensional regime, we consider the implications of enforcing a natural constraint on the vector of logistic-transformed probabilities. We derive a consistent estimator of the unknown logistic regression parameter that exists even when the data are separable.
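
The non-existence of the maximum likelihood estimator under separation can be seen numerically. The sketch below uses a hypothetical one-covariate, no-intercept data set that is perfectly separated at zero, and evaluates the log-likelihood at increasingly large coefficients. It illustrates only the opening claim of the abstract, not the exact analysis or the high-dimensional estimator; the function name and toy data are assumptions made for this example.

```python
import numpy as np

def logistic_loglik(beta, x, y):
    """Log-likelihood of a no-intercept logistic regression with a
    single covariate, computed stably via logaddexp.

    For y = 1 the contribution is -log(1 + exp(-x * beta));
    for y = 0 it is -log(1 + exp(x * beta)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    signed = (2.0 * y - 1.0) * x * beta  # positive when a point is fitted correctly
    return float(-np.logaddexp(0.0, -signed).sum())

# Hypothetical separated data: y = 1 exactly when x > 0.
x_toy = [-2.0, -1.0, 1.0, 2.0]
y_toy = [0, 0, 1, 1]
logliks = [logistic_loglik(b, x_toy, y_toy) for b in (1.0, 10.0, 100.0)]
```

The log-likelihood keeps increasing towards its supremum of 0 as the coefficient grows, so no finite maximiser exists.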
18/06, 1pm, MB0.07

Jenny Wadsworth (Lancaster)

Geometric approaches to statistics of multivariate extremes

Abstract:

A geometric representation for multivariate extremes, based on the shapes of scaled sample clouds in light-tailed margins and their so-called limit sets, has recently been shown to connect several existing extremal dependence concepts. However, these results are purely probabilistic, and the geometric approach itself has not been fully exploited for statistical inference. We outline a method for parametric estimation of the limit set shape, which includes a useful non-/semi-parametric estimate as a pre-processing step. More fundamentally, our approach provides a new class of asymptotically motivated statistical models for the tails of multivariate distributions, and such models can accommodate any combination of simultaneous or non-simultaneous extremes through appropriate parametric forms for the limit set shape. In this talk we will also present ongoing work moving towards semiparametric methodology for greater flexibility. Extrapolation further into the tail of the distribution is possible via simulation from the fitted model, and probability estimates are possible in regions where other frameworks struggle. Joint work with Ryan Campbell.
25/06, 11am, MB0.07

Nicola Gnecco (UCL)

Extremal Random Forests

Abstract:

Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
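
The extrapolation step that underlies this kind of method can be sketched with the standard generalized Pareto quantile formula: given an intermediate quantile u at level tau0 and GPD scale and shape parameters for exceedances above u, the quantile at a more extreme level tau follows in closed form. The sketch below takes the parameters as given; it does not implement the random forest weights or the penalized local likelihood of ERF, and the function name and example values are hypothetical.

```python
import math

def gpd_extreme_quantile(u, sigma, xi, tau0, tau):
    """Extrapolate a quantile at level tau from an intermediate quantile
    u at level tau0, assuming exceedances above u follow a generalized
    Pareto distribution with scale sigma and shape xi.

    Q(tau) = u + (sigma / xi) * (((1 - tau0) / (1 - tau)) ** xi - 1),
    with the xi -> 0 limit u + sigma * log((1 - tau0) / (1 - tau)).
    """
    if not (0.0 < tau0 <= tau < 1.0):
        raise ValueError("require 0 < tau0 <= tau < 1")
    if sigma <= 0.0:
        raise ValueError("scale sigma must be positive")
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-12:  # exponential-tail limit
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)

# Hypothetical values: 80% quantile of 10, scale 2, shape 0.5.
q95 = gpd_extreme_quantile(u=10.0, sigma=2.0, xi=0.5, tau0=0.8, tau=0.95)
```

In a conditional setting, sigma and xi would depend on the predictor vector, which is the part the abstract's local likelihood is designed to estimate.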