A1 - Dr. Andi Wang
Title: Compositional foundations of probability and statistics
Abstract: Traditionally, probability and statistics have been founded on measure theory. However, for researchers in applied probability, statistics and machine learning, it is very rare that one actually refers to explicitly-constructed probability spaces, sigma-algebras or measurability. The recent perspective of categorical probability instead begins with compositional structure, and provides a rigorous approach to reason about stochasticity, with measure-theoretic probability being an example, rather than the foundation. In this talk, I will introduce categorical probability, and give a concrete application to Markov chain Monte Carlo. I will not assume any prior knowledge of category theory. The latter part of the talk is based on joint work with Rob Cornish.
A2 - Dr. Filippo Pagani
Title: A discomfort-informed adaptive Gibbs sampler for finite mixture models
Abstract: Finite mixture models are frequently used to uncover latent structures in high-dimensional datasets (e.g. identifying clusters of patients in electronic health records). The inference of such structures can be performed in a Bayesian framework, and involves sampling algorithms such as Gibbs samplers aimed at deriving the posterior distribution of the probabilities that observations belong to specific clusters. Unfortunately, traditional implementations of Gibbs samplers in this context often face critical challenges, such as inefficient use of computational resources and unnecessary updates for observations that are highly likely to remain in their current cluster. This work introduces a new adaptive Gibbs sampler that improves convergence efficiency over existing methods. In particular, our sampler is guided by a function that, at each iteration, uses the past of the chain to focus updates on observations potentially misclassified in the current clustering, i.e. those with a low probability of belonging to their current component. Through simulation studies and two real data analyses, we empirically demonstrate that, in terms of convergence time, our method tends to perform more efficiently than state-of-the-art approaches.
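The guiding idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' algorithm: for a one-dimensional Gaussian mixture with known parameters, each sweep re-allocates only the `budget` observations whose current component has the lowest posterior probability (their "discomfort"); the names `allocation_probs` and `discomfort_guided_sweep` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def allocation_probs(x, means, sds, weights):
    """Posterior probability of each mixture component for one observation x."""
    dens = weights * np.exp(-0.5 * ((x - means) / sds) ** 2) / sds
    return dens / dens.sum()

def discomfort_guided_sweep(data, z, means, sds, weights, budget):
    """One adaptive sweep: re-allocate only the `budget` observations whose
    current component has the lowest posterior probability ("discomfort")."""
    probs = np.array([allocation_probs(x, means, sds, weights) for x in data])
    discomfort = 1.0 - probs[np.arange(len(data)), z]  # low prob of current label
    update = np.argsort(discomfort)[-budget:]          # most "uncomfortable" points
    for i in update:
        z[i] = rng.choice(len(weights), p=probs[i])
    return z
```

A full sampler would of course also update the component parameters and adapt the budget using the chain's past; the sketch only shows the targeted-allocation step.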
B1 - Florian Gutekunst
Title: Optimal Consumption in non-Markovian Stochastic Factor Models
Abstract: We study optimal investment and consumption over the infinite horizon under power utility in a non-Markovian incomplete stochastic factor model. Using the method of sub- and supersolutions, we prove the existence of a solution to an associated infinite horizon BSDE, obtain tight bounds on the optimal consumption rate, and prove a verification theorem. We apply our theory to the rough Heston model.
B2 - Dr. Zhengang Zhong
Title: Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions
Abstract: Laplace learning is a semi-supervised method for inferring the missing labels in a partially labeled dataset by exploiting the geometry given by the unlabeled data points. The method minimizes a Dirichlet energy defined on a (discrete) graph constructed from the full dataset. In finite dimensions, the asymptotics in the large (unlabeled) data limit are well understood, with convergence from the graph setting to a continuum Sobolev semi-norm weighted by the Lebesgue density of the data-generating measure. The lack of a Lebesgue measure on infinite-dimensional spaces requires rethinking the analysis when the data are not finite-dimensional. In this talk, I will present a first step in this direction by analyzing the setting where the data are generated by a Gaussian measure on a Hilbert space and proving pointwise convergence of the graph Dirichlet energy.
C1 - Dr. Ibrahim Kaddouri
Title: Clustering risk under the slowly mixing hidden Markov model
Abstract: We study the problem of clustering under a hidden Markov model with Gaussian emissions, focusing on the regime where the hidden chain mixes slowly. We provide a precise characterization of how the Bayes risk depends on the model parameters and construct a Bayes-optimal clustering procedure. Notably, our analysis reveals surprising and non-standard behavior of the Bayes risk in certain parameter regimes, offering new insights into the interplay between signal strength and temporal dependence.
C2 - Alexander Kent
Title: Rate Optimality and Phase Transition for User-Level Local Differential Privacy
Abstract: Given demands for rigorous data privacy guarantees, both from a regulatory standpoint and from the concerns of data subjects, definitions of privacy that can be theoretically validated are of great interest. One such framework enjoying significant popularity in both academia and industry is differential privacy, in which carefully calibrated noise is added to data to provide plausible deniability as to the true values. Differential privacy appears in both the central model, where a trusted aggregator has access to the data and releases a privatised output, and the local model, where each user adds noise before publishing their (now privatised) data to a potentially untrusted aggregator. The traditional setting, in which each of the n data subjects holds a single data point, is known as item-level privacy; a growing field of interest is user-level privacy, where each of the n users holds T observations and wishes to maintain the privacy of their entire collection. We consider the model of user-level local differential privacy, which is relatively unexplored. Indeed, even for a problem as fundamental as univariate mean estimation, the minimax rate of estimation was undetermined prior to this work. We aim to fill this gap, obtaining minimax optimal estimation rates for a range of canonical statistical estimation problems, including univariate and multidimensional mean estimation, sparse mean estimation, and non-parametric density estimation. We first derive a general minimax lower bound, which shows that the risk cannot, in general, be made to vanish for a fixed number of users even when T is arbitrarily large. We then derive lower and upper bounds for the aforementioned canonical problems that match up to logarithmic factors. In particular, with other model parameters held fixed, we observe phase transition phenomena in the minimax rates as T, the number of observations each user holds, varies.
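A baseline user-level locally private protocol for mean estimation can be sketched as follows. This is a textbook Laplace-mechanism construction for illustration only, not the minimax-optimal estimator from the talk: each user clips the sample mean of their T observations and adds Laplace noise before publishing, so the aggregator never sees raw data.

```python
import numpy as np

rng = np.random.default_rng(1)

def user_level_ldp_mean(user_data, epsilon, clip):
    """Each user releases a privatised version of their local sample mean:
    clip to [-clip, clip], then add Laplace noise calibrated to the
    sensitivity 2*clip of the clipped mean. The aggregator averages
    the n noisy reports."""
    reports = []
    for x in user_data:                      # x: array of T observations of one user
        local_mean = np.clip(x.mean(), -clip, clip)
        noise = rng.laplace(scale=2 * clip / epsilon)
        reports.append(local_mean + noise)
    return np.mean(reports)
```

Averaging first within each user is what makes the guarantee user-level: the release depends on the user's whole collection through a single bounded statistic. The phase transitions discussed in the talk concern how the attainable error of (optimal) schemes scales with T, which this naive baseline does not achieve.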
D1 - Matthew Adeoye
Title: Bayesian Copula-Based Modelling For Multi-Type Spatio-Temporal Outbreak Data
Abstract: The study of infectious disease outbreaks caused by multi-type pathogens often requires modelling techniques that account for the complex interactions between strains of the pathogen across geographical locations and time. In this talk, I will introduce a novel multi-type spatio-temporal model to better support the understanding of these pathogens. I will present a computationally efficient MCMC sampling scheme for the proposed models, along with results from simulations and real-world data.
D2 - Federico Perlino
Title: A Bayesian Parametric and Nonparametric Approach for the Imputation of Multivariate Left-Censored Data Due to Limit of Detection
Abstract: Left-censored observations due to limits of detection and/or quantification are common in clinical and epidemiologic research when continuous predictors are assessed from human specimens. In these settings, values below a certain threshold are not detectable in laboratory analysis and are reported as missing in the dataset. Classical imputation approaches have mostly relied on imputing the same number for all non-detected samples, thus compromising the continuous nature of the censored variables and affecting their variability and potential inclusion in regression modeling. Continuous imputation methods have been proposed, but generally focus on a single variable at a time. It is common, moreover, for the same human specimen to be used for the quantification of several biomarkers or exposures simultaneously, resulting in a complex set of multivariate and possibly correlated left-censored observations. To the best of our knowledge, there is no established framework that flexibly accounts for the real-world complexity of these data. We propose a Bayesian multiple imputation (MI) approach that relies on the introduction of multivariate latent variables to handle multivariate left-censored data. We present a general framework accommodating both a parametric approach, assuming multivariate normality of the data, and a nonparametric approach, modeling observations by means of a location Dirichlet process mixture of multivariate normal kernels. Both approaches are implemented through a Gibbs sampling scheme. The performance of our approach is investigated in a simulation study based on environmental exposures, and illustrated by analyzing a real dataset on cardiovascular biomarkers.
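The latent-variable Gibbs step at the heart of such schemes can be sketched under the parametric (multivariate normal) assumption. The helper below is illustrative, not the authors' implementation: each censored entry is redrawn from its full conditional normal, given the other coordinates, truncated above at the limit of detection; truncation is handled by simple rejection sampling, which is adequate only when the LOD is not far in the tail.

```python
import numpy as np

rng = np.random.default_rng(2)

def impute_censored(Y, censored, lod, mu, Sigma):
    """One Gibbs imputation step under multivariate normality: each censored
    entry (i, j) is redrawn from its full conditional N(m, s^2) truncated to
    (-inf, lod[j]), conditioning on the current values of the other
    coordinates of observation i."""
    Y = Y.copy()
    for i, j in zip(*np.where(censored)):
        others = [k for k in range(Y.shape[1]) if k != j]
        S_oo_inv = np.linalg.inv(Sigma[np.ix_(others, others)])
        s_jo = Sigma[j, others]
        m = mu[j] + s_jo @ S_oo_inv @ (Y[i, others] - mu[others])
        s = np.sqrt(Sigma[j, j] - s_jo @ S_oo_inv @ s_jo)
        draw = rng.normal(m, s)
        while draw >= lod[j]:                # truncate to the non-detect region
            draw = rng.normal(m, s)
        Y[i, j] = draw
    return Y
```

In a full sampler this step would alternate with updates of mu and Sigma (or, in the nonparametric version, of the Dirichlet process mixture allocations), and the imputed datasets across iterations would feed the MI analysis.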
D3 - Jia Le Tan
Title: Approximate Bayesian Inference for Ecological Dynamics, with Applications to Fisheries
Abstract: Ecological systems are often described by complex models that capture nonlinear dynamics, stochasticity, and partial observability, with fisheries providing a key motivating application. In many cases, these models lead to likelihoods that are unavailable or too expensive to compute, making exact Bayesian inference impractical. Moreover, because these models necessarily simplify complex real-world processes, they are often susceptible to model misspecification, creating further challenges for reliable inference and uncertainty quantification. This talk presents ongoing work on approximate Bayesian inference methods for such settings, spanning classical approaches such as approximate Bayesian computation, synthetic likelihood, and their sequential Monte Carlo variants, as well as more recent simulation-based methods including neural posterior and neural likelihood estimation. While these methods offer a flexible framework for uncertainty-aware inference when conventional likelihood-based methods are not viable, their application in ecological and fisheries settings also presents important practical challenges, which I will discuss in this talk.
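As a concrete reference point for the classical end of this spectrum, rejection ABC can be sketched in a few lines. This is a generic textbook scheme on a toy model, not tied to any specific fisheries application from the talk: prior draws are kept whenever their simulated summary statistics fall within a tolerance of the observed ones, so the likelihood is never evaluated.

```python
import numpy as np

rng = np.random.default_rng(3)

def abc_rejection(observed, simulate, prior_sample, summary, tol, n_draws):
    """Rejection ABC: keep prior draws whose simulated summary statistics lie
    within `tol` of the observed summaries (the likelihood is never computed)."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s_sim = summary(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)
```

Sequential Monte Carlo variants, synthetic likelihood, and neural simulation-based methods can all be read as ways of spending the simulation budget more efficiently than this blind rejection loop, which is exactly the trade-off at issue in expensive ecological simulators.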