Algorithms & Computationally Intensive Inference seminars
The seminars take place on Fridays at 1 pm (UK time) in room MB0.08.
2024-2025 Organisers: Wenkai Xu & Filippo Pagani
If you would like to speak, or to be included in seminar emails, please contact one of the organisers.
Website URL: www.warwick.ac.uk/compstat
Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB - only approved members can post)
Term 1:
Date | Speaker | Title
29/11 | Lukas Trottner (Birmingham) | Learning to reflect – On data-driven approaches to stochastic optimal control
Note: Hybrid
Abstract: Even though theoretical solutions to stochastic optimal control problems are well understood in many scenarios, their practical use suffers from the assumption that the dynamics of the underlying stochastic process are known. This raises the statistical challenge of developing purely data-driven strategies. In this talk we focus on singular control problems for diffusions and demonstrate how such data-driven strategies with explicit sublinear regret bounds can be constructed by employing nonparametric statistical techniques.
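
As a rough illustration of the nonparametric ingredient mentioned above (not code from the talk), the following Python sketch estimates the drift of a one-dimensional diffusion from discretely observed increments by kernel regression, then runs the process under the plug-in estimate with a reflection boundary that is fixed here purely for illustration; the data-driven strategies in the talk choose the boundary from the estimated model.

    import numpy as np

    rng = np.random.default_rng(0)

    # Discrete observations of an uncontrolled diffusion dX = b(X) dt + dW;
    # the true drift b(x) = -x is treated as unknown.
    dt, n = 0.01, 50_000
    x = np.empty(n)
    x[0] = 0.0
    for i in range(1, n):
        x[i] = x[i-1] - x[i-1]*dt + np.sqrt(dt)*rng.standard_normal()

    def drift_at(u, bw=0.25):
        # Nadaraya-Watson estimate b(u) ~ E[X_{t+dt} - X_t | X_t = u] / dt
        w = np.exp(-0.5*((x[:-1] - u)/bw)**2)
        return np.sum(w*np.diff(x)) / (dt*np.sum(w))

    grid = np.linspace(-2.5, 2.5, 51)
    b_hat = np.array([drift_at(u) for u in grid])

    # Plug-in control: run the diffusion under the estimated drift and reflect
    # it at an upper boundary, recording the local time (control expended).
    boundary, T, h = 1.0, 20.0, 1e-3
    y, local_time = 0.0, 0.0
    for _ in range(int(T/h)):
        y += np.interp(y, grid, b_hat)*h + np.sqrt(h)*rng.standard_normal()
        if y > boundary:
            local_time += y - boundary   # singular push back onto the boundary
            y = boundary

    print("control expended (local time):", local_time)
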
22/11 | Anna Shalova (Eindhoven) | Singular-limit analysis of gradient descent with noise injection
Note: Hybrid
Abstract: We study the limiting dynamics of noisy gradient descent systems in the overparameterized regime. In this regime the set of global minimizers of the loss is large, and when initialized in a neighbourhood of this zero-loss set a noisy gradient descent algorithm slowly evolves along this set. In some cases this slow evolution has been related to better generalisation properties. We give an explicit characterization of this evolution for a broad class of noisy gradient descent systems. Our results show that the structure of the noise affects not just the form of the limiting process, but also the time scale at which the evolution takes place. We apply our theory to dropout, label noise and classical SGD (minibatching) noise. We show that dropout and label noise models evolve on two different time scales. At the same time, classical SGD yields a trivial evolution on both of these time scales, implying that additional noise is required for regularization. This is joint work with Mark Peletier and André Schlichting.
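
A standard toy example (not from the talk) of the phenomenon described above: for the overparameterised model f(w) = w1*w2 fitted to the target value 1, the zero-loss set is the hyperbola w1*w2 = 1; plain gradient descent initialised on it stays put, while injected label noise induces a slow drift along the set towards the balanced point (1, 1). All step counts and noise levels are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    def run(noise_std, steps=200_000, lr=1e-3):
        w = np.array([2.0, 0.5])        # on the zero-loss set: w1*w2 = 1
        for _ in range(steps):
            target = 1.0 + noise_std*rng.standard_normal()  # label noise
            r = w[0]*w[1] - target                          # residual
            w -= lr*r*np.array([w[1], w[0]])                # grad of 0.5*r**2
        return w

    print("plain GD   :", run(noise_std=0.0))  # stays at (2.0, 0.5)
    print("label noise:", run(noise_std=0.5))  # drifts towards (1.0, 1.0)
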
15/11 | Saifuddin Syed (Oxford) | Scalable sampling using annealed algorithms
Note: Hybrid
Abstract: Generating samples from complex probability distributions is a fundamental challenge in statistical modelling and Bayesian statistics. In practice, this is generally impossible to do directly, and we must introduce a simpler reference distribution, such as a Gaussian, and manipulate its density and samples to approximate the target. In general, direct inference is reliable when the reference is close to the target and fragile when it is not. Annealing is a popular technique motivated by this principle: it introduces a sequence of distributions interpolating between the reference and the target, ensuring that neighbouring distributions are close. An annealing algorithm specifies how to traverse this bridge of distributions to incrementally transform samples from the reference into samples approximating the target. In this talk, we will construct two computationally dual annealing algorithms, Sequential Monte Carlo samplers (SMC) and Parallel Tempering (PT), which propagate samples from the reference to the target using importance sampling and Metropolis-Hastings, respectively. By analysing the variance of the normalising-constant estimator, we will see how performance scales with increasing runtime, parallelism, memory, and the difficulty of the inference problem. Notably, we will identify a critical phenomenon and explain why these algorithms are efficient and can scale to tackle modern sampling problems. Finally, we will provide a black-box algorithm to tune these samplers efficiently, along with practical guidelines for when to implement SMC versus PT.
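
A minimal SMC sampler along the lines sketched in the abstract, assuming a standard normal reference and a toy bimodal target (both illustrative choices): particles are reweighted between neighbouring annealed distributions, resampled, and moved by a random-walk Metropolis-Hastings step, while the weights accumulate an estimate of the log normalising-constant ratio (log 2 for this example).

    import numpy as np

    rng = np.random.default_rng(2)

    def log_ref(x):   # reference: N(0, 1), unnormalised
        return -0.5*x**2

    def log_tgt(x):   # unnormalised bimodal target
        return np.logaddexp(-0.5*(x - 3)**2, -0.5*(x + 3)**2)

    n_particles = 2_000
    betas = np.linspace(0.0, 1.0, 50)      # annealing schedule
    x = rng.standard_normal(n_particles)   # exact draws from the reference
    log_Z = 0.0                            # log normalising-constant ratio
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # importance reweighting between neighbouring annealed distributions
        log_w = (b1 - b0)*(log_tgt(x) - log_ref(x))
        log_Z += np.logaddexp.reduce(log_w) - np.log(n_particles)
        w = np.exp(log_w - log_w.max())
        x = x[rng.choice(n_particles, n_particles, p=w/w.sum())]  # resample
        # random-walk Metropolis-Hastings move targeting pi_{b1}
        def log_pi(u):
            return (1 - b1)*log_ref(u) + b1*log_tgt(u)
        prop = x + rng.standard_normal(n_particles)
        accept = np.log(rng.random(n_particles)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)

    print("log Z estimate:", log_Z)   # true value: log 2 ~ 0.693
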
8/11 | Antonin Schrab (UCL) | Optimal Kernel Hypothesis Testing
Note: Hybrid
Abstract: The thesis proposes new kernel hypothesis tests and proves optimal power guarantees for them. Various testing frameworks are considered, including the two-sample, independence and goodness-of-fit frameworks. A strong focus is placed on the often-ignored but crucial choice of kernel, which strongly impacts test power. Two methods, kernel pooling and aggregation, are proposed to adaptively select kernels in a parameter-free manner, and are shown to lead to minimax optimal separation rates with respect to the kernel and L2 metrics. Optimal kernel tests are also developed, and their power guarantees theoretically analysed, under various testing constraints such as computational efficiency, differential privacy, and robustness to data corruption.
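
A simplified illustration of kernel aggregation for the two-sample problem (a plain Bonferroni correction over a bandwidth collection, rather than the weighted adjustment developed in the thesis): each Gaussian-kernel MMD statistic is calibrated by permutations at level alpha divided by the number of kernels, and the test rejects if any kernel does.

    import numpy as np

    rng = np.random.default_rng(3)

    def mmd2(K, n):
        # biased squared MMD from the joint kernel matrix on z = [x; y]
        return K[:n, :n].mean() + K[n:, n:].mean() - 2*K[:n, n:].mean()

    def agg_mmd_test(x, y, bandwidths, n_perm=200, alpha=0.05):
        n = len(x)
        z = np.concatenate([x, y])
        d2 = (z[:, None] - z[None, :])**2
        for bw in bandwidths:              # collection of Gaussian kernels
            K = np.exp(-d2/(2*bw**2))
            null = [mmd2(K[np.ix_(p, p)], n)
                    for p in (rng.permutation(2*n) for _ in range(n_perm))]
            # Bonferroni-style correction across the kernel collection
            if mmd2(K, n) > np.quantile(null, 1 - alpha/len(bandwidths)):
                return True
        return False

    x = rng.standard_normal(100)
    y = 0.5 + rng.standard_normal(100)     # mean-shifted alternative
    print("reject H0:", agg_mmd_test(x, y, bandwidths=[0.5, 1.0, 2.0]))
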
1/11 | Alice Corbella (Warwick) | The Lifebelt Particle Filter: a novel robust SMC scheme
Note: Hybrid
Abstract: Sequential Monte Carlo (SMC) methods can be applied to discrete state-space models on bounded domains, to sample from and marginalise over unknown random variables. As in continuous settings, problems such as particle degeneracy can arise: proposed particles can be incompatible with the data, lying in low-probability regions or outside the boundary constraints, and the discrete system can result in all particles having weight zero. In this talk I will introduce the Lifebelt Particle Filter (LBPF), a novel SMC method for robust likelihood estimation in low-valued count problems. The LBPF combines a standard particle filter with one (or more) lifebelt particles which, by construction, lie within the boundaries of the discrete random variables and are therefore compatible with the data. The main benefit of the LBPF is that only one or a few wisely chosen particles are sufficient to prevent particle collapse. The LBPF can be used within a pseudo-marginal scheme to draw inference on static parameters, θ, governing the system. In the talk I will also present an example of the use of the LBPF for the estimation of the parameters governing the death and recovery process of hospitalised patients during an epidemic. Ref: Corbella, A., McKinley, T.J., Birrell, P.J., Presanis, A.M., Roberts, G.O., De Angelis, D. and Spencer, S.E. (2022). The Lifebelt Particle Filter for robust estimation from low-valued count data. arXiv preprint arXiv:2212.04400.
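
A caricature of the lifebelt idea (not the published algorithm, which constructs and weights the lifebelt more carefully), on a toy death process where a hidden count produces an exactly observed binomial number of deaths each day: ordinary particles whose count falls below the observed deaths get weight exactly zero, while one deliberately large particle always remains compatible with the data and keeps the likelihood estimate alive.

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(4)

    # Toy data: x_t patients, each dies with probability q per day, and the
    # exact daily death count d_t is observed, so x_{t+1} = x_t - d_t.
    q, alive, deaths = 0.2, 25, []
    for _ in range(10):
        d = int(rng.binomial(alive, q))
        deaths.append(d)
        alive -= d

    n_part = 100
    x = rng.integers(5, 60, n_part)   # particles for the unknown initial count
    x[0] = 500                        # lifebelt: large enough to be compatible
                                      # with any observed death count
    log_lik = 0.0
    for d in deaths:
        w = binom.pmf(d, x, q)        # exactly zero whenever x < d
        log_lik += np.log(w.mean())   # marginal-likelihood contribution
        idx = rng.choice(n_part, n_part, p=w/w.sum())
        idx[0] = 0                    # the lifebelt particle is always kept
        x = x[idx] - d                # transition is deterministic given d

    print("log-likelihood estimate:", log_lik)
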
25/10 | Heishiro Kanagawa (Newcastle) | Reinforcement Learning for Adaptive MCMC
Note: Hybrid
Abstract: An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this work is to set out a general framework, called Reinforcement Learning Metropolis-Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis-Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis-Hastings algorithm on ≈90% of tasks in the PosteriorDB benchmark.
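
A bare-bones illustration of the policy-gradient idea (not the framework from the talk): the log proposal scale of a random-walk Metropolis-Hastings sampler is treated as a policy parameter and updated by a REINFORCE-style rule that rewards squared jump distance, with a decaying learning rate in the spirit of the diminishing-adaptation condition mentioned in the abstract. Target, reward and step-size schedule are all illustrative choices.

    import numpy as np

    rng = np.random.default_rng(5)

    def log_target(x):   # standard normal target
        return -0.5*x**2

    theta = np.log(0.1)  # log proposal scale: the "policy" parameter
    x, accepts, n = 0.0, 0, 100_000
    for i in range(n):
        eps = rng.standard_normal()
        prop = x + np.exp(theta)*eps             # random-walk proposal
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            jump2, x, accepts = (prop - x)**2, prop, accepts + 1
        else:
            jump2 = 0.0
        # REINFORCE update: reward = squared jump distance; the score of the
        # proposal N(x, e^(2*theta)) with respect to theta is (eps**2 - 1).
        theta += 0.01/(1 + i)**0.6 * jump2 * (eps**2 - 1)

    print("learnt proposal scale:", np.exp(theta))  # classical optimum ~ 2.4
    print("acceptance rate:", accepts/n)            # classical optimum ~ 0.44
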
18/10 | Anastasia Mantziou (Warwick) | Bayesian model-based clustering for populations of network data
Note: Hybrid
Abstract: There is increasing appetite for analysing populations of network data due to the fast-growing body of applications demanding such methods. While methods exist to provide readily interpretable summaries of heterogeneous network populations, these are often descriptive or ad hoc, lacking any formal justification. In contrast, principled analysis methods often provide results difficult to relate back to the applied problem of interest. Motivated by two complementary applied examples, we develop a Bayesian framework to appropriately model complex heterogeneous network populations, while also allowing analysts to gain insights from the data and make inferences most relevant to their needs. The first application involves a study in computer science measuring human movements across a university. The second analyses data from neuroscience investigating relationships between different regions of the brain. While both applications entail analysis of a heterogeneous population of networks, network sizes vary considerably. We focus on the problem of clustering the elements of a network population, where each cluster is characterised by a network representative. We take advantage of the Bayesian machinery to simultaneously infer the cluster membership, the representatives, and the community structure of the representatives, thus allowing intuitive inferences to be made. The implementation of our method on the human movement study reveals interesting movement patterns of individuals in clusters, readily characterised by their network representative. For the brain networks application, our model reveals a cluster of individuals with different network properties of particular interest in neuroscience. The performance of our method is additionally validated in extensive simulation studies.
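
A toy, non-Bayesian caricature of the clustering task above (the model in the talk additionally infers community structure on the representatives within a full Bayesian framework): networks are reduced to binary edge vectors, representatives are updated by edge-wise majority vote, and memberships by Hamming distance. All sizes and noise levels are illustrative.

    import numpy as np

    rng = np.random.default_rng(6)

    # Toy population: 40 networks on 10 nodes, each a noisy copy (10% of
    # edges flipped) of one of two group representatives.
    n_nets, flip = 40, 0.1
    m = 10*(10 - 1)//2                       # number of possible edges
    reps_true = rng.random((2, m)) < 0.4
    truth = np.arange(n_nets) % 2
    nets = np.array([np.where(rng.random(m) < flip,
                              ~reps_true[g], reps_true[g])
                     for g in truth])

    # Alternate between updating representatives (edge-wise majority vote)
    # and reassigning networks to their nearest representative (Hamming).
    z = rng.integers(0, 2, n_nets)
    for _ in range(20):
        reps = np.array([nets[z == k].mean(axis=0) > 0.5 for k in range(2)])
        dists = np.array([[np.sum(net != rep) for rep in reps]
                          for net in nets])
        z = dists.argmin(axis=1)

    acc = max(np.mean(z == truth), np.mean(z != truth))  # labels exchangeable
    print("agreement with the true grouping:", acc)
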
11/10 | Luke Hardcastle (UCL) | Piecewise Deterministic Markov Processes for transdimensional sampling from flexible Bayesian survival models
Note: Hybrid (slides available)
Abstract: Flexible survival models have seen increasing popularity for the estimation of mean survival in the presence of a high degree of administrative censoring, where survival curves need to be extrapolated beyond the final observed event times. This increased flexibility, however, often introduces challenging model selection problems that have limited their wider application. In this talk I will focus on two such models, the polyhazard model and the piecewise exponential model. We introduce new prior structures that allow for the joint inference of parameters and structural quantities. Posterior sampling is achieved using bespoke MCMC schemes based on Piecewise Deterministic Markov Processes that utilise and extend existing methods for these samplers to target transdimensional posterior distributions. This is joint work with Samuel Livingstone and Gianluca Baio.
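
For readers unfamiliar with PDMP samplers, here is the basic building block only (not the bespoke transdimensional schemes of the talk): a one-dimensional Zig-Zag sampler for a standard normal target, with event times simulated exactly by inverting the integrated rate.

    import numpy as np

    rng = np.random.default_rng(7)

    # One-dimensional Zig-Zag sampler targeting N(0, 1): the state moves at
    # velocity v in {-1, +1}, and v flips at events of an inhomogeneous
    # Poisson process with rate max(v*x, 0), since U(x) = x^2/2, U'(x) = x.
    x, v, t, T = 0.0, 1.0, 0.0, 10_000.0
    times, positions = [0.0], [0.0]
    while t < T:
        a = v*x
        e = rng.exponential()
        tau = -a + np.sqrt(max(a, 0.0)**2 + 2*e)  # exact event time, inversion
        x += v*tau                                # deterministic linear flight
        v, t = -v, t + tau                        # flip velocity at the event
        times.append(t)
        positions.append(x)

    # Trajectory averages are time integrals over the linear segments; here
    # we estimate E[X^2], which equals 1 under the target.
    ts, pos = np.array(times), np.array(positions)
    x0, x1, seg = pos[:-1], pos[1:], np.diff(ts)
    est = np.sum((x1**3 - x0**3)/(3*(x1 - x0))*seg)/ts[-1]
    print("E[X^2] estimate:", est)
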