2024-2025
The seminars will take place on Fridays 1 pm UK time in room MB0.08.
2023-2024 Organisers: Andi Wang & Shenggang Hu
If you would like to speak, or you want to be included in any emails, please contact one of the organisers.
Website URL: www.warwick.ac.uk/compstat
Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB - only approved members can post)
Term 3:
Date | Speaker | Title |
12/04 | Mareike Hasenpflug (Passau) | Geodesic slice sampling on Riemannian manifolds |
Note: Extra session Hybrid |
Riemannian manifolds allow one to incorporate more information (e.g. on dependencies between variables) into mathematical models, but require efficient sampling techniques on manifolds when such models are tackled with Bayesian inference. We propose a slice-sampling-based MCMC method, called geodesic slice sampling, to approximately sample from distributions on Riemannian manifolds. Its transition mechanism consists of running a 1-dimensional slice sampler on a randomly chosen geodesic. We affirm the correctness of geodesic slice sampling by showing its reversibility with respect to the target distribution and demonstrate its performance through numerical experiments. This talk is based on joint work with Alain Durmus, Samuel Gruffaz, Michael Habeck, Shantanu Kodgriwar and Daniel Rudolf. |
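A minimal illustrative sketch of this transition mechanism, specialised to the unit sphere, where geodesics are great circles (function and argument names are ours, not the authors' implementation): draw a slice level, pick a random geodesic through the current point, then sample the angle along it by shrinkage.

```python
import numpy as np

def geodesic_slice_step(x, log_density, rng):
    """One geodesic slice sampling transition on the unit sphere (illustrative sketch)."""
    d = x.shape[0]
    # Slice level: log of Uniform(0, pi(x))
    log_level = log_density(x) + np.log(rng.uniform())

    # Random geodesic through x: great circle in the direction of a random unit tangent v
    v = rng.normal(size=d)
    v -= v.dot(x) * x
    v /= np.linalg.norm(v)
    geodesic = lambda t: np.cos(t) * x + np.sin(t) * v   # geodesic(0) == x

    # 1-dimensional slice sampler on the angle, shrinking the bracket towards the current point
    t = rng.uniform(0.0, 2.0 * np.pi)
    t_min, t_max = t - 2.0 * np.pi, t
    while True:
        y = geodesic(t)
        if log_density(y) > log_level:
            return y
        if t < 0.0:
            t_min = t
        else:
            t_max = t
        t = rng.uniform(t_min, t_max)
```

Here log_density could be, for instance, an unnormalised von Mises-Fisher log density restricted to the sphere.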
|
26/04 | Ardjen Pengel | Gaussian Approximation and Output Analysis for High-Dimensional MCMC |
Note: Hybrid |
The widespread use of Markov Chain Monte Carlo (MCMC) methods for high-dimensional applications has motivated research into the scalability of these algorithms with respect to the dimension of the problem. Despite this, numerous problems concerning output analysis in high-dimensional settings have remained unaddressed. We present explicit dimension-dependent Gaussian approximation results for a broad range of both continuous- and discrete-time MCMC algorithms. Notably, we obtain novel results for both exponentially and polynomially ergodic MCMC algorithms, and we analyse the dependence of the obtained bounds on the dimension of both the target distribution and the feature space. We demonstrate how these Gaussian approximation results can be applied in output analysis. We give quantitative convergence bounds for termination criteria and show that the termination time of a wide class of MCMC algorithms scales polynomially in dimension while ensuring a desired level of precision. Our results offer guidance to practitioners for obtaining appropriate standard errors and deciding the minimum simulation effort of MCMC algorithms in both multivariate and high-dimensional settings. |
|
03/05 | Neil Chada (Heriot-Watt) | Unbiased Langevin Monte Carlo |
Note: In-person |
Markov chain Monte Carlo (MCMC) is a powerful and well-known class of methods aimed at sampling from probability distributions. An issue that can arise with MCMC is that there is an induced bias associated with the burn-in period, related to the prior distribution. Unbiased estimators have recently been designed based on coupling ideas; however, these methods still suffer from numerous issues that arise within MCMC. This talk will focus on the development of unbiased estimators that take motivation from Langevin dynamics. This approach is simpler in nature and avoids the Metropolization step, taking motivation from the work of Rhee and Glynn. We develop two new unbiased estimators, which we compare to well-known methods on interesting model problems, such as an MNIST regression problem and a Poisson soccer model. Both theory and numerical simulations demonstrate the efficiency and performance of our methods.
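For background (a classical device, not the speaker's specific estimators): the Rhee and Glynn randomised-truncation construction removes the bias of a sequence of approximations $Y_k$, such as Langevin schemes with decreasing step sizes, whose expectations converge to the quantity of interest. Drawing an integer truncation level $N$ independently with $\mathbb{P}(N \ge k) > 0$ for all $k$, the estimator
\[
Z \;=\; \sum_{k=0}^{N} \frac{Y_k - Y_{k-1}}{\mathbb{P}(N \ge k)}, \qquad Y_{-1} := 0,
\]
satisfies $\mathbb{E}[Z] = \lim_{k \to \infty} \mathbb{E}[Y_k]$ under suitable summability conditions, at a finite expected cost.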
|
|
07/05 | Federica Milinanni (KTH) | Large deviation principle for Metropolis-Hastings sampling |
Note: Extra session MB0.07 2-3PM Hybrid |
Good performance measures for the convergence of Markov chain Monte Carlo (MCMC) methods are essential. For instance, they can be used to compare different algorithms, or to tune parameters within a given method. Common tools that are used for analysing convergence properties of MCMC algorithms are, e.g., mixing times, spectral gap and functional inequalities (e.g. Poincaré, log-Sobolev). A further, rather novel, approach consists in the use of large deviations theory to study the convergence of empirical measures of MCMC chains. At the heart of large deviations theory is the large deviation principle, which allows us to describe the rate of convergence of the empirical measures through a so-called rate function.
In this talk we will consider Markov chains generated via MCMC methods of Metropolis-Hastings type for sampling from a target distribution on a Polish space. We will state a large deviation principle for the corresponding empirical measure, defining a rate function in terms of relative entropy. We will present an alternative variational representation of the rate function, generalizing classical results by Donsker and Varadhan. This alternative formulation is (potentially) more suitable for analysing the associated MCMC methods and allows us to find a lower bound of the rate function. We will show examples of algorithms from the Metropolis-Hastings class (Independent Metropolis-Hastings, MALA) for which the theorem applies, and illustrate how the results can be used to tune algorithm parameters.
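As background for the rate function mentioned above: in the classical Donsker and Varadhan setting, the empirical measures $L_n = \frac{1}{n}\sum_{i=0}^{n-1}\delta_{X_i}$ of a Markov chain with transition kernel $P$ on a Polish space $S$ satisfy, under suitable conditions, a large deviation principle with rate function
\[
I(\mu) \;=\; \sup_{u} \int_{S} \log\frac{u}{Pu}\, \mathrm{d}\mu,
\]
the supremum being over bounded measurable functions $u$ bounded away from zero, so that informally $\mathbb{P}(L_n \approx \mu) \approx e^{-n I(\mu)}$. The relative-entropy formulation and variational representation in the talk generalise this picture to Metropolis-Hastings chains.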
|
|
10/05 | Alicia Gill (Warwick) | Bayesian Inference of the Reproduction Number from Epidemic and Genomic Data |
Note: Hybrid |
Estimating the reproduction number R(t), representing the average number of new infections caused by an infected individual at time t, is of vital importance during an epidemic outbreak. Typically R(t) is inferred using only epidemic data, such as hospital case counts or number of positive tests. However, epidemic data is often noisy, partially observed and/or biased. There is now an abundance of genomic data available, so genomic data is increasingly being used to understand infectious disease epidemiology. We have implemented a particle marginal Metropolis-Hastings algorithm to infer the trajectory of R(t) given a dated phylogeny (representing the genomic data) and partially observed prevalence data. The performance of this approach is analysed using simulated data and we present a case study for real data. We will also discuss extensions of this method which allow us to relax the assumption of a known dated phylogeny.
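A schematic sketch of one particle marginal Metropolis-Hastings update of the kind described above (the helpers run_particle_filter and log_prior are hypothetical placeholders, not the speaker's code): the intractable likelihood is replaced by an unbiased particle-filter estimate inside a standard accept/reject step.

```python
import numpy as np

def pmmh_step(theta, log_Z_hat, run_particle_filter, log_prior, rng, step_size=0.1):
    """One particle marginal Metropolis-Hastings update (illustrative sketch).

    theta               : current parameters (e.g. those driving the R(t) trajectory)
    log_Z_hat           : log particle-filter likelihood estimate at the current theta
    run_particle_filter : hypothetical user-supplied function (theta, rng) -> unbiased
                          log-likelihood estimate from a particle filter
    log_prior           : hypothetical user-supplied log prior density
    """
    theta_prop = theta + step_size * rng.normal(size=theta.shape)  # random-walk proposal
    log_Z_prop = run_particle_filter(theta_prop, rng)              # fresh unbiased estimate

    log_alpha = (log_Z_prop + log_prior(theta_prop)) - (log_Z_hat + log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, log_Z_prop  # accept
    # reject: reuse the stored estimate, as pseudo-marginal correctness requires
    return theta, log_Z_hat
```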
|
|
17/05 | Yorgos Felekis (Warwick) | Causal Optimal Transport of Abstractions |
Note: Hybrid |
Causal abstraction (CA) theory establishes formal criteria for relating multiple structural causal models (SCMs) at different levels of granularity by defining maps between them. These maps have significant relevance for real-world challenges such as synthesizing causal evidence from multiple experimental environments, learning causally consistent representations at different resolutions, and linking interventions across multiple SCMs. In this talk, we will present COTA, the first method to learn abstraction maps from observational and interventional data without assuming complete knowledge of the underlying SCMs. In particular, we introduce a multi-marginal Optimal Transport (OT) formulation that enforces do-calculus causal constraints, together with a cost function that relies on interventional information. We extensively evaluate COTA on synthetic and real world problems, and showcase its advantages over non-causal, independent and aggregated OT formulations. Finally, we demonstrate the efficiency of our method as a data augmentation tool by comparing it against the state-of-the-art CA learning framework, which assumes fully specified SCMs, on a real-world downstream task.
|
|
24/05 | Keaton Hamm (UT Arlington) | Getting more with less: matrix and tensor decompositions by mode subsampling |
Note: In-person |
We will discuss variants of matrix and tensor CUR decompositions and algorithms for Robust PCA that allow one to observe only submatrices or subtensors of the data. By subsampling modes, we can obtain algorithms with state-of-the-art runtime for these tasks. Sample applications to image and video processing will be discussed.
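A minimal sketch of a plain randomised CUR decomposition with uniform subsampling (illustrative only; the talk's variants and the Robust PCA algorithms are more refined): the approximation A ≈ C U⁺ R uses only the sampled columns C, rows R and their intersection U.

```python
import numpy as np

def cur_decomposition(A, n_cols, n_rows, rng):
    """Randomised CUR approximation A ~= C @ pinv(U) @ R (illustrative, uniform sampling)."""
    col_idx = rng.choice(A.shape[1], size=n_cols, replace=False)
    row_idx = rng.choice(A.shape[0], size=n_rows, replace=False)
    C = A[:, col_idx]                    # sampled columns
    R = A[row_idx, :]                    # sampled rows
    U = A[np.ix_(row_idx, col_idx)]      # intersection submatrix
    return C, np.linalg.pinv(U), R

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30)) @ rng.normal(size=(30, 150))   # exactly rank-30 matrix
C, U_pinv, R = cur_decomposition(A, n_cols=40, n_rows=40, rng=rng)
rel_err = np.linalg.norm(A - C @ U_pinv @ R) / np.linalg.norm(A)
# rel_err is numerically zero here because the sampled intersection captures the full rank
```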
|
|
31/05 | Paolo Ceriani (Bocconi) | Linear-cost unbiased posterior estimates for crossed effects and matrix factorization models |
Note: Hybrid |
In recent years, unbiased MCMC via couplings (UMCMC) has emerged as a promising framework for obtaining estimates from Markov chains without the need for extensive convergence diagnostics and with the potential for easy parallelization. A natural question is whether this class of estimates incurs a greater computational cost than the conventional ergodic average, and how to quantify this potential difference. In this talk, I will illustrate a methodology for unbiased estimation of Bayesian models whose computational cost is provably scalable in the number of data points under modern asymptotic regimes, building upon the coupling estimators introduced by Jacob et al. (2020) and matching the performance of some state-of-the-art MCMC methods. Specifically, I will provide bounds on the expected number of iterations needed for the chains to meet, directly linked to the relaxation time of the marginal chains. Time allowing, I will show simulations for Gaussian and non-Gaussian crossed random effect models and probabilistic matrix factorization models.
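For reference, the coupling estimator of Jacob et al. (2020) on which this builds: two chains $(X_t)$ and $(Y_t)$ with the same marginal law are coupled so that they meet at a random time $\tau$ and remain equal thereafter, with $Y$ lagged by one step; then, for a fixed burn-in $k$,
\[
H_k(X, Y) \;=\; h(X_k) \;+\; \sum_{t=k+1}^{\tau-1} \bigl( h(X_t) - h(Y_{t-1}) \bigr)
\]
is unbiased for $\mathbb{E}_\pi[h]$, and averaging $H_k$ over a range of $k$ reduces the variance. The expected meeting time $\mathbb{E}[\tau]$ drives the cost, which is why the bounds on the number of iterations to meet discussed in the talk are the key quantity.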
|
|
07/06 | Joshua Bon (Dauphine) | Bayesian scoring rule calibration for approximate models |
Note: Hybrid |
Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations, it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with the surrogate, but this can result in bias and poor uncertainty quantification. In this paper we propose a new method for adjusting approximate posterior samples to reduce bias and produce more accurate uncertainty quantification. We do this by optimizing a transform of the approximate posterior that maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We demonstrate good performance of the new method on several examples of increasing complexity. In this talk I will present new results for inferring parameters of reaction networks (Markov jump processes) using an extended Kalman filter approach to approximating the likelihood.
This is joint work with David Warne, David Nott, and Chris Drovandi. |
|
14/06 | Adrien Corenflos (Warwick) | Particle-MALA and Particle-mGrad: Gradient-based MCMC methods for high-dimensional state-space models |
Note: Hybrid |
State-of-the-art methods for Bayesian inference in state-space models are (a) conditional sequential Monte Carlo (CSMC) algorithms; (b) sophisticated 'classical' MCMC algorithms like MALA, or mGRAD from Titsias and Papaspiliopoulos (2018). The former propose N particles at each time step to exploit the model's 'decorrelation-over-time' property and thus scale favourably with the time horizon, T, but break down if the dimension of the latent states, D, is large. The latter leverage gradient/prior-informed local proposals to scale favourably with D but exhibit sub-optimal scalability with T due to a lack of model-structure exploitation. We introduce methods which combine the strengths of both approaches. The first, Particle-MALA, spreads N particles locally around the current state using gradient information, thus extending MALA to T>1 time steps and N>1 proposals. The second, Particle-mGRAD, additionally incorporates (conditionally) Gaussian prior dynamics into the proposal, thus extending the mGRAD algorithm. We prove that Particle-mGRAD interpolates between CSMC and Particle-MALA, resolving the 'tuning problem' of choosing between CSMC (superior for highly informative prior dynamics) and Particle-MALA (superior for weakly informative prior dynamics). We similarly extend other 'classical' MCMC approaches like auxiliary MALA, aGRAD, and preconditioned Crank-Nicolson-Langevin (PCNL). In experiments, our methods substantially improve upon both CSMC and sophisticated `classical' MCMC approaches for both highly and weakly informative prior dynamics.
TL;DR: We aim to solve the curse of dimensionality in state-space model inference by combining the nice scaling (in time) of conditional particle filtering methods, with the nice scaling (in space) of MALA and related gradient-based algorithms.
|
|
21/06 | Matthias Sachs (Birmingham) | Posterior Computation with the Gibbs Zig-Zag Sampler |
Note: Hybrid |
In recent years, sampling methods based on piecewise deterministic Markov processes (PDMPs) have received increasing interest as an alternative to Markov chain Monte Carlo (MCMC). We propose a new variant of PDMPs termed Gibbs zig-zag samplers, which allow parameters to be updated in blocks with a zig-zag sampler applied to certain parameters and traditional MCMC-style updates to others. We demonstrate the flexibility of this framework on posterior sampling for logistic models with shrinkage priors for high-dimensional regression and random effects and provide conditions for geometric ergodicity and the validity of a central limit theorem.
|
|
28/06 | Ritabrata Dutta (Warwick) / David Huk | Quasi-Bayes meets Vines |
Note: Hybrid |
Recently proposed quasi-Bayesian (QB) methods initiated a new era in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions, but extensions to multiple dimensions rely on a conditional decomposition resulting from predefined assumptions on the kernel of the Dirichlet Process Mixture Model, which is the implicit nonparametric model used. Here, we propose a different way to extend quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem, by decomposing the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. Thus, we use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (e.g., the energy score) and show that our proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with an analytical form and a convergence rate independent of the dimension of the data in some situations. Our experiments illustrate that the QB-Vine is appropriate for high-dimensional distributions (∼64 dimensions), needs very few samples to train (∼200) and outperforms state-of-the-art methods with analytical forms for density estimation and supervised tasks by a considerable margin.
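The decomposition underlying the QB-Vine, via Sklar's theorem: a joint density factorises into its one-dimensional marginals and a copula density,
\[
p(x_1,\dots,x_d) \;=\; c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\, \prod_{j=1}^{d} p_j(x_j),
\]
so the recursive quasi-Bayesian updates can be applied to each marginal $p_j$ (and its CDF $F_j$) separately, while the dependence term $c$ is modelled with a vine copula.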
|
|
Term 2:
Date | Speaker | Title |
12/01 | Shenggang Hu | Statistical Disaggregation — a Monte Carlo Approach for Imputation under Constraints |
Note: Hybrid |
Statistical disaggregation has become increasingly important for smart energy systems. A typical example of such disaggregation problems is to learn energy consumption at a higher resolution based on data at a lower resolution, e.g., to decompose a daily reading into readings at a finer level. Constrained models are often used in such problems and are often preferable to their unconstrained counterparts, reducing uncertainty and improving overall performance. However, these constrained models usually have intractable density functions and are not expressible as ordinary distributions, which makes further analysis hard. We introduce a novel Monte Carlo sampling algorithm based on Langevin diffusions and rejection sampling to solve the problem of sampling from equality-constrained models. We test our method on statistical disaggregation problems for electricity consumption datasets, and our approach provides better uncertainty estimation and accuracy in data imputation compared with other naive/unconstrained methods.
|
|
19/01 | Andrew Duncan (Imperial) | Energy Discrepancies: Fitting Unnormalised Probability Models without Scores |
Note: Hybrid |
The problem of fitting unnormalised densities to data is a common challenge across statistics (e.g. likelihood-free inference) and machine learning (e.g. energy-based models). Solutions to this either involve score matching or its variants, which require computation of score functions and their derivatives, or alternatively require expensive Markov chain Monte Carlo simulations. I present a very simple alternative approach called Energy Discrepancy (ED), which does not require either. I show that, as a divergence, ED interpolates between score matching and KL divergence. As a result of this, minimum ED estimation overcomes the problem of “near-sightedness” encountered in score-based estimation methods, while also enjoying theoretical guarantees. This score-free approach also opens the door to fitting unnormalized models on discrete spaces. I will demonstrate how to implement the method in practice, and its application to a few problems. | |
26/01 | Matthew Thorpe (Warwick) | Discrete-To-Continuum Limits in Graph-Based Semi-Supervised Learning |
Note: In-person |
Semi-supervised learning (SSL) is the problem of finding missing labels from a partially labelled data set. The heuristic one uses is that “similar feature vectors should have similar labels”. The notion of similarity between feature vectors explored in this talk comes from a graph-based geometry where an edge is placed between feature vectors that are closer than some connectivity radius. A natural variational solution to the SSL problem is to minimise a Dirichlet energy built from the graph topology, and a natural question is what happens as the number of feature vectors goes to infinity. In this talk I will give results on the asymptotics of graph-based SSL using an optimal transport topology. The results will include a lower bound on the number of labels needed for consistency. | |
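Concretely, with edge weights $w_{ij}$ on the graph built from the feature vectors, the variational approach above solves
\[
\min_{u} \; \tfrac{1}{2} \sum_{i,j} w_{ij} \, (u_i - u_j)^2 \quad \text{subject to } u_i = y_i \text{ for all labelled } i,
\]
and the talk concerns the continuum limit of this minimiser as the number of feature vectors goes to infinity.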
30/01 | Rocco Caprio (Warwick) & Andi Wang (Warwick) | 1. A calculus for Markov chain Monte Carlo: studying approximations in algorithms 2. Comparison theorems for Hybrid Slice Sampling |
Note: In-person |
1. Markov chain Monte Carlo (MCMC) algorithms are based on the construction of a Markov chain with transition probabilities $P_\mu(x,\cdot)$, where $\mu$ indicates an invariant distribution of interest. In this work, we look at these transition probabilities as functions of their invariant distributions, and we develop a notion of derivative in the invariant distribution of an MCMC kernel. We build around this concept a set of tools that we refer to as Markov Chain Monte Carlo Calculus. This allows us to compare Markov chains with different invariant distributions within a suitable class via what we refer to as mean value inequalities. We explain how MCMC Calculus provides a natural framework to study algorithms using an approximation of an invariant distribution, also illustrating how it suggests practical guidelines for the efficiency of MCMC algorithms. We conclude this work by showing how the tools developed can be applied to prove convergence of interacting and sequential MCMC algorithms, which arise in the context of particle filtering.
2. Hybrid Slice Sampling is a variant of Simple Slice Sampling, whereby exact sampling from the uniform distribution on a level set is replaced with sampling from a Markov kernel which is merely invariant with respect to this distribution. It is known that the spectral gap of the Hybrid Slice Sampling algorithm, if it exists, is smaller than that of the Simple Slice Sampling algorithm. Following the framework first presented in Andrieu, Lee, Power, Wang (2022), we shall derive new comparison results, giving convergence guarantees for Hybrid Slice Sampling relative to convergence rates of Simple Slice Sampling. These comparisons are valid even in subgeometric scenarios when the spectral gap is 0, and significantly extend and simplify previous results in the literature. |
|
02/02 | Connor Duffin (Cambridge) | Time-Dependent Statistical Finite Element Problems: Unstructured Data, Unknown Observation Operators, and Inversion |
Note: In-person |
The so-called “statistical finite element method” (statFEM) is a novel approach to conducting statistically coherent data assimilation for PDEs. A spatially smooth additive noise is included in the PDE, yielding a prior distribution which can be used to assimilate data in an online fashion (the well-known filtering problem). Results show that the method performs well and deals with model misspecification across many contexts (such as internal waves, reaction-diffusion systems, and shallow water flows). However, an obvious problem with the method is its reliance on the observation operator being known. The observation operator is the mapping from the model to the data and, in many scenarios, can be an unknown and highly nontrivial object. In this talk, I will present a statFEM-based approach to conducting data assimilation with unknown observation operators, motivated through solving the inverse problem for a collection of nonlinear differential equations. We will motivate our work through the Korteweg-de Vries equation, and discuss further case studies with the Lorenz-63 system of ODEs and the wave equation.
This is joint work with Alex Glyn-Davies (University of Cambridge), Dr. Deniz Akyildiz (Imperial College London), and Prof. Mark Girolami (University of Cambridge & Alan Turing Institute).
|
|
09/02 | Guglielmo Gattiglio (Warwick) | Nearest Neighbor GParareal: Improving Scalability of Gaussian Processes for Parallel-in-Time Solvers |
Note: Hybrid |
With the advent of supercomputers, multi-processor environments and parallel-in-time (PiT) algorithms provide ways to integrate ordinary differential equations (ODEs) over long time intervals, a task often infeasible with sequential time-stepping solvers within realistic timeframes. A recent approach, GParareal, combines machine learning (Gaussian processes) with traditional PiT methodology (Parareal) to achieve faster parallel speed-ups. Unfortunately, the applicability of the model is limited to a small number of computer cores and ODE dimensions. We present Nearest Neighbor GParareal (NN-GParareal), a data-enriched parallel-in-time integration algorithm that builds upon GParareal by improving its scalability properties for higher dimensional systems and increased processor count. Through data reduction, the model complexity is reduced from cubic in the sample size to log-linear, yielding a fast, automated procedure to integrate initial value problems over long intervals. The practical utility of NN-GParareal is demonstrated theoretically and empirically through its evaluation on nine different systems. Our analysis offers tangible evidence of NN-GParareal's behavior, advantages, and validity.
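For context, the classical Parareal correction that GParareal and NN-GParareal build on (standard background, not the talk's contribution): with a cheap coarse solver $\mathcal{G}$ and an expensive fine solver $\mathcal{F}$ propagating the solution over one time slice, iteration $k+1$ sets
\[
U^{k+1}_{n+1} \;=\; \mathcal{G}\bigl(U^{k+1}_{n}\bigr) \;+\; \mathcal{F}\bigl(U^{k}_{n}\bigr) \;-\; \mathcal{G}\bigl(U^{k}_{n}\bigr),
\]
with the $\mathcal{F}$ evaluations run in parallel across slices. GParareal learns the correction term $\mathcal{F} - \mathcal{G}$ with a Gaussian process, and NN-GParareal fits that GP only on nearest-neighbour subsets of the accumulated data, which is what brings the cost down from cubic to log-linear in the sample size.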
|
|
16/02 | Giorgos Vasdekis (UCL) | Skew-symmetric sampling schemes for SDEs and where to find them |
Note: Hybrid |
Locally balancing algorithms are a new class of MCMC algorithms, recently introduced in Livingstone and Zanella (2022). One of these algorithms, the Barker algorithm, has been shown to be robust to heteroskedasticity of the posterior target and to the choice of step size. At the same time, the algorithm seems to preserve the high-dimensional properties of state-of-the-art MCMC, making it an interesting alternative to the existing literature. It turns out that in order to sample from the Barker algorithm, one can use ideas from sampling from skew-symmetric distributions. We will transfer these ideas to the context of simulating from diffusion processes and suggest a new class of unadjusted MCMC algorithms which are robust with respect to the step size.
This is joint work with S. Livingstone, N. Nusken and R. Zhang.
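A minimal sketch of the Barker proposal mechanism referred to above, in its discrete-time MCMC form (illustrative only; the talk concerns unadjusted schemes for diffusions): the gradient only enters through a coordinate-wise sign-flipping probability, which is the skew-symmetric idea.

```python
import numpy as np

def barker_step(x, log_target, grad_log_target, step_size, rng):
    """One Barker proposal update with Metropolis-Hastings correction (illustrative sketch)."""
    g = grad_log_target(x)
    z = step_size * rng.normal(size=x.shape)
    # Keep each increment z_i with probability 1 / (1 + exp(-z_i * g_i)), otherwise flip its sign
    keep = rng.uniform(size=x.shape) < 1.0 / (1.0 + np.exp(-z * g))
    y = x + np.where(keep, z, -z)

    # Accept/reject; symmetric factors of the proposal cancel in the ratio
    g_y = grad_log_target(y)
    log_q_xy = -np.logaddexp(0.0, -(y - x) * g).sum()    # log q(x -> y) up to symmetric terms
    log_q_yx = -np.logaddexp(0.0, -(x - y) * g_y).sum()  # log q(y -> x) up to symmetric terms
    log_alpha = log_target(y) + log_q_yx - log_target(x) - log_q_xy
    return y if np.log(rng.uniform()) < log_alpha else x
```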
|
|
23/02 | Pierre Nyquist (Chalmers University of Technology, Gothenburg University) | Large deviations for Markov chain Monte Carlo methods: the surprisingly curious case of Metropolis-Hastings |
Note: Hybrid |
Markov chain Monte Carlo (MCMC) methods have become the workhorse for numerical computations in a range of scientific disciplines, e.g., computational chemistry and physics, statistics, and machine learning. The performance of MCMC methods has therefore become an important topic at the intersection of probability theory and (computational) statistics: e.g., when the underlying distribution one is trying to sample from becomes sufficiently complex, convergence speed and/or the cost per iteration becomes an issue for most MCMC methods.
The analysis, and subsequently design, of MCMC methods has to a large degree relied on classical tools used to determine the speed of convergence of Markov chains, e.g., mixing times, spectral gap and functional inequalities (Poincaré, log-Sobolev). An alternative avenue is to use the theory of large deviations for empirical measures. In this talk I will first give a general outline of this approach to analysing MCMC methods, along with some recent examples. I will then consider the specific case of the Metropolis-Hastings algorithm, the most classical amongst all MCMC methods and a foundational building block for many more advanced methods. Despite the simplicity of this method, it turns out that the theoretical analysis of it is still a rich area, and from the large deviation perspective it is surprisingly difficult to treat. As a first step we show a large deviation principle for the underlying Markov chain, extending the celebrated Donsker-Varadhan theory. Time permitting, I will also discuss ongoing and future work on using this result for better understanding of both the Metropolis-Hastings method and more advanced methods, such as approximate Bayesian computation (ABC-MCMC) and the Metropolis-adjusted Langevin algorithm (MALA).
The talk will be self-contained and no prior knowledge of either MCMC methods or large deviations is required.
The talk is primarily based on joint work with Federica Milinanni (KTH).
|
|
01/03 | Evandro Konzen (Warwick) | Efficient Bayesian modelling of infectious diseases in wildlife: an application to bovine tuberculosis in badgers |
Note: In-person |
To better understand the dynamics of infectious diseases of wildlife, it is crucial to be able to fit dynamic transmission models to observed data in a robust and efficient way, in order to estimate key epidemiological parameters and generate well calibrated predictive information. In practice, epidemiological events are at best only partially observed, and as such it is necessary to infer missing information alongside the model parameters as part of the inference routine, requiring computationally intensive inference algorithms where computational load increases non-linearly with population size and with increased dimensionality of the hidden states.
With this in mind, we implement a recently proposed individual forward filtering backward sampling algorithm to fit a complex individual-based epidemic model to data from a large-scale longitudinal study of bovine tuberculosis in badgers. This data set, from Woodchester Park in south-west England, comprises >2,000 badgers across 34 social groups over a 40-year period. We deal with many complexities typical to endemic wildlife disease systems: incomplete sampling of individuals over time (through capture-mark-recapture events), the use of multiple diagnostic tests, spatial meta-population structures, and non-Markovian demographic aspects such as age-dependent mortality rates (with censoring), all alongside a hidden stochastic compartmental model of disease spread. The method produces full posterior distributions for the parameters, and predictive distributions for the hidden states over time for each individual, and fits in just a few hours on a desktop machine.
We also propose a novel individual-level reproduction number which accounts for major sources of uncertainty of the disease system, and from it provide quantitative evidence for the presence of superspreader badgers in Woodchester Park. The inference framework is very flexible, and could be applied to other individual-level disease systems, and we will discuss future extensions to explore further important epidemiological questions.
|
|
08/03 | Emma Horton (Warwick) | A theoretical analysis of one-dimensional discrete generation ensemble Kalman particle filters |
Note: Hybrid |
Despite the widespread use of discrete generation ensemble Kalman particle filtering methodology to solve nonlinear and high-dimensional filtering and inverse problems, little is known about their mathematical foundations. In this talk, we give a gentle introduction to these models and build on the mathematical analysis of these sophisticated particle filters developed by Le Gland–Monbet–Tran and by Mandel–Cobb–Beezley. We discuss a variety of results including stochastic perturbation analysis of the fluctuations, the stability, and the long-time performance of these discrete generation ensemble Kalman particle filters, including time-uniform and nonasymptotic mean-error estimates that apply to possibly unstable signals. | |
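A minimal sketch of one discrete-generation, perturbed-observation ensemble Kalman analysis step with a linear observation operator (a toy illustration of the class of particle filters discussed, not the talk's analysis):

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """One perturbed-observation ensemble Kalman update (illustrative sketch).

    ensemble : (N, d) array of forecast particles
    y        : (m,) observation vector
    H        : (m, d) linear observation operator
    R        : (m, m) observation noise covariance
    """
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)        # centred forecast anomalies
    Y = X @ H.T                                 # centred predicted observations
    C_xy = X.T @ Y / (N - 1)                    # state-observation cross-covariance
    C_yy = Y.T @ Y / (N - 1) + R                # innovation covariance
    K = C_xy @ np.linalg.inv(C_yy)              # (d, m) ensemble Kalman gain
    noise = rng.multivariate_normal(np.zeros(y.shape[0]), R, size=N)
    return ensemble + (y + noise - ensemble @ H.T) @ K.T
```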
15/03 | | Robust Bayesian inference for Berkson and Classical Measurement Error models |
Note: Hybrid |
Measurement error occurs when a set of covariates influencing a response variable is corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning method which is robust to mismeasured covariates, does not require the preceding assumptions, and is able to incorporate prior beliefs about the true error distribution. This approach gives rise to a general framework which is suitable for both Classical and Berkson error models via the specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it allows for some flexibility in the choice of loss function depending on the type of regression model and desired robustness properties. In this talk I will introduce the framework, provide some theoretical results on bounding the generalisation error based on the Maximum Mean Discrepancy (MMD) loss and empirically show the effectiveness of the proposed method on some synthetic and real-world examples. | |
Term 1:
Date | Speaker | Title |
06/10 | Artavazd Maranjyan (KAUST) | GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity |
Note: Hybrid |
Abstract:
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. While methods of this type have been studied for about a decade, the empirically observed acceleration properties of local training eluded all attempts at theoretical understanding. In a recent breakthrough, Mishchenko et al. (ICML 2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their method ProxSkip requires all clients to take the same number of local training steps in each communication round. Inspired by a common-sense intuition, we start our investigation by conjecturing that clients with “less important” data should be able to get away with fewer local training steps without this impacting the overall communication complexity of the method. It turns out that this intuition is correct: we managed to redesign the original ProxSkip method to achieve this. In particular, we prove that our modified method, for which we coin the name GradSkip, converges linearly under the same assumptions, has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims. |
|
13/10 | Robin Evans (Oxford) | Parameterizing and Simulating from Causal Models |
Note: Hybrid |
Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the frugal parameterization, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other common quantities of interest. I will also discuss some other applications of the frugal parameterization, including to survival analysis, parameterizing nested Markov models, and ‘Many Data’: combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez.
Reference: Evans, R.J. and Didelez, V. (2023). Parameterizing and Simulating from Causal Models (with discussion). J. Roy. Statist. Soc. Ser. B. |
|
20/10 | Julian Hofstadler (Universität Passau) | Error Bounds for Adaptive MCMC within Doubly Intractable Distributions |
Note: Hybrid |
We study the problem of sampling from doubly intractable distributions, where, in addition to an unknown normalisation constant, we
|
|
27/10 | Michael Faulkner (Bristol) | Phase transitions, metastability and critical slowing down in statistical physics |
Note: In-person |
Sampling algorithms are commonplace in statistics and machine learning – in particular, in Bayesian computation – and have been used for decades to enable inference, prediction and model comparison in many different settings. They are also widely used in statistical physics, where many popular sampling algorithms first originated [1, 2]. At a high level, the goals within each discipline are the same – to sample from and approximate statistical expectations with respect to some probability distribution – but the motivations, nomenclature and methods of explanation differ significantly. This has led to challenges in communicating between the fields, and indeed the fundamental goals of one field are often misunderstood in the other. In this talk, we elucidate statistical physics for the statistician, emphasising that probability models are studied as functions of thermodynamic hyperparameters such as the temperature. This is particularly useful for characterising phase transitions, i.e., boundaries in thermodynamic hyperparameter space between distinct thermodynamic phases.
We then move on to sampling algorithms, with a particular focus on the behaviour of the Metropolis algorithm [1] when simulating the 2D Ising and 2DXY models of magnetism. Metropolis dynamics are metastable in the low-temperature phase of each model, mixing between states of equal probability density on a timescale that diverges with system size (proportional to the dimensionality of parameter space). Moreover, the Metropolis algorithm also suffers from the closely related phenomenon of critical slowing down at phase transitions. These strongly correlated dynamics are characterised by asymptotically long integrated autocorrelation times, due to a flattening of the target density that essentially results from the system trying to exist simultaneously in both thermodynamic phases. Indeed, these key aspects of statistical physics have led to innovations in sampling algorithms that inform the Bayesian world. In particular, we present the Swendsen-Wang [3], Wolff [4] and event-chain Monte Carlo [5-7] algorithms. The first two simulate the 2D Ising model and were developed in response to the metastability and critical slowing down of the Metropolis algorithm. They circumvent both phenomena to mix with low autocorrelation and independent of system size. We then show that event-chain Monte Carlo similarly circumvents the low-temperature Metropolis metastability of the 2DXY model [7] and discuss its potential utility in bypassing a hypothesised critical slowing down at the phase transition. This talk is based on a recent review paper on the subject [8].
[1] Metropolis et al., J. Chem. Phys. 21 1087 (1953)
[2] Alder & Wainwright, J. Chem. Phys. 27 1208 (1957)
[3] Swendsen & Wang, Phys. Rev. Lett. 58 86 (1987)
[4] Wolff, Phys. Rev. Lett. 62 361 (1989)
[5] Bernard, Krauth & Wilson, Phys. Rev. E 80 056704 (2009)
[6] Michel, Mayer & Krauth, EPL (Europhys. Lett.) 112 20003 (2015)
[7] Faulkner, arXiv:2209.03699 (2022)
[8] Faulkner & Livingstone, arXiv:2209.03699 (2022)
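As a concrete illustration of the single-spin-flip Metropolis dynamics [1] discussed above, a minimal 2D Ising sweep with periodic boundaries and zero external field (a textbook sketch, not the speaker's code); at low temperature these dynamics mix between the two magnetised states on a timescale that grows with the lattice size.

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep of the 2D Ising model, periodic boundaries (illustrative sketch)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        # Sum of the four neighbouring spins (periodic boundary conditions)
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb                      # energy change of flipping spin (i, j)
        if dE <= 0 or rng.uniform() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

rng = np.random.default_rng(1)
L, beta = 32, 0.5          # beta above the critical value log(1 + sqrt(2))/2 ~ 0.44: ordered phase
spins = rng.choice(np.array([-1, 1]), size=(L, L))
for _ in range(200):
    spins = metropolis_sweep(spins, beta, rng)
magnetisation = spins.mean()
```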
|
|
03/11 | Marie-Therese Wolfram (Warwick) | Ensemble Inference Methods for Models with Noisy and Expensive Likelihoods (joint work with O.R. Dunbar, A. Duncan, A.M. Stuart) |
Note: Hybrid |
The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical, and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations.
In this talk we focus on interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large-scale parametric structure of interest. Multiscale analysis is then used to analyze the behavior of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large-scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favorably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, while ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favorably.
|
|
10/11 | Joe Benton (Oxford) | From Denoising Diffusions to Denoising Markov Models |
Note: In-person |
Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. In this talk, we will introduce denoising Markov models (DMMs), a unifying framework generalising diffusion models to a wide class of spaces. We will show how to derive suitable training objectives for DMMs and discover that DMMs lead to a novel and principled generalization of score matching to arbitrary state spaces. | |
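For orientation, the standard Gaussian denoising diffusion setup that DMMs generalise: data $x_0$ is noised through $q(x_t \mid x_0) = \mathcal{N}\bigl(\sqrt{\bar\alpha_t}\, x_0, \,(1-\bar\alpha_t) I\bigr)$ and a network $\epsilon_\theta$ is trained with the denoising score matching objective
\[
\mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}\Bigl[ \bigl\lVert \epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\bigr) \bigr\rVert^2 \Bigr],
\]
after which samples are generated by approximately reversing the noising process. DMMs replace this Gaussian forward process with a general Markov noising process and derive the corresponding training objectives.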
17/11 | Woojung Kim (Warwick) | Latent variable approaches for multimorbidity analysis |
Note: In-person at MB0.02 12.45-13.30 |
Multimorbidity refers to the acquisition of multiple long-term chronic health conditions in a single person. This is becoming an increasing public health issue with aging populations and gaining insights into patterns of multimorbidity is essential for managing increased health system burdens. In this talk, we focus on latent variable approaches for the analysis of multimorbidity. We introduce a novel discrete latent variable approach for multimorbidity acquisition within a Bayesian framework of latent feature allocation, which allows an individual's morbidity profile to be driven by multiple latent factors and allows the modelling of age-dependent multimorbidity trajectories. We demonstrate the utility of our model in applications to both simulated data and disease event data from patient electronic health records. In each setting, we show our model can reconstruct clinically meaningful latent multimorbidity patterns and their age-dependent prevalence trajectories. |
|
24/11 | Filippo Ascolani (Università Bocconi) | Dimension-free mixing times of coordinate-wise samplers for Bayesian hierarchical models |
Note: In-person |
Coordinate-wise MCMC schemes (e.g. Gibbs and Metropolis-within-Gibbs) are popular algorithms to sample from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes combining the Bayesian asymptotics approach with a novel notion of conditional conductance. | |
01/12 | Tengyao Wang (LSE) | Sparse change detection in high-dimensional linear regression |
Note: In-person |
We introduce a new methodology 'charcoal' for estimating the location of sparse changes in high-dimensional linear regression coefficients, without assuming that those coefficients are individually sparse. The procedure works by constructing different sketches (projections) of the design matrix at each time point, where consecutive projection matrices differ in sign in exactly one column. The sequence of sketched design matrices is then compared against a single sketched response vector to form a sequence of test statistics whose behaviour shows a surprising link to the well-known CUSUM statistics of univariate changepoint analysis. The procedure is computationally attractive, and strong theoretical guarantees are derived for its estimation accuracy. Simulations confirm that our methods perform well in extensive settings, and a real-world application to a large single-cell RNA sequencing dataset showcases the practical relevance. | |
08/12 | | 1. Ensemble Kalman inversion approximate Bayesian computation 2. Network inference in a stochastic multi-population neural mass model via approximate Bayesian computation |
Note: In-person Slides1 Fig1 Fig2 |
1.
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters in the case where the data model is specified in the form of a simulator. It is not possible to directly implement standard Monte Carlo methods for inference in such a model, due to the likelihood not being available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood, sometimes known as the ABC likelihood. The ABC likelihood is chosen such that an unbiased estimator of it is easy to construct from simulations from the data model, allowing the use of pseudo-marginal Monte Carlo algorithms for inference under the approximate model. The central challenge of ABC is then to trade-off bias (introduced by approximating the model) with the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data, thus the most common approach to reducing variance is to perform inference conditional on summary statistics.
In this talk we introduce a new approach to estimating the ABC likelihood: using ensemble Kalman inversion (EnKI). Ensemble Kalman algorithms are Monte Carlo approximations of Bayesian inference for linear/Gaussian models. These methods are often applied outside of the linear/Gaussian setting, being used, for example, as an alternative to particle filtering for inference in non-linear state space models. Loosely speaking, EnKI can be used as an alternative to an SMC sampler on a sequence of annealed likelihoods. We see that EnKI has some appealing properties when used to estimate the ABC likelihood. It circumvents the exponential scaling with dimension of standard ABC, and does not require the reparameterisation imposed by the rare-event SMC approach of Prangle et al. (2018). It is able to achieve this with no additional simulations from the data model, thus it is likely to bring the most benefit in cases where this model is very expensive to simulate.
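The ABC likelihood referred to above, in its standard kernel form: for a tolerance $\epsilon$ and kernel $K_\epsilon$,
\[
p_\epsilon(y \mid \theta) \;=\; \int K_\epsilon(y - y')\, p(y' \mid \theta)\, \mathrm{d}y'
\;\approx\; \frac{1}{M} \sum_{m=1}^{M} K_\epsilon\bigl(y - y^{(m)}\bigr), \qquad y^{(m)} \sim p(\cdot \mid \theta),
\]
an unbiased estimate built purely from simulations of the data model, which is what makes pseudo-marginal algorithms applicable to the approximate posterior; the EnKI construction in the talk provides an alternative estimate of this quantity.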
|
|
2. The aim of this article is to infer the connectivity structures of brain regions before and during epileptic seizure. Our contributions are fourfold. First, we propose a 6N-dimensional stochastic differential equation for modelling the activity of N coupled populations of neurons in the brain. This model further develops the (single population) stochastic Jansen and Rit neural mass model, which describes human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Second, we construct a reliable and efficient numerical scheme for the model simulation, extending a splitting procedure proposed for one neural population. Third, we propose an adapted Sequential Monte Carlo Approximate Bayesian Computation algorithm for simulation-based inference of both the relevant real-valued model parameters as well as the {0,1}-valued network parameters, the latter describing the coupling directions among the N modelled neural populations. Fourth, after illustrating and validating the proposed statistical approach on different types of simulated data, we apply it to a set of multi-channel EEG data recorded before and during an epileptic seizure. The real data experiments suggest, for example, a larger activation in each neural population and a stronger connectivity on the left brain hemisphere during seizure.
Reference: S. Ditlevsen, M. Tamborrino, I. Tubikanec, Preprint at arXiv:2306.15787
|