2021-2022
The seminars take place on Fridays at 1 pm (UK time) in room MB0.08, in a hybrid format.
2021-2022 Organisers: Alice Corbella and Lyudmila Grigoryeva
If you would like to speak, or want to be included in any emails, please contact one of the organisers.
Website URL: www.warwick.ac.uk/compstat
Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB - only approved members can post)
2021/22 Term 1
The list of confirmed speakers.
Date | Speaker | Title
Week 1 08/10 | Lorenzo Pacchiardi |
Abstract: To perform Bayesian inference for stochastic simulator models for which the likelihood is not accessible, Likelihood-Free Inference (LFI) relies on simulations from the model. Standard LFI methods can be split according to how these simulations are used: to build an explicit surrogate likelihood, or to accept/reject parameter values according to a measure of distance from the observations (Approximate Bayesian Computation, ABC). In both cases, simulations are adaptively tailored to the value of the observation. Here, we generate parameter-simulation pairs from the model independently of the observation, and use them to learn a conditional exponential family likelihood approximation; to parametrize it, we use neural networks whose weights are tuned with score matching. With our likelihood approximation, we can employ MCMC for doubly intractable distributions to draw samples from the posterior for any number of observations without additional model simulations, with performance competitive with comparable approaches. Further, the sufficient statistics of the exponential family can be used as summaries in ABC, outperforming the state-of-the-art method in five different models with known likelihood. Finally, we apply our method to a challenging model from meteorology.
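For readers new to the ABC side of this abstract, here is a minimal rejection-ABC sketch for a toy Gaussian location model. Everything below (model, summary statistic, tolerance) is an illustrative assumption, not the paper's setup, which instead learns exponential family summaries.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=50)      # observed data, true mean 2
s_obs = y_obs.mean()                       # summary statistic

def simulate(theta, n=50):
    """Simulator: the 'likelihood-free' model we can only sample from."""
    return rng.normal(theta, 1.0, size=n)

accepted = []
eps = 0.1                                  # ABC tolerance
for _ in range(20000):
    theta = rng.normal(0.0, 5.0)           # draw from the prior
    s_sim = simulate(theta).mean()         # summary of simulated data
    if abs(s_sim - s_obs) < eps:           # accept if close to observation
        accepted.append(theta)

print(f"ABC posterior mean ~ {np.mean(accepted):.2f} from {len(accepted)} draws")
```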
Week 2 15/10 | Internal jam session | Brief 2-5 minute introductions to ongoing research projects or new ideas
Week 3 22/10 | Petros Dellaportas | Negligible-cost Variance Reduction for Metropolis-Hastings Chains
Abstract: We provide a general methodology to construct control variates for any discrete-time random walk Metropolis or Metropolis-adjusted Langevin algorithm Markov chain that can achieve, in a post-processing manner and with negligible additional computational cost, impressive variance reduction when compared to the standard MCMC ergodic averages. Our proposed estimators are based on an approximate solution of the Poisson equation for multivariate Gaussian target densities of any dimension.
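The talk's estimators come from an approximate solution of the Poisson equation; the sketch below shows only the generic post-processing idea on a toy problem, using a Stein-type control variate with known zero mean and a coefficient estimated from the chain (all choices illustrative, not the speaker's construction).

```python
import numpy as np

rng = np.random.default_rng(1)

def rwm(logpi, x0, n, step=1.0):
    """Random walk Metropolis targeting exp(logpi)."""
    x, out = x0, np.empty(n)
    for i in range(n):
        y = x + step * rng.normal()
        if np.log(rng.uniform()) < logpi(y) - logpi(x):
            x = y
        out[i] = x
    return out

xs = rwm(lambda x: -0.5 * x**2, 0.0, 50000)   # target N(0,1)
f = xs**2                                     # want E[x^2] = 1
g = 1.0 - xs**2                               # Stein identity gives E[g] = 0
c = np.cov(f, g)[0, 1] / np.var(g)            # estimated optimal coefficient
print(f"plain average:        {f.mean():.4f}")
print(f"with control variate: {(f - c * g).mean():.4f}")  # near-zero variance
```

Here the variance reduction is extreme because f + g = 1 exactly; in practice the gain depends on how well the control variate correlates with f.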
Week 4 29/10 | Sanmitra Ghosh | Variational Inference for Nonlinear ODEs
Abstract: Complex biological systems are often modelled using nonlinear ordinary differential equations (ODEs), which provide a rich framework for describing the dynamic behaviour of many interacting physical variables representing quantities of biological importance. Bayesian inference of unknown quantities in such models is carried out using MCMC. However, MCMC incurs a significant computational cost as it requires repeated evaluation of iterative algorithms that seek the numerical solution of a nonlinear ODE. Variational inference, as an optimisation-based alternative to MCMC, has the potential to expedite Bayesian inference for ODEs. Despite its potential usefulness in ODE inference problems, variational inference in its classical formulation can only be applied to conjugate models. We thus apply the "reparameterisation trick", popularised recently in machine learning applications, to obtain a "black-box" reformulation of variational inference for ODEs. Our proposed variational formulation does not depend on any emulation of the ODE solution and only requires the extension of automatic differentiation to an ODE. We achieve this through a novel and holistic approach that uses both forward and adjoint sensitivity analysis techniques. Consequently, this approach can cater to both small and large ODE models efficiently. Furthermore, ODEs can be used to approximate diffusion processes and stochastic kinetic systems. We show how our variational formulation can be used to carry out inference for such stochastic dynamical systems. We empirically evaluate the proposed inference method on some widely used mechanistic models. The proposed inference method produces a reliable approximation to the posterior distribution, with a significant reduction in execution time compared to MCMC.
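The reparameterisation trick can be illustrated without any ODE machinery. The toy sketch below fits a Gaussian variational approximation to a one-dimensional unnormalised posterior by stochastic gradient ascent on the ELBO; the target, step size, and Monte Carlo batch size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def dlogp(theta):                       # gradient of a toy log-posterior N(2, 1)
    return -(theta - 2.0)

mu, log_sig = 0.0, 0.0                  # variational parameters of q = N(mu, sig^2)
lr, n_mc = 0.05, 32
for it in range(2000):
    sig = np.exp(log_sig)
    eps = rng.normal(size=n_mc)
    theta = mu + sig * eps              # reparameterisation: theta = mu + sig * eps
    g = dlogp(theta)
    grad_mu = g.mean()                  # d/dmu E_q[log p]
    grad_ls = (g * eps).mean() * sig + 1.0   # d/dlog_sig E_q[log p] + entropy term
    mu += lr * grad_mu
    log_sig += lr * grad_ls

print(f"q = N({mu:.2f}, {np.exp(log_sig)**2:.2f})  (exact posterior: N(2, 1))")
```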
Week 5 05/11 | Azadeh Khaleghi | On Some Possibilities and Limitations of Restless Multi-Armed Bandits
Abstract: The Multi-Armed Bandit (MAB) problem is one of the most central instances of sequential decision making under uncertainty, and plays a key role in online learning and optimization. MABs arise in a variety of modern real-world applications, such as online advertisement, Internet routing, and sequential portfolio selection, to name only a few. In this problem, a forecaster aims to maximize the expected sum of the rewards actively collected from unknown processes. Stochastic MABs are typically studied under the assumption that the rewards are i.i.d. However, this assumption does not necessarily hold in practice. In this talk I will discuss some possibilities and limitations of a more challenging, yet more realistic (restless) MAB setting, where the reward distributions may exhibit long-range dependencies. Reference: http://www.jmlr.org/papers/volume20/17-547/17-547.pdf
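For contrast with the restless setting of the talk, here is the textbook UCB1 strategy for the standard i.i.d. case, on invented Bernoulli arms.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = [0.3, 0.5, 0.7]                   # Bernoulli arm means (unknown to player)
K, T = len(p_true), 10000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(T):
    if t < K:
        a = t                              # pull each arm once to initialise
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))            # optimism in the face of uncertainty
    r = rng.random() < p_true[a]           # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("pulls per arm:", counts.astype(int))  # the best arm dominates as T grows
```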
Week 6 12/11 | Sam Power | Accelerated Sampling on Discrete Spaces with Non-Reversible Markov Jump Processes
Abstract: In Bayesian inference problems and elsewhere, Markov Chain Monte Carlo (MCMC) algorithms are an indispensable tool for sampling from complex probability distributions. On continuous state spaces, there has been a great deal of successful work on how to construct efficient MCMC dynamics which can converge quickly, under very general circumstances. Much of this success has stemmed from identifying continuous-time dynamical processes (ODEs, SDEs, PDMPs) which admit the desired invariant measure, and then discretising those processes to form tractable discrete-time chains. This approach has apparently seen less use in the setting of discrete spaces. In this work, we aim to bridge this gap by identifying 'canonical' Markov processes (both reversible and non-reversible) on structured discrete spaces which admit a given invariant measure, and then using them to derive new algorithms for efficient sampling on discrete spaces. The algorithms are parameter-free (no tuning is required) and can be simulated directly in continuous time, easily and without discretisation error. We provide theoretical results supporting the use of non-reversible dynamics, and a range of numerical experiments demonstrates the practical benefits of our algorithms. This is joint work with Jacob Vorstrup Goldman (Cambridge).
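A minimal example in the same spirit, though not one of the talk's algorithms: on a finite path graph, a Markov jump process with rates lambda(x, y) = sqrt(pi(y)/pi(x)) to neighbouring states satisfies detailed balance with respect to pi, and can be simulated exactly in continuous time with Gillespie's algorithm, with no discretisation error.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 10
pi = np.exp(-0.5 * (np.arange(K) - 4.5)**2 / 2.0)  # unnormalised target on {0..9}
pi /= pi.sum()

def neighbours(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < K]

x, T = 0, 0.0
occup = np.zeros(K)                        # time spent in each state
for _ in range(200000):
    nb = neighbours(x)
    rates = np.array([np.sqrt(pi[y] / pi[x]) for y in nb])
    total = rates.sum()
    hold = rng.exponential(1.0 / total)    # exact exponential holding time
    occup[x] += hold
    T += hold
    x = nb[rng.choice(len(nb), p=rates / total)]  # jump proportional to rate

print(np.round(occup / T, 3))              # matches pi up to Monte Carlo error
print(np.round(pi, 3))
```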
Week 7 19/11 | Marta Catalano | A Wasserstein Index of Dependence for Bayesian Nonparametric Modeling
Abstract: Optimal transport (OT) methods and Wasserstein distances are flourishing in many scientific fields as an effective means for comparing and connecting different random structures. In this talk we describe the first use of an OT distance between Lévy measures with infinite mass to solve a statistical problem. Complex phenomena often yield data from different but related sources, which are ideally suited to Bayesian modeling because of its inherent borrowing of information. In a nonparametric setting, this is regulated by the dependence between random measures: we derive a general Wasserstein index for a principled quantification of the dependence, gaining insight into the models' deep structure. It also allows for an informed prior elicitation and provides a fair ground for model comparison. Our analysis unravels many key properties of the OT distance between Lévy measures, whose interest goes beyond Bayesian statistics, extending to the theory of partial differential equations and of Lévy processes.
Week 8 26/11 | Edward Ionides | Bagging and Blocking: Inference via Particle Filters for Interacting Dynamic Systems
Abstract: Infectious disease transmission is a nonlinear partially observed stochastic dynamic system of topical interest. For low-dimensional systems, models can be fitted to time series data using Monte Carlo particle filter methods. As dimension increases, for example when analyzing epidemics among multiple spatially coupled populations, basic particle filter methods rapidly degenerate. A collection of independent Monte Carlo calculations can be combined to give a global filtering solution with favorable theoretical scaling properties. The independent Monte Carlo calculations are called bootstrap replicates, and their aggregation is called a bagged filter. Bagged filtering is effective at likelihood evaluation for a model of measles transmission within and between cities. A blocked particle filter also works well at this task. Bagged and blocked particle filters can both be coerced into carrying out likelihood maximization by iterative application to an extension of the model that has stochastically perturbed parameters. Numerical results are obtained using the R package spatPomp.
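The bagged and blocked filters build on the basic bootstrap particle filter; a minimal version is sketched below for an invented linear-Gaussian state space model (chosen because the exact likelihood is available via the Kalman filter if one wants to check the estimate).

```python
import numpy as np

rng = np.random.default_rng(5)
a, q, r, T, N = 0.9, 1.0, 1.0, 100, 1000

# simulate data from x_t = a x_{t-1} + N(0, q),  y_t = x_t + N(0, r)
x, ys = 0.0, []
for _ in range(T):
    x = a * x + rng.normal(0, np.sqrt(q))
    ys.append(x + rng.normal(0, np.sqrt(r)))

# bootstrap particle filter
particles = rng.normal(0, 1, N)
loglik = 0.0
for y in ys:
    particles = a * particles + rng.normal(0, np.sqrt(q), N)  # propagate
    logw = -0.5 * (y - particles)**2 / r - 0.5 * np.log(2 * np.pi * r)
    m = logw.max()
    w = np.exp(logw - m)
    loglik += m + np.log(w.mean())   # log of the unbiased per-step likelihood estimate
    particles = rng.choice(particles, N, p=w / w.sum())       # multinomial resample

print(f"particle filter log-likelihood estimate: {loglik:.1f}")
```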
Week 9 03/12 | | Accurate and Efficient Numerical Methods for Molecular Dynamics and Data Science Using Adaptive Thermostats
Abstract: I will discuss the design of state-of-the-art numerical methods for sampling probability measures in high dimension where the underlying model is only approximately identified with a gradient system. I will present extended stochastic dynamical methods, known as adaptive thermostats, that automatically correct thermodynamic averages using a negative feedback loop, and which have applications to molecular dynamics and to Bayesian sampling techniques arising in emerging machine learning applications. I will also discuss the characteristics of different algorithms, including the convergence of averages and the accuracy of numerical discretizations.
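As a rough illustration of the feedback idea (not the speaker's schemes), the sketch below applies a naive Euler discretisation of a Nosé-Hoover-type adaptive thermostat in one dimension: the friction variable xi adjusts until the average kinetic energy matches its target, and the position then samples exp(-U).

```python
import numpy as np

rng = np.random.default_rng(6)

def grad_U(th):                  # U(th) = th^2 / 2, so the target is N(0, 1)
    return th

th, p, xi = 0.0, 0.0, 1.0        # position, momentum, thermostat variable
A, h, n = 1.0, 0.01, 100000
samples = np.empty(n)
for i in range(n):
    p += -grad_U(th) * h - xi * p * h + np.sqrt(2 * A * h) * rng.normal()
    th += p * h
    xi += (p * p - 1.0) * h      # negative feedback: drive E[p^2] towards 1
    samples[i] = th

print(f"mean {samples.mean():.2f}, var {samples.var():.2f} (target: 0, 1)")
```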
Week 10 10/12 | Lionel Riou-Durand | Metropolis Adjusted Underdamped Langevin Trajectories (joint work with Jure Vogrinc, University of Warwick)
Abstract: Sampling approximations for high dimensional statistical models often rely on so-called gradient-based MCMC algorithms. It is now well established that these samplers scale better with the dimension than other state-of-the-art MCMC samplers, but are also more sensitive to tuning [5]. Among these, Hamiltonian Monte Carlo is a widely used sampling method shown to achieve gold-standard scaling with respect to the dimension [1]. However it is also known that its efficiency is quite sensitive to the choice of integration time, see e.g. [4], [2]. This problem is related to periodicity in the autocorrelations induced by the deterministic trajectories of Hamiltonian dynamics. To tackle this issue, we develop a robust alternative to HMC built upon underdamped Langevin dynamics (namely Metropolis Adjusted Underdamped Langevin Trajectories, or MAULT), inducing randomness in the trajectories through a continuous refreshment of the velocities. We study the optimal scaling problem for MAULT and recover the scaling of HMC proven in [1] without additional assumptions. Furthermore, we highlight the fact that autocorrelations for MAULT can be controlled by a uniform and monotone bound thanks to the randomness induced in the trajectories, so that the method is robust to tuning. Finally, we compare our approach to randomized HMC ([2], [3]) and establish quantitative contraction rates for the 2-Wasserstein distance that support the choice of underdamped Langevin dynamics.
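A sketch of the unadjusted dynamics underlying MAULT: an OBABO-style splitting of underdamped Langevin dynamics, in which an Ornstein-Uhlenbeck step continuously refreshes the velocity instead of HMC's full resampling. The Metropolis adjustment of the actual method is omitted, and the Gaussian target and tuning values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def grad_U(x):                   # target N(0, 1): U(x) = x^2 / 2
    return x

x, v = 0.0, 0.0
gamma, h, n = 1.0, 0.1, 100000
c = np.exp(-gamma * h / 2)       # partial velocity refreshment coefficient
out = np.empty(n)
for i in range(n):
    v = c * v + np.sqrt(1 - c * c) * rng.normal()  # O: Ornstein-Uhlenbeck kick
    v -= 0.5 * h * grad_U(x)                       # B: half gradient step
    x += h * v                                     # A: position update
    v -= 0.5 * h * grad_U(x)                       # B
    v = c * v + np.sqrt(1 - c * c) * rng.normal()  # O
    out[i] = x

print(f"var ~ {out.var():.2f} (target 1); MAULT would add an accept/reject step")
```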
2021/22 Term 2
The list of confirmed speakers.
Date | Speaker | Title
Week 1 14/01 | Ryan Martin | Data-driven Calibration of Generalized Posterior Distributions
Abstract: Bayesian inference based on a well-specified likelihood is (modulo regularity conditions) approximately calibrated in the sense that credible regions are approximate confidence regions. But well-specified likelihoods are the exception, not the norm, so Bayesian inference generally comes with no calibration guarantees. To overcome this, the statistician can consider a more flexible generalized posterior distribution and use data-driven methods to ensure that their corresponding generalized posterior inference is approximately calibrated. In this talk, I'll focus on cases where the generalized posterior involves a free learning rate parameter and present a bootstrap-based algorithm designed specifically to choose that learning rate such that the posterior inference is approximately calibrated. I'll also present an extension of this calibration strategy designed to deal directly with prediction under misspecification.
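A schematic of the bootstrap calibration idea on a toy Gaussian location model, where a power posterior with learning rate eta has credible intervals of half-width z * sigma / sqrt(eta * n) under a flat prior. The grid search and all settings are invented for illustration and are not the speaker's algorithm.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(1.0, 2.0, size=100)          # data; sigma = 2 treated as known
n, sigma, alpha, B = len(x), 2.0, 0.95, 500
z = stats.norm.ppf(0.5 + alpha / 2)

def coverage(eta):
    """Bootstrap estimate of credible-interval coverage at learning rate eta."""
    theta_hat, hits = x.mean(), 0
    for _ in range(B):
        xb = rng.choice(x, n)               # bootstrap resample
        half = z * sigma / np.sqrt(eta * n) # power-posterior interval half-width
        hits += abs(xb.mean() - theta_hat) <= half
    return hits / B

for eta in [0.25, 0.5, 1.0, 2.0]:
    print(f"eta = {eta:4}: estimated coverage {coverage(eta):.2f}")
# pick the eta whose estimated coverage is closest to the nominal 0.95
```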
Week 2 21/01 | No seminar |
Week 3 28/01 | Filippo Pagani | Numerical Zig-Zag and Perturbation Bounds on Numerical Error
Abstract: This talk is about NuZZ (Numerical Zig-Zag), and some soon-to-be-arXived improvements on the initial version. Piecewise Deterministic Markov Processes (PDMPs) are a new kind of stochastic process that can be used at the heart of MCMC algorithms to explore the state space. PDMPs are irreversible processes (roughly, they tend to go further than diffusions in the same time interval, have lower asymptotic variance, etc.) whose properties allow exact subsampling of the data (important for large datasets). The Zig-Zag sampler is a promising new PDMP-based MCMC algorithm that combines these two properties to achieve interesting results. However, the Zig-Zag dynamics are difficult to simulate, as exact simulation requires certain CDFs to be inverted, or (ideally tight) bounds on the gradient of the target distribution. The Numerical Zig-Zag inverts those CDFs numerically, which makes the Zig-Zag applicable to a vast class of models, at the cost of losing exactness. The talk will introduce the Zig-Zag sampler and NuZZ, skim through some numerical results, and concentrate slightly more on the new results on perturbation theory, where we bound the discrepancy between ergodic averages computed from exact and from approximate samples in terms of the numerical error tolerances.
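A one-dimensional illustration of the numerical-inversion idea: Zig-Zag event times solve Lambda(tau) = E with E ~ Exp(1), where Lambda is the integrated switching rate, and below this equation is solved with quadrature plus root-finding rather than analytically. The Gaussian target and tolerances are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

rng = np.random.default_rng(9)

def grad_U(x):                      # target N(0, 1): U(x) = x^2 / 2
    return x

def rate(s, x, v):                  # Zig-Zag switching rate along the trajectory
    return max(0.0, v * grad_U(x + v * s))

x, v = 0.0, 1.0
events = []
for _ in range(2000):
    E = rng.exponential()
    Lam = lambda s: quad(rate, 0, s, args=(x, v))[0] - E
    hi = 1.0
    while Lam(hi) < 0:              # bracket the event time, then root-find
        hi *= 2
    tau = brentq(Lam, 0.0, hi)
    x += v * tau                    # move to the switching point
    v = -v                          # flip the velocity
    events.append(x)

ev = np.array(events)
print(f"skeleton mean {ev.mean():.2f}")  # true trajectory averages need segment integrals
```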
Week 4 04/02 | Emilia Pompe | Robust Inference using Posterior Bootstrap
Abstract: Bayesian inference is known to provide misleading uncertainty estimation when the considered model is misspecified. This talk will explore various alternatives to standard Bayesian inference under model misspecification, based on extensions of the Weighted Likelihood Bootstrap (Newton & Raftery, 1994).
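The Weighted Likelihood Bootstrap that the talk extends, in its simplest form: each approximate posterior draw maximises a randomly reweighted log-likelihood. For the toy Gaussian location model below, the weighted MLE is just the weighted mean.

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(1.0, 1.0, size=200)          # observed data

draws = []
for _ in range(2000):
    w = rng.dirichlet(np.ones(len(x)))      # random weights summing to 1
    draws.append(np.sum(w * x))             # weighted MLE of the mean
draws = np.array(draws)

print(f"WLB posterior: mean {draws.mean():.3f}, sd {draws.std():.3f}")
print(f"frequentist benchmark: sd of xbar = {x.std(ddof=1) / np.sqrt(len(x)):.3f}")
```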
Week 5 11/02 | Benedict Leimkuhler | Partitioned Integrators and Multirate Training of Deep Neural Networks
Abstract: We have been exploring the design of improved training schemes for neural networks based on parameter partitioning and the use of hybrid algorithms. We previously found that we can dramatically accelerate training in classification tasks using such partitionings, in combination with additive noise and adaptive stochastic algorithms. More recently we have been exploring the use of "multirate" schemes for transfer learning tasks. We have found that for applications in image analysis and NLP we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting models. I will state a convergence result for our multirate scheme and also discuss splitting choices for the parameters which enhance generalization performance when neural networks are trained from scratch. A multirate approach can be used to learn different features present in the data and as a form of regularization. Our work unlocks the potential of using multirate methods for neural network training and provides starting points for future work in this area.
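A bare-bones illustration of the multirate idea (not the authors' scheme): parameters are split into two blocks, and one block is updated every step while the other is updated only every k-th step, here on an invented quadratic loss.

```python
# toy loss: L(a, b) = a^2 + 100 b^2 (two blocks with very different curvature)
a, b = 5.0, 5.0
lr_a, lr_b, k = 0.1, 0.004, 5
for step in range(500):
    a -= lr_a * 2 * a                 # "fast" block: updated every step
    if step % k == 0:
        b -= lr_b * 2 * 100 * b       # "slow" block: updated every k-th step

print(f"a = {a:.2e}, b = {b:.2e}")    # both blocks converge to 0
```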
Week 6 18/02 | Laura Guzman Rincon | Bayesian estimation of the instant growth rate of SARS-CoV-2 positive cases in England, using Gaussian processes
Abstract: The estimation of the growth rate of SARS-CoV-2 positive cases is crucial for understanding the evolution of the pandemic. We propose a method for estimating the growth rate of the proportion of positive cases in England and its local authorities. The proposed Bayesian model incorporates a Gaussian process as a latent effect, employed to compute the growth rate and higher derivatives. This method does not make assumptions about generation times and can be adapted to different spatial geographies and population subgroups. (Preprint: https://www.medrxiv.org/content/10.1101/2022.01.01.21268131v1.full)
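A schematic of the derivative-of-a-GP idea on synthetic data: fit a GP to noisy log-proportions and read the growth rate off the derivative of the posterior mean, which is available in closed form for a squared-exponential kernel. Data and hyperparameters are invented; the paper's model is richer.

```python
import numpy as np

rng = np.random.default_rng(12)
t = np.arange(60.0)                               # days
f_true = 0.05 * t - 0.002 * t**2                  # latent log-proportion
y = f_true + rng.normal(0, 0.05, t.size)          # noisy observations

ell, sf, sn = 7.0, 1.0, 0.05                      # kernel hyperparameters (fixed)
def k(a, b):
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

K = k(t, t) + sn**2 * np.eye(t.size)
alpha = np.linalg.solve(K, y)

t_star = np.array([30.0])
k_star = k(t_star, t)                             # 1 x n cross-covariance
dk = -(t_star[:, None] - t[None, :]) / ell**2 * k_star
growth = dk @ alpha                               # derivative of the GP mean

print(f"estimated growth rate at day 30: {growth[0]:+.4f}")
print(f"true growth rate at day 30:      {0.05 - 0.004 * 30:+.4f}")
```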
Week 7 25/02 | Martyn Plummer | Developing JAGS
Abstract: It has been nearly 5 years since the last release of JAGS (Just Another Gibbs Sampler), a program for Bayesian inference using MCMC. It is still widely used, and development has continued during this period, although at a somewhat reduced pace. I will talk about some of the issues that arise in maintaining widely used software with a long lifespan, including: why writing software is like gardening, why error messages are so hard to get right, and why you should pay attention to coding style guides. I will preview some of the features in the development version (the future JAGS 5.0.0), including my attempts to add differentiation, which should allow JAGS to implement more modern sampling methods.
Week 8 04/03 | Kamran Pentland | GParareal: A time-parallel ODE solver using Gaussian process emulation
Abstract: Sequential numerical methods for integrating initial value problems (IVPs) can be prohibitively expensive when high numerical accuracy is required over the entire interval of integration. One remedy is to integrate in a parallel fashion, "predicting" the solution serially using a cheap (coarse) solver and "correcting" these values using an expensive (fine) solver that runs in parallel on a number of temporal subintervals. In this work, we propose a time-parallel algorithm (GParareal) that solves IVPs by modelling the correction term, i.e. the difference between fine and coarse solutions, using a Gaussian process emulator. This approach compares favourably with the classic parareal algorithm and we demonstrate, on a number of IVPs, that GParareal can converge in fewer iterations than parareal, leading to an increase in parallel speed-up. GParareal also manages to locate solutions to certain IVPs where parareal fails, and has the additional advantage of being able to use archives of legacy solutions, e.g. solutions from prior runs of the IVP for different initial conditions, to further accelerate convergence of the method - something that existing time-parallel methods do not do. This is joint work with M. Tamborrino, T. J. Sullivan, J. Buchanan, and L. C. Appel.
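For orientation, here is the classic parareal iteration that GParareal modifies (GParareal replaces the F - G correction below with a Gaussian process emulator). The scalar ODE and Euler solvers are illustrative.

```python
import numpy as np

def euler(f, u0, t0, t1, steps):
    u, h = u0, (t1 - t0) / steps
    for i in range(steps):
        u = u + h * f(t0 + i * h, u)
    return u

f = lambda t, u: -u                       # du/dt = -u, exact solution exp(-t)
N, T, u0 = 10, 2.0, 1.0
ts = np.linspace(0.0, T, N + 1)
G = lambda u, n: euler(f, u, ts[n], ts[n + 1], 1)     # cheap coarse solver
F = lambda u, n: euler(f, u, ts[n], ts[n + 1], 100)   # expensive fine solver

U = np.empty(N + 1)
U[0] = u0
for n in range(N):                        # initial serial coarse sweep
    U[n + 1] = G(U[n], n)
for k in range(5):                        # parareal iterations
    Fk = [F(U[n], n) for n in range(N)]   # fine runs: parallel in practice
    Gk = [G(U[n], n) for n in range(N)]
    for n in range(N):                    # serial corrected coarse sweep
        U[n + 1] = G(U[n], n) + Fk[n] - Gk[n]

print(f"parareal u(T) = {U[-1]:.6f}, exact = {np.exp(-T):.6f}")
```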
Week 9 11/03 | No seminar |
Week 10 18/03 | Ioannis Kosmidis |
2021/22 Term 3
The list of confirmed speakers.
Date | Speaker | Title
Week 1 29/04 | No seminar |
Week 2 06/05 | Sinho Chewi | Improved dimension dependence for MALA and lower bounds for sampling
Abstract: The optimal scaling literature predicts that the mixing time of the Metropolis-adjusted Langevin Algorithm (MALA) scales as d^{1/3}, where d is the dimension. However, the scaling limit requires stringent assumptions and is asymptotic in nature. In this work, we improve the state-of-the-art non-asymptotic mixing time bound for MALA on the class of log-smooth and strongly log-concave distributions from O(d) to O(d^{1/2}), under the additional assumption of a warm start; moreover, our bound is sharp. Our proof introduces a new technique based on a projection characterization of the Metropolis adjustment which reduces the study of MALA to the discretization analysis of the Langevin SDE. Afterwards, I will briefly discuss the elusive problem of obtaining lower bounds for sampling, including recent work which establishes the first tight complexity bound for sampling in one dimension. This is joint work with Kwangjun Ahn, Xiang Cheng, Patrik Gerber, Thibaut Le Gouic, Chen Lu, and Philippe Rigollet.
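For reference, a textbook MALA implementation on a toy Gaussian target; the step size is illustrative and untuned.

```python
import numpy as np

rng = np.random.default_rng(13)

def logpi(x):                    # standard Gaussian target
    return -0.5 * np.dot(x, x)

def grad(x):
    return -x

d, h, n = 10, 0.4, 20000
x, acc = np.zeros(d), 0
for _ in range(n):
    mx = x + 0.5 * h**2 * grad(x)
    y = mx + h * rng.normal(size=d)              # Langevin proposal
    my = y + 0.5 * h**2 * grad(y)
    # log q(x|y) - log q(y|x): the Metropolis adjustment of the discretisation
    logq_ratio = (-np.sum((x - my)**2) + np.sum((y - mx)**2)) / (2 * h**2)
    if np.log(rng.uniform()) < logpi(y) - logpi(x) + logq_ratio:
        x, acc = y, acc + 1

print(f"acceptance rate: {acc / n:.2f}")
```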
Week 3 13/05 | Simon Spencer | Accelerating adaptation in MCMC algorithms
Abstract: The Metropolis-Hastings random walk algorithm remains popular with practitioners due to the wide variety of situations in which it can be successfully applied and the extreme ease with which it can be implemented. Adaptive versions of the algorithm use information from the early iterations of the Markov chain to improve the efficiency of the proposal. In this talk I will describe how to reduce the number of iterations needed to adapt the proposal to the target, which is particularly important when the likelihood is time-consuming to evaluate. First, the accelerated shaping algorithm is a generalisation of both the adaptive proposal and adaptive Metropolis algorithms. It is designed to remove misleading information from the estimate of the covariance matrix of the target. Second, the accelerated scaling algorithm rapidly changes the scale of the proposal to achieve a target acceptance rate. Finally, I will show how the same ideas can be applied to efficiently update parameter vectors with Dirichlet priors.
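The classical adaptive Metropolis recipe that the talk sets out to accelerate: the proposal covariance is a running estimate of the chain's sample covariance, scaled by 2.38^2/d. The toy target and adaptation schedule below are illustrative, with none of the talk's acceleration.

```python
import numpy as np

rng = np.random.default_rng(14)
d, n = 5, 50000
Sigma = np.diag(np.arange(1.0, d + 1))           # anisotropic Gaussian target
Sigma_inv = np.linalg.inv(Sigma)
logpi = lambda x: -0.5 * x @ Sigma_inv @ x

x = np.zeros(d)
chain = np.empty((n, d))
C = np.eye(d)                                    # initial proposal covariance
for i in range(n):
    L = np.linalg.cholesky((2.38**2 / d) * C + 1e-8 * np.eye(d))
    y = x + L @ rng.normal(size=d)
    if np.log(rng.uniform()) < logpi(y) - logpi(x):
        x = y
    chain[i] = x
    if i >= 100 and i % 50 == 0:                 # periodically re-estimate C
        C = np.cov(chain[:i].T)

print("estimated target variances:", np.round(chain[n // 2:].var(axis=0), 1))
```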
Week 4 20/05 | Sida Chen | Bayesian spline-based hidden Markov models with applications to activity acceleration data and sleep analysis
Abstract: B-spline-based hidden Markov models use B-splines to specify the emission distributions and offer a more flexible modelling approach to data than conventional parametric HMMs. We introduce a Bayesian framework for inference in these models where the number of states may be unknown along with other model parameters. A trans-dimensional Markov chain inference algorithm is proposed to identify a parsimonious knot configuration of the B-splines, while model selection regarding the number of states can be performed based on the marginal likelihood within a parallel sampling framework. Using an extensive simulation study, we establish the superiority of our proposed methodology in comparison to alternative approaches. We will also present an extension of the basic model: a novel hierarchical conditional HMM to analyse human accelerometer activity data for circadian and sleep modelling.
Week 5 27/05 | Lucia Paci | Model-based clustering for categorical data via Hamming distance
Abstract: In this work, a model-based approach for clustering categorical data with no natural ordering is introduced. The proposed method exploits the Hamming distance to define a family of probability mass functions to model categorical data. The elements of this family are considered as kernels of a finite mixture model with an unknown number of components. Fully Bayesian inference is provided using a sampling strategy based on a trans-dimensional blocked Gibbs sampler, easing computation compared with the customary reversible-jump algorithm. Model performance is assessed via a simulation study, showing improvements in clustering recovery over existing approaches. Finally, our method is illustrated with applications to reference datasets. Joint work with Raffaele Argiento and Edoardo Filippi-Mazzola.
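One simple form such a Hamming-distance kernel can take, written here under the illustrative assumptions of p attributes with m categories each and a single precision parameter lambda (the paper's family may differ in detail): p(y | c, lambda) is proportional to exp(-lambda * d_H(y, c)), and the normalising constant factorises over attributes.

```python
import numpy as np

def hamming_kernel_pmf(y, c, lam, m):
    """p(y | centre c, precision lam) for p attributes with m categories each.

    Per attribute the normaliser is 1 + (m - 1) * exp(-lam), so the joint
    normalising constant factorises over the p independent attributes.
    """
    y, c = np.asarray(y), np.asarray(c)
    d = np.sum(y != c)                    # Hamming distance to the centre
    p = y.size
    return np.exp(-lam * d) / (1 + (m - 1) * np.exp(-lam)) ** p

print(hamming_kernel_pmf([0, 1, 2], [0, 1, 0], lam=1.5, m=3))
```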
Week 6 03/06 | BANK HOLIDAY |
Week 7 10/06 | Gareth Roberts | Biased Draws and Corrections
Abstract: Draws for major sporting events are often televised and carried out in a sequential fashion to maximise excitement and to increase anticipation for the sporting event itself. In this regard, organisations such as FIFA and UEFA in football have been highly successful. However, these draw procedures are also often subject to constraints which make the problem of simulating a fair draw (i.e. uniform over all feasible draws which satisfy the constraints) difficult to achieve using a sequential procedure. For example, the recent FIFA World Cup draw imposed geographical constraints to avoid countries from the same continental confederation (apart from Europe) playing each other in the group stages. Recent draws by FIFA and UEFA have all (to a greater or lesser extent) been biased. From a computational statistics perspective, they have been based on approximate SMC procedures. This talk will investigate these biases and suggest practical solutions which respect the desire to unveil such a draw in a sequential fashion. The main focus will be on the FIFA 2022 World Cup draw. This is joint work with Jeffrey Rosenthal.
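A toy version of the fairness benchmark: rejection sampling is exactly uniform over the feasible draws, which is the standard against which a sequential procedure's bias can be measured. The four-team instance below is invented and far simpler than the FIFA setup (and, unlike a televised draw, it is not sequential).

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(15)
teams = ["A1", "A2", "B", "C"]            # A1 and A2 share a confederation

def feasible(groups):                     # constraint: A1 and A2 kept apart
    return all(not ("A1" in g and "A2" in g) for g in groups)

counts = Counter()
for _ in range(30000):
    perm = list(rng.permutation(teams))
    groups = [sorted(perm[:2]), sorted(perm[2:])]
    if feasible(groups):                  # rejection step: discard infeasible draws
        counts[tuple(sorted(map(tuple, groups)))] += 1

total = sum(counts.values())
for outcome, c in sorted(counts.items()):
    print(outcome, round(c / total, 3))   # uniform over the feasible draws
```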
Week 8 17/06 | Badr-Eddine Chérief-Abdellatif | Robust Estimation via Maximum Mean Discrepancy
Abstract: In this talk, we will study the properties of a minimum distance estimator based on the Maximum Mean Discrepancy (MMD). We will show that this estimator is universal in the i.i.d. setting: even in case of misspecification, it converges to the best approximation of the data generation process in the model, without any assumption on this model. We will also show that these results remain valid when the data are not independent, but rather satisfy a weak-dependence condition. This condition is based on a new dependence coefficient, which is itself defined using the MMD. We will argue with examples that this new notion of dependence is in fact quite general.
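A small illustration of minimum-MMD estimation: the unbiased MMD^2 estimate between data and model samples, with a Gaussian kernel, is minimised over a parameter grid for a toy Gaussian location model; the injected outliers hint at the robustness discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(16)
x = rng.normal(1.0, 1.0, 300)
x[:30] = 15.0                                # 10% gross outliers

def k(a, b, bw=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / bw**2)

def mmd2(x, y):
    """Unbiased estimate of squared MMD between samples x and y."""
    Kxx, Kyy, Kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

z = rng.normal(0.0, 1.0, 300)                # common noise for all candidates
thetas = np.linspace(-1.0, 3.0, 81)
crit = [mmd2(x, theta + z) for theta in thetas]
best = thetas[int(np.argmin(crit))]
print(f"minimum-MMD estimate: {best:.2f}; contaminated sample mean: {x.mean():.2f}")
```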
Week 9 24/06 | Mauro Camara-Escudero | Approximate Manifold Sampling
Abstract: Sampling from a probability density constrained to a manifold is of importance in numerous applications arising in statistical physics, statistics or machine learning. Sampling from such constrained densities, in particular using an MCMC approach, poses significant challenges, and it is only recently that correct solutions have been proposed. The resulting algorithms can however be computationally expensive. We propose a relaxation of the problem where the support constraint is replaced with that of sampling from a small neighbourhood of the manifold. We develop a family of bespoke and efficient algorithms adapted to this problem and demonstrate empirically their computational superiority, which comes at the expense of a modest bias. This is joint work with Christophe Andrieu and Mark Beaumont.
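A minimal version of the relaxation idea (not the authors' algorithms): instead of sampling exactly on the unit circle, run random walk Metropolis on a density that penalises squared distance to the manifold, so samples concentrate in a neighbourhood of width roughly epsilon.

```python
import numpy as np

rng = np.random.default_rng(17)
eps = 0.05                                   # width of the manifold neighbourhood

def logpi(x):
    """Base density N(0, I) restricted softly to the unit circle ||x|| = 1."""
    r = np.linalg.norm(x)
    return -0.5 * x @ x - 0.5 * (r - 1.0)**2 / eps**2

x = np.array([1.0, 0.0])
samples = np.empty((50000, 2))
acc = 0
for i in range(len(samples)):
    y = x + 0.1 * rng.normal(size=2)
    if np.log(rng.uniform()) < logpi(y) - logpi(x):
        x, acc = y, acc + 1
    samples[i] = x

r = np.linalg.norm(samples, axis=1)
print(f"acceptance {acc / len(samples):.2f}; radius mean {r.mean():.3f}, "
      f"sd {r.std():.3f}")                   # concentrated near the circle
```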
Week 10 01/07 | End of year BBQ |