Programme
Wednesday 10th April 2024

| Start | End | Event |
| --- | --- | --- |
| 08:50 | | Warwick Coach Group: convene at the University of Warwick Coach Park (***not the department***) as directed here |
| 09:00 | 12:00 | Warwick Coach Group: journey from the University of Warwick Coach Park to Gregynog Hall - a *sharp* departure at 09:00 |
| 12:30 | 13:30 | Lunch (Dining Room) |
| 13:30 | 13:45 | Room check-in available (after lunch, from the lobby area between the Music Room and the Dining Room) |

Session 1 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 13:50 | 14:00 | Welcome |
| 14:00 | 15:00 | Invited talk: Jon Warren |
| 15:00 | 16:00 | Discussion: Software development: when, why, how. Moderator: Heather Turner |
| 16:00 | 16:30 | Afternoon Tea (Blayney Room) |

Session 2 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 16:30 | 16:45 | Laura Guzman Rincon |
| 16:45 | 17:00 | Hannah Bensoussane |
| 17:00 | 17:15 | Gracie Li |
| 17:15 | 17:30 | Raiha Browning |
| 17:30 | 18:00 | Teresa Brunsdon |
| 18:00 | | Pre-dinner bar open (The Davies Room (Baldwin)) |
| 18:15 | 18:45 | Tony Lawrance |
| 19:00 | 20:00 | Dinner (Dining Room) |
Thursday 11th April 2024

| Start | End | Event |
| --- | --- | --- |
| 08:00 | 09:00 | Breakfast (Dining Room) |

Session 3 (Music Room) - Invited talks

| Start | End | Event |
| --- | --- | --- |
| 09:30 | 10:00 | Richard Everitt |
| 10:00 | 10:30 | Andrew Duncan (hybrid*) - Teams meeting: link |
| 10:30 | 11:00 | Virginia Aglietti (hybrid*) - Teams meeting: link |
| 11:00 | 11:30 | Morning Coffee (Blayney Room) |

Session 4 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 11:30 | 12:50 | Discussion: Careers in industry vs academia (hybrid*). Panelists: Andrew Duncan (Imperial), Virginia Aglietti (DeepMind), Jeremias Knoblauch (UCL). Moderator: Richard Everitt. Teams meeting: link |
| 13:00 | 14:00 | Lunch (Dining Room) |
| 14:00 | 16:00 | Free Time |
| 16:00 | 16:30 | Afternoon Tea (Blayney Room) |

Session 5 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 16:30 | 18:00 | Discussion: The role of generative AI in education and research. Panelists: Martyn Plummer, David Stern (IDEMS), Andrew Dimarogonas (huxli.ai). Moderator: Jane Hutton. Teams meeting: link |
| 18:00 | | Pre-dinner bar open (The Davies Room (Baldwin)) |
| 19:00 | 20:00 | Dinner (Dining Room) |
Friday 12th April 2024

| Start | End | Event |
| --- | --- | --- |
| 08:00 | 09:00 | Breakfast (Dining Room) |
| 08:00 | 09:00 | Check out (return keys to reception before 09:00 and move luggage to the luggage store) |

Session 6 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 09:00 | 10:00 | Invited talk: Jeremias Knoblauch |
| 10:15 | 10:30 | Mengchu Li |
| 10:30 | 10:45 | Gengyu Xue |
| 10:45 | 11:00 | Conor Hughes |
| 11:00 | 11:30 | Morning Coffee (Blayney Room) |
| 11:30 | 11:40 | Group Photo |

Session 7 (Music Room)

| Start | End | Event |
| --- | --- | --- |
| 11:45 | 12:00 | Jia Le Tan |
| 12:00 | 12:15 | Alicia Gill |
| 12:15 | 12:30 | David Huk |
| 13:00 | 14:00 | Lunch (Dining Room) |
| 14:00 | 14:15 | Convene outside the gift shop at 14:05 for a *sharp* departure at 14:15 |
| 14:15 | 17:00 | Warwick Coach Group: journey from Gregynog Hall to the University of Warwick Coach Park |

* For the hybrid sessions: join the Gregynog Teams Channel here.
Talk Abstracts
Virginia Aglietti - Causal Bayesian Optimization
In this talk, I will present recent advancements in integrating causality within Bayesian Optimization (BO). I will introduce Causal Bayesian Optimization (CBO), then discuss extensions that enable optimal action selection in causal systems. These extensions address scenarios involving constraints and settings where contextual interventions need to be explored.
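For orientation, the causal Bayesian optimisation problem can be written, loosely, as choosing both an intervention set and intervention values to optimise the expected outcome under the corresponding do-intervention. The notation below is a generic sketch rather than the exact formulation from the talk.

```latex
% Sketch of the causal Bayesian optimisation objective (generic notation):
% choose an intervention set X_s and intervention values x_s to minimise the
% expected outcome Y under the hard intervention do(X_s = x_s).
\[
  (X_s^\star, x_s^\star)
  \;=\;
  \operatorname*{arg\,min}_{X_s \in \mathcal{P}(\mathbf{X}),\; x_s \in D(X_s)}
  \; \mathbb{E}\!\left[\, Y \mid \operatorname{do}(X_s = x_s) \,\right],
\]
% where \mathcal{P}(\mathbf{X}) denotes the admissible intervention sets and
% D(X_s) the interventional domain of the chosen variables.
```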
Hannah Bensoussane - Bayesian individual-level infectious disease modelling: accounting for missing data while minimising computational burden
Fitting mathematical models to epidemic data is challenging because the transmission process is largely unobserved. In addition, likelihood evaluation is often costly, and approaches to estimating missing data that require repeated likelihood evaluation become computationally infeasible. We consider a set-up for an individual-level model where, to obtain a tractable likelihood, the observed data are augmented with the missing infection/infectious times and sample-based inference (adaptive MCMC) is performed to learn about the model parameters. We introduce an algorithm that proposes new infection/infectious times in groups and automatically tunes the group size to achieve the user's desired acceptance rate. The grouped nature of the updates allows us to explore novel proposal mechanisms, and for each mechanism considered we find the desired acceptance rate that results in the highest mean square jumping distance (MSJD) per second. Results reveal that significant computational burden can be avoided by updating infection/infectious times in groups. Our Adaptive Dirichlet-Multinomial (ADM) proposal mechanism (an independence sampler informed by the history of the infection/infectious time chain) is particularly successful.
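As a toy illustration of the grouped-update and adaptive group-size mechanism, here is a minimal sketch against a stand-in Gaussian log-density; the real algorithm targets the individual-level epidemic likelihood and uses the proposal mechanisms described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the epidemic likelihood: a simple Gaussian log-density over the
# latent "infection times" x.  The real algorithm evaluates the individual-level
# epidemic likelihood given the augmented infection/infectious times.
def log_target(x):
    return -0.5 * np.sum(x ** 2)

n = 200                       # number of latent infection times
x = rng.normal(size=n)        # current augmented data
group_size = 5                # how many times are updated per proposal
target_rate = 0.23            # user-specified desired acceptance rate
accepted = 0

for it in range(1, 5001):
    idx = rng.choice(n, size=group_size, replace=False)   # group to update
    prop = x.copy()
    prop[idx] += rng.normal(scale=0.5, size=group_size)   # grouped proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
        accepted += 1
    # Adapt the group size towards the desired acceptance rate: accepting too
    # often suggests the moves are too timid, so update more times at once.
    if it % 100 == 0:
        rate = accepted / it
        group_size = min(n, group_size + 1) if rate > target_rate else max(1, group_size - 1)

print("final group size:", group_size, "acceptance rate:", accepted / it)
```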
Raiha Browning - A spatiotemporal statistical model for the risk of political violence and protests
A key indicator of disorder in society is the occurrence of conflict events, such as protests, riots, and battles between organised armed groups. An understanding of the real-time risk of conflict events around the world, and of their longer-term trends, is crucial for several parties, including governments, journalists and researchers. This is especially important during times of instability and unrest. The Armed Conflict Location and Event Data (ACLED) project provides a comprehensive database containing details of political violence and protests worldwide. Currently, the risk model used globally by the ACLED project relies on simple averages and is aggregated to the national level. In this work we propose a spatio-temporal statistical model to monitor the risk of conflict events at a fine spatial scale. By collaborating with relevant actors in the humanitarian, social and political sectors, there is the potential for our model to be adopted by these actors to better understand the risk of conflict events at a subnational level and over time. A recent report by the United Nations Office for the Coordination of Humanitarian Affairs has emphasised the need to focus on understanding the risk of conflict events, ahead of other activities such as forecasting and event classification. To advance in this direction we use Hawkes processes, self-exciting stochastic processes used to describe phenomena whereby past events increase the probability of future events. In particular, we apply a spatio-temporal variant of the Hawkes process to the data gathered by the ACLED project for countries in South Asia to obtain sub-national estimates of risk over time. We also investigate how certain factors, such as the type of violence, affect the risk of conflict. Furthermore, through a Bayesian approach we obtain estimates of the uncertainty around these risk estimates. The results from this analysis would be a useful resource for key actors in the humanitarian and political sciences sectors.
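For reference, a generic spatio-temporal Hawkes conditional intensity takes the form below; this is a standard textbook formulation, and the exact specification used in this work may differ.

```latex
% Generic spatio-temporal Hawkes conditional intensity: a background rate
% plus self-exciting contributions from all past events (t_i, s_i).
\[
  \lambda(t, s \mid \mathcal{H}_t)
  \;=\;
  \mu(s) \;+\; \sum_{i:\, t_i < t} g(t - t_i)\, h(s - s_i),
\]
% where \mu is the background rate, g and h are temporal and spatial
% triggering kernels, and \mathcal{H}_t is the history of events before time t.
```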
Teresa Brunsdon - Making Design of Experiments Interesting (or: Why should students have all the fun!)
I have been delivering the module on Designed Experiments at Warwick for the first time, and one colleague commented that he hoped I could make it interesting. I don't know if I achieved that, but I will present some of the ideas I have used to make the module real and relevant, particularly the use of some "Statapults": catapults with a difference, designed for teaching experiments!
In order to fully appreciate the relevance of this, we will have a go at running an experiment as a group. This will hopefully be a good bonding exercise, but also an educational one that helps us think about both our teaching and our outreach to promote statistics. We may even have some fun!
Andrew Duncan - Statistical Divergences for Functional Data
Kernel-based discrepancies have found considerable success in constructing statistical tests which are now widely used in statistical machine learning. Examples include the Kernel Stein Discrepancy, which enables goodness-of-fit tests of data samples against an (unnormalized) probability density based on Stein's method. The effectiveness of the associated tests crucially depends on the dimension of the data.
I will present some recent results on the behaviour of such tests in high dimensions, exploring properties of the statistical divergence under different scalings of data dimension and data size. Building on this, I will discuss how such discrepancies can be extended to probability distributions on infinite-dimensional spaces, with applications to goodness-of-fit testing for measures on function spaces and relevance to various problems in uncertainty quantification (UQ).
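For context, one standard construction of the Kernel Stein Discrepancy (following Chwialkowski et al., 2016, and Liu et al., 2016) combines the score function of the unnormalised density p with a base kernel k; the notation below is generic.

```latex
% Kernel Stein discrepancy between a sample distribution q and a target p,
% built from the score s_p(x) = \nabla_x \log p(x) and a base kernel k.
\[
  \operatorname{KSD}^2(q \,\|\, p) \;=\; \mathbb{E}_{x, x' \sim q}\!\left[ k_p(x, x') \right],
\]
\[
  k_p(x, x') \;=\; s_p(x)^{\top} s_p(x')\, k(x, x')
             \;+\; s_p(x)^{\top} \nabla_{x'} k(x, x')
             \;+\; s_p(x')^{\top} \nabla_{x} k(x, x')
             \;+\; \operatorname{tr}\!\left( \nabla_x \nabla_{x'} k(x, x') \right).
\]
```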
Richard Everitt - Ensemble Kalman inversion approximate Bayesian computation
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters when the data model is specified in the form of a simulator. Standard Monte Carlo methods cannot be implemented directly for inference in such a model, since the likelihood is not available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood, sometimes known as the ABC likelihood. The ABC likelihood is chosen such that an unbiased estimator of it is easy to construct from simulations from the data model, allowing the use of pseudo-marginal Monte Carlo algorithms for inference under the approximate model. The central challenge of ABC is then to trade off the bias introduced by approximating the model against the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data; thus, the most common approach to reducing variance is to perform inference conditional on summary statistics.
In this talk we introduce a new approach to estimating the ABC likelihood: using ensemble Kalman inversion (EnKI). Ensemble Kalman algorithms are Monte Carlo approximations of Bayesian inference for linear/Gaussian models. These methods are often applied outside the linear/Gaussian setting, being used, for example, as an alternative to particle filtering for inference in non-linear state space models. Loosely speaking, EnKI can be used as an alternative to an SMC sampler on a sequence of annealed likelihoods. We see that EnKI has some appealing properties when used to estimate the ABC likelihood. It circumvents the exponential scaling with dimension of standard ABC, and does not require the reparameterisation imposed by the rare-event SMC approach of Prangle et al. (2018). It is able to achieve this with no additional simulations from the data model; thus, it is likely to bring the most benefit in cases where this model is very expensive to simulate.
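As a reminder of the object being estimated, the ABC likelihood and its standard unbiased Monte Carlo estimator can be written as follows; the notation is generic, with the kernel K_ε and bandwidth ε being choices of the practitioner.

```latex
% ABC likelihood: the data model p(y | theta) smoothed by a kernel K_eps
% centred at the observed data y_obs, together with its unbiased estimator
% built from N simulations from the data model.
\[
  p_{\mathrm{ABC}}(y_{\mathrm{obs}} \mid \theta)
  \;=\; \int K_{\varepsilon}(y_{\mathrm{obs}}, y)\, p(y \mid \theta)\, \mathrm{d}y
  \;\approx\; \frac{1}{N} \sum_{i=1}^{N} K_{\varepsilon}(y_{\mathrm{obs}}, y_i),
  \qquad y_i \sim p(\cdot \mid \theta).
\]
```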
Alicia Gill - Bayesian inference of reproduction number from epidemic and genomic data
Typically, the reproduction number is inferred using only epidemic data, such as prevalence per day. However, prevalence data are often noisy and partially observed, and it can be difficult to identify whether you have observed many cases of a small epidemic or few cases of a large epidemic. Genomic data are therefore increasingly being used to understand infectious disease epidemiology, and inference methods incorporating both genomic and epidemiological information are an active area of research. We use Markov chain Monte Carlo methods to infer parameters of the epidemic using both a dated phylogeny and partial prevalence data, improving inference compared with using only one source of information. To do this, we have implemented a sequential Monte Carlo algorithm to infer the latent unobserved epidemic, which is then used to infer the reproduction number as it varies through time. We then analyse the performance of this approach using simulated data. Finally, we present case studies applying the method to real datasets.
Laura Guzman Rincon - A block update MCMC strategy for fitting high-dimensional Gaussian processes
Gaussian processes (GPs) are difficult to fit because their estimation involves the inverse of an n-by-n matrix, where n is the number of observations. We propose a block sampling algorithm for the inference of GPs using MCMC. This method is based on the block updating strategy for Markov random field models proposed by Knorr-Held and Rue (2002).
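The sketch below illustrates the block-update idea only, on a toy latent Gaussian process with a Poisson likelihood: a contiguous block is proposed from its Gaussian full conditional under the prior, so the Metropolis-Hastings ratio reduces to a likelihood ratio. This is a hypothetical illustration, not the proposed algorithm; in practice, the efficiency gains in Knorr-Held and Rue (2002) come from exploiting sparse precision (Markov random field) structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent Gaussian process with a Poisson observation model.
n = 100
s = np.linspace(0, 1, n)
K = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 0.1) ** 2) + 1e-8 * np.eye(n)
y = rng.poisson(np.exp(np.sin(6 * s)))        # synthetic counts

def loglik(f):
    # Poisson log-likelihood (up to a constant)
    return np.sum(y * f - np.exp(f))

f = np.zeros(n)
block = 20                                     # block length

for it in range(2000):
    i = rng.integers(0, n - block + 1)         # start of a random contiguous block
    A = np.arange(i, i + block)
    B = np.concatenate([np.arange(0, i), np.arange(i + block, n)])
    # Gaussian full conditional of the block under the GP prior.
    KAB = K[np.ix_(A, B)]
    KBB_inv = np.linalg.inv(K[np.ix_(B, B)])
    mean = KAB @ KBB_inv @ f[B]
    cov = K[np.ix_(A, A)] - KAB @ KBB_inv @ KAB.T
    prop = f.copy()
    prop[A] = rng.multivariate_normal(mean, cov)
    # Proposing from the prior full conditional means the prior and proposal
    # terms cancel in the Metropolis-Hastings ratio, leaving the likelihood.
    if np.log(rng.uniform()) < loglik(prop) - loglik(f):
        f = prop
```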
Conor Hughes - Sparsity in Staged Trees and Chain Event Graphs
Staged trees and the equivalent Chain Event Graphs (Smith and Anderson, 2008) are a recently developed, powerful family of probabilistic graphical models. Sparsity is present when individual sample sizes are too low to draw reliable inferences on either the probability estimates or the underlying stage structure. Sparsity occurs through the interplay between the total sample size and the number of paths in the tree, or when distributions have unlikely categories that lead to small edge counts. Staged trees have the benefit of clearly displaying where sparsity occurs and warning against subsequent inferences, while also being flexible in approach. In this talk, methods to address sparsity are outlined, some based on extant methods from various fields and some newly developed for staged trees. These methods are demonstrated using a case study.
David Huk - Quasi-Bayesian Vines for density estimation in high dimensions
High-dimensional data is inherently complex, in particular in the dependence between dimensions. This richness is hard to capture, demanding highly performant density estimation techniques. The problem is exacerbated when only a handful of samples is available, causing high-capacity models to generalise poorly. In this work, we introduce a new density estimation approach, based on sequential updates of quasi-Bayesian predictive densities in conjunction with vine copulas, that is suited to complex data with small sample sizes. The result is a robust, modular approach to non-parametric quasi-Bayesian density estimation that supports fast sampling, density evaluation, and predictive updating. We demonstrate our approach's efficacy on benchmark datasets, achieving state-of-the-art performance in comparison with established models.
Jeremias Knoblauch - Generalised Bayesian methods for accelerated computation
Generalised Bayesian posteriors were originally conceived as a way to remedy issues revolving around misspecification and calibration. In this talk, I will present a new line of work that has used a generalised Bayesian formulation to instead resolve long-standing computational issues. In particular, I will highlight how kernel-based distance measures can be used to construct generalised posteriors that eliminate the intractable likelihood problem. Remarkably, and unlike their doubly intractable standard Bayesian counterparts, the resulting generalised posteriors are often available in closed form.
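For orientation, a generalised (Gibbs) posterior replaces the negative log-likelihood with a general loss scaled by a learning rate, which can for instance be taken to be a kernel-based discrepancy such as the maximum mean discrepancy between the model and the empirical distribution. The form below is the generic construction in the spirit of Bissiri et al. (2016), not necessarily the exact posteriors discussed in the talk.

```latex
% Generalised (Gibbs) posterior: the likelihood is replaced by a loss
% L_n(theta; x_{1:n}), e.g. a kernel discrepancy such as
% MMD^2(P_theta, \hat{P}_n), scaled by a learning rate beta > 0.
\[
  \pi_{L}(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,
  \exp\!\big( -\beta\, L_n(\theta;\, x_{1:n}) \big),
  \qquad
  L_n(\theta;\, x_{1:n}) \;=\; \operatorname{MMD}^2\!\big( P_\theta,\, \hat{P}_n \big).
\]
```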
Tony Lawrance - History of Gregynog Statistical Conferences and Reflections on a Life in Statistics
Reflections include: Gregynog from 1969, starting lecturer life without a PhD, researching your thesis, producing your thesis, computing in the dark ages, by a Poisson process to the USA on the QE2, early landscape of university statistics, a life of travel experiences, a problem of point processes and my recent post-cataract lenses...
Jia Le Tan - Evaluating and Expanding Truncation and Pareto-Smoothing Techniques from Importance Sampling to Sequential Monte Carlo Methods
Importance Sampling (IS) is renowned for its theoretical unbiasedness. Yet significant challenges emerge when there is a stark discrepancy between the proposal and target distributions, leading to high variance in the weights. This variance often results in a 'weight stealing' effect in low-likelihood regions of the proposal distribution that correspond to high-likelihood regions in the target distribution, diminishing the effective sample size. This phenomenon becomes more pronounced in higher dimensions, even with closely aligned proposal and target distributions. To mitigate these issues, Vehtari et al. (2024) have innovatively applied Pareto-Smoothing (PS) within IS, introducing Pareto-Smoothed Importance Sampling (PSIS). This approach significantly reduces weight variance with minimal increase in bias, outperforming techniques such as Truncated Importance Sampling (TIS).
This presentation will commence with an analysis of the effectiveness of TIS and PSIS, considering both the scenario where the proposal distribution is lighter than the target and the scenario where it is heavier. While Vehtari et al. (2024) extensively analysed the former, offering promising results, the latter scenario remains less explored. I will present findings from my research that illuminate this area further.
Subsequently, the discussion will extend to applying truncation and Pareto-Smoothing techniques within Sequential Monte Carlo (SMC) Sampling, leading to the development of Truncated SMC Sampling (T-SMC) and Pareto-Smoothed SMC Sampling (PS-SMC). We will delve into preliminary results from experiments that apply these methods at each intermediate SMC sampling step, evaluating their impact on weight adjustment for estimation and resampling processes.
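As background for the comparison, the sketch below shows self-normalised importance sampling with truncated weights, capping the weights at the mean weight times √N as in Ionides (2008); the target, proposal and test function are illustrative stand-ins, not the experiments from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Self-normalised importance sampling of E[h(X)] with a heavy-tailed target
# (standard Cauchy) and a lighter normal proposal, comparing plain IS with
# truncated IS, where weights are capped at w_bar * sqrt(N) (Ionides, 2008).
N = 10_000
x = rng.normal(0.0, 2.0, size=N)                     # proposal draws, N(0, 2^2)

log_target = -np.log(np.pi) - np.log1p(x ** 2)       # standard Cauchy log-density
log_proposal = -0.5 * np.log(2 * np.pi * 4.0) - x ** 2 / 8.0
w = np.exp(log_target - log_proposal)                # importance weights

h = np.cos(x)                                        # quantity of interest

est_is = np.sum(w * h) / np.sum(w)                   # plain self-normalised IS

tau = w.mean() * np.sqrt(N)                          # truncation level
w_trunc = np.minimum(w, tau)
est_tis = np.sum(w_trunc * h) / np.sum(w_trunc)      # truncated IS

print(f"IS estimate:  {est_is:.3f}")
print(f"TIS estimate: {est_tis:.3f}")
```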
Gracie Li - Restricted Adaptive Probability-Based Latin Hypercube Design
The complexity of environmental sampling comes from the combination of varied inclusion probabilities, irregular sampling regions, space-filling requirements and sampling cost constraints. This work proposes a restricted adaptive probability-based Latin hypercube design for environmental sampling. Benefiting from a first-stage pilot design, the approach greatly reduces the computational burden of traditional adaptive sampling without network replacement, while still achieving the same effective control of the final sample size. The initial probability-based Latin hypercube design guarantees a space-filling structure within the irregular sampling region. The adaptive cluster sampling step incorporates more of the desired samples by exploiting neighbourhood effects. A stopping rule based on restricted sampling costs makes the approach of good practical use. Under the restricted adaptive probability-based Latin hypercube design, Horvitz-Thompson and Hansen-Hurwitz type estimators are biased. A modified Murthy-type unbiased estimator with Rao-Blackwell improvements is thus proposed. The proposed approach is shown to perform better than several well-known sampling methodologies.
Mengchu Li - The many facets of differential privacy
In modern data collection and analysis, the privacy of individuals is a key concern. There has been a surge of interest in developing data analysis methodologies that yield strong statistical performance without compromising individuals' privacy, largely driven by applications in modern technology companies, including Google, Apple and Microsoft, and by pressure from regulatory bodies. The prevailing framework for the development of private methodology is that of differential privacy. In this talk, I will briefly discuss several fundamental aspects of the topic.
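For readers new to the framework, the standard definition: a randomised mechanism M is ε-differentially private if, for every pair of datasets x, x' differing in a single individual's record and every measurable set S,

```latex
% Definition of epsilon-differential privacy for a randomised mechanism M.
\[
  \Pr\!\big( M(x) \in S \big) \;\le\; e^{\varepsilon}\, \Pr\!\big( M(x') \in S \big)
  \quad \text{for all neighbouring } x, x' \text{ and all measurable } S .
\]
```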
Jon Warren - Mathematical Naming
Gengyu Xue - Change point localisation and inference in fragmented functional data
We study the problem of change point localisation and inference for sequentially collected fragmented functional data, where each curve is observed only over discrete grids randomly sampled over a short fragment. The sequence of underlying covariance functions is assumed to be piecewise constant, with changes happening at unknown time points. To localise the change points, we propose a computationally efficient Fragmented Functional Dynamic Programming (FFDP) algorithm with consistent change point localisation rates. With an extra step of local refinement, we derive the limiting distributions of the refined change point estimators in two different regimes: where the minimal jump size vanishes and where it remains constant as the sample size diverges. Such results are the first of their kind in the fragmented functional data literature. As a byproduct of independent interest, we also present a non-asymptotic result on the estimation error of the covariance function estimators, inspired by Lin et al. (2021). Our result accounts for the effects of the sampling grid size within each fragment under novel identifiability conditions. Extensive numerical studies are provided to support our theoretical results.