A1 - Dr. Andi Wang
Title: Compositional foundations of probability and statistics
Abstract: Traditionally, probability and statistics have been founded on measure theory. However, for researchers in applied probability, statistics and machine learning, it is very rare that one actually refers to explicitly-constructed probability spaces, sigma-algebras or measurability. The recent perspective of categorical probability instead begins with compositional structure, and provides a rigorous approach to reason about stochasticity, with measure-theoretic probability being an example, rather than the foundation. In this talk, I will introduce categorical probability, and give a concrete application to Markov chain Monte Carlo. I will not assume any prior knowledge of category theory. The latter part of the talk is based on joint work with Rob Cornish.
A2 - Dr. Filippo Pagani
Title: A discomfort-informed adaptive Gibbs sampler for finite mixture models
Abstract: Finite mixture models are frequently used to uncover latent structures in high-dimensional datasets (e.g. identifying clusters of patients in electronic health records). The inference of such structures can be performed in a Bayesian framework, and involves sampling algorithms such as Gibbs samplers aimed at deriving the posterior distribution of the probabilities that observations belong to specific clusters. Unfortunately, traditional implementations of Gibbs samplers in this context often face critical challenges, such as inefficient use of computational resources and unnecessary updates for observations that are highly likely to remain in their current cluster. This work introduces a new adaptive Gibbs sampler that improves convergence efficiency over existing methods. In particular, our sampler is guided by a function that, at each iteration, uses the past of the chain to focus updates on observations potentially misclassified in the current clustering, i.e. those with a low probability of belonging to their current component. Through simulation studies and two real data analyses, we empirically demonstrate that, in terms of convergence time, our method tends to perform more efficiently than state-of-the-art approaches.
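The guiding idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' algorithm: for a one-dimensional Gaussian mixture with known parameters, each sweep re-allocates only the `budget` observations whose current component has the lowest posterior probability (their "discomfort"); the names `allocation_probs` and `discomfort_guided_sweep` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def allocation_probs(x, means, sds, weights):
    """Posterior probability of each mixture component for one observation x."""
    dens = weights * np.exp(-0.5 * ((x - means) / sds) ** 2) / sds
    return dens / dens.sum()

def discomfort_guided_sweep(data, z, means, sds, weights, budget):
    """One adaptive sweep: re-allocate only the `budget` observations whose
    current component has the lowest posterior probability ("discomfort")."""
    probs = np.array([allocation_probs(x, means, sds, weights) for x in data])
    discomfort = 1.0 - probs[np.arange(len(data)), z]  # low prob of current label
    update = np.argsort(discomfort)[-budget:]          # most "uncomfortable" points
    for i in update:
        z[i] = rng.choice(len(weights), p=probs[i])
    return z
```

A full sampler would of course also update the component parameters and adapt the budget using the chain's past; the sketch only shows the targeted-allocation step.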
B1 - Florian Gutekunst
Title: Optimal Consumption in non-Markovian Stochastic Factor Models
Abstract: We study optimal investment and consumption over the infinite horizon under power utility in a non-Markovian incomplete stochastic factor model. Using the method of sub- and supersolutions, we prove the existence of a solution to an associated infinite horizon BSDE, obtain tight bounds on the optimal consumption rate, and prove a verification theorem. We apply our theory to the rough Heston model.
B2 - Dr. Zhengang Zhong
Title: Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions
Abstract: Laplace learning is a semi-supervised method for inferring the missing labels in a partially labeled dataset by exploiting the geometry given by the unlabeled data points. The method minimizes a Dirichlet energy defined on a (discrete) graph constructed from the full dataset. In finite dimensions, the asymptotics in the large (unlabeled) data limit are well understood, with convergence from the graph setting to a continuum Sobolev semi-norm weighted by the Lebesgue density of the data-generating measure. The lack of a Lebesgue measure on infinite-dimensional spaces requires rethinking the analysis when the data are not finite-dimensional. In this talk, I will present a first step in this direction by analyzing the setting where the data are generated by a Gaussian measure on a Hilbert space and proving pointwise convergence of the graph Dirichlet energy.
C1 - Dr. Ibrahim Kaddouri
Title: Clustering risk under the slowly mixing hidden Markov model
Abstract: We study the problem of clustering under a hidden Markov model with Gaussian emissions, focusing on the regime where the hidden chain mixes slowly. We provide a precise characterization of how the Bayes risk depends on the model parameters and construct a Bayes-optimal clustering procedure. Notably, our analysis reveals surprising and non-standard behavior of the Bayes risk in certain parameter regimes, offering new insights into the interplay between signal strength and temporal dependence.
C2 - Alexander Kent
Title: Rate Optimality and Phase Transition for User-Level Local Differential Privacy
Abstract: Given demands for rigorous data privacy guarantees, both from a regulatory standpoint and from the concerns of data subjects, definitions of privacy that can be theoretically validated are of great interest. One such framework enjoying significant popularity in both academia and industry is differential privacy, in which carefully calibrated noise is added to data to provide plausible deniability as to the true values. Differential privacy appears in both the central model, where a trusted aggregator has access to the data and releases a privatised output, and the local model, where each user adds noise before publishing their (now privatised) data to a potentially untrusted aggregator. The traditional setting, in which each of the n data subjects holds a single data point, is known as item-level privacy; a growing field of interest is user-level privacy, where each of the n users holds T observations and wishes to maintain the privacy of their entire collection. We consider the model of user-level local differential privacy, which is relatively unexplored. Indeed, even for a problem as fundamental as univariate mean estimation, the minimax rate of estimation was undetermined prior to this work. We aim to fill this gap, obtaining minimax optimal estimation rates for a range of canonical statistical estimation problems, including univariate and multidimensional mean estimation, sparse mean estimation, and non-parametric density estimation. We first derive a general minimax lower bound, which shows that the risk cannot, in general, be made to vanish for a fixed number of users even when T is arbitrarily large. We then derive lower and upper bounds for the aforementioned canonical problems that match up to logarithmic factors. In particular, with other model parameters held fixed, we observe phase transition phenomena in the minimax rates as T, the number of observations each user holds, varies.
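A baseline user-level locally private protocol for mean estimation can be sketched as follows. This is a textbook Laplace-mechanism construction for illustration only, not the minimax-optimal estimator from the talk: each user clips the sample mean of their T observations and adds Laplace noise before publishing, so the aggregator never sees raw data.

```python
import numpy as np

rng = np.random.default_rng(1)

def user_level_ldp_mean(user_data, epsilon, clip):
    """Each user releases a privatised version of their local sample mean:
    clip to [-clip, clip], then add Laplace noise calibrated to the
    sensitivity 2*clip of the clipped mean. The aggregator averages
    the n noisy reports."""
    reports = []
    for x in user_data:                      # x: array of T observations of one user
        local_mean = np.clip(x.mean(), -clip, clip)
        noise = rng.laplace(scale=2 * clip / epsilon)
        reports.append(local_mean + noise)
    return np.mean(reports)
```

Averaging first within each user is what makes the guarantee user-level: the release depends on the user's whole collection through a single bounded statistic. The phase transitions discussed in the talk concern how the attainable error of (optimal) schemes scales with T, which this naive baseline does not achieve.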
D1 - Matthew Adeoye
Title: Bayesian Copula-Based Modelling For Multi-Type Spatio-Temporal Outbreak Data
Abstract: The study of infectious disease outbreaks caused by multi-type pathogens often requires modelling techniques that account for the complex interactions between strains of the pathogen across geographical locations and time. In this talk, I will introduce a novel multi-type spatio-temporal model to better support the understanding of these pathogens. I will present a computationally efficient MCMC sampling scheme for the proposed models, along with results from simulations and real-world data.
D2 - Federico Perlino
Title: A Bayesian Parametric and Nonparametric Approach for the Imputation of Multivariate Left-Censored Data Due to Limit of Detection
Abstract: Left-censored observations due to limits of detection and/or quantification are common in clinical and epidemiologic research when continuous predictors are assessed from human specimens. In these settings, values below a certain threshold are not detectable in laboratory analysis and are reported as missing in the dataset. Classical imputation approaches have mostly relied on imputing the same number for all non-detected samples, thus compromising the continuous nature of the censored variables and affecting their variability and potential inclusion in regression modeling. Continuous imputation methods have been proposed, but generally focus on a single variable at a time. It is common, moreover, for the same human specimen to be used for the quantification of several biomarkers or exposures simultaneously, resulting in a complex set of multivariate and possibly correlated left-censored observations. To the best of our knowledge, there is no established framework that flexibly accounts for the real-world complexity of these data. We propose a Bayesian multiple imputation (MI) approach that relies on the introduction of multivariate latent variables to handle multivariate left-censored data. We present a general framework accommodating both a parametric approach, assuming multivariate normality of the data, and a nonparametric approach, modeling observations by means of a location Dirichlet process mixture of multivariate normal kernels. Both approaches are implemented through a Gibbs sampling scheme. The performance of our approach is investigated in a simulation study based on environmental exposures, and illustrated by analyzing a real dataset on cardiovascular biomarkers.
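The latent-variable Gibbs step at the heart of such schemes can be sketched under the parametric (multivariate normal) assumption. The helper below is illustrative, not the authors' implementation: each censored entry is redrawn from its full conditional normal, given the other coordinates, truncated above at the limit of detection; truncation is handled by simple rejection sampling, which is adequate only when the LOD is not far in the tail.

```python
import numpy as np

rng = np.random.default_rng(2)

def impute_censored(Y, censored, lod, mu, Sigma):
    """One Gibbs imputation step under multivariate normality: each censored
    entry (i, j) is redrawn from its full conditional N(m, s^2) truncated to
    (-inf, lod[j]), conditioning on the current values of the other
    coordinates of observation i."""
    Y = Y.copy()
    for i, j in zip(*np.where(censored)):
        others = [k for k in range(Y.shape[1]) if k != j]
        S_oo_inv = np.linalg.inv(Sigma[np.ix_(others, others)])
        s_jo = Sigma[j, others]
        m = mu[j] + s_jo @ S_oo_inv @ (Y[i, others] - mu[others])
        s = np.sqrt(Sigma[j, j] - s_jo @ S_oo_inv @ s_jo)
        draw = rng.normal(m, s)
        while draw >= lod[j]:                # truncate to the non-detect region
            draw = rng.normal(m, s)
        Y[i, j] = draw
    return Y
```

In a full sampler this step would alternate with updates of mu and Sigma (or, in the nonparametric version, of the Dirichlet process mixture allocations), and the imputed datasets across iterations would feed the MI analysis.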
D3 - Jia Le Tan
Title: Approximate Bayesian Inference for Ecological Dynamics, with Applications to Fisheries
Abstract: Ecological systems are often described by complex models that capture nonlinear dynamics, stochasticity, and partial observability, with fisheries providing a key motivating application. In many cases, these models lead to likelihoods that are unavailable or too expensive to compute, making exact Bayesian inference impractical. Moreover, because these models necessarily simplify complex real-world processes, they are often susceptible to model misspecification, creating further challenges for reliable inference and uncertainty quantification. This talk presents ongoing work on approximate Bayesian inference methods for such settings, spanning classical approaches such as approximate Bayesian computation, synthetic likelihood, and their sequential Monte Carlo variants, as well as more recent simulation-based methods including neural posterior and neural likelihood estimation. While these methods offer a flexible framework for uncertainty-aware inference when conventional likelihood-based methods are not viable, their application in ecological and fisheries settings also presents important practical challenges, which I will discuss in this talk.
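As a concrete reference point for the classical end of this spectrum, rejection ABC can be sketched in a few lines. This is a generic textbook scheme on a toy model, not tied to any specific fisheries application from the talk: prior draws are kept whenever their simulated summary statistics fall within a tolerance of the observed ones, so the likelihood is never evaluated.

```python
import numpy as np

rng = np.random.default_rng(3)

def abc_rejection(observed, simulate, prior_sample, summary, tol, n_draws):
    """Rejection ABC: keep prior draws whose simulated summary statistics lie
    within `tol` of the observed summaries (the likelihood is never computed)."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s_sim = summary(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)
```

Sequential Monte Carlo variants, synthetic likelihood, and neural simulation-based methods can all be read as ways of spending the simulation budget more efficiently than this blind rejection loop, which is exactly the trade-off at issue in expensive ecological simulators.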