# Statistics Seminar

**All of Statistics:** unashamedly stealing the famous book title of Larry Wasserman to describe this seminar series, as we intend to bring together everyone working in statistics, whichever hat they wear (statistician, mathematician, probabilist, machine learner, etc.). -- Organisers.

**Organiser:** Ritabrata Dutta

**Time:** Monday 13:00-14:00 during term time.

**Venue:** MB0.07, MSB (in-person only)

Previously run as a regular seminar series of CRiSM.

##### Next Speaker:

**On 3rd June 2024** we will have our very own Dr. Sam Olesker-Taylor from the Department of Statistics, University of Warwick.

**Title:** An Analysis of Elo Rating Systems via Markov Chains

**Abstract:** We present a theoretical analysis of the Elo rating system, a popular method for ranking skills of players in an online setting. In particular, we study Elo under the Bradley–Terry–Luce model and, using techniques from Markov chain theory, show that Elo learns the model parameters at a rate competitive with the state of the art. We apply our results to the problem of efficient tournament design and discuss a connection with the fastest-mixing Markov chain problem.
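As background for the talk, the basic Elo update can be sketched in a few lines (a minimal illustration using the conventional scale factor 400 and K-factor 32; these constants and the function name are standard conventions, not taken from the abstract):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update after a game between players A and B.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    The expected score is a logistic function of the rating gap,
    matching a Bradley-Terry-Luce model in base 10.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)  # zero-sum adjustment
    return r_a + delta, r_b - delta

# Two equally rated players; the winner gains k/2 = 16 points.
print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Note that the total rating is conserved by each update; the talk's Markov chain analysis studies how quickly iterating this rule recovers the underlying model parameters.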

#### Full list of talks in 2023-24:

The **inaugural talk on 23rd October 2023** was given by our very own Professor Gareth Roberts.

**Title:** Bayesian Fusion

**Abstract:** Suppose we can readily access samples from each of the densities f_1, ..., f_n, but we wish to obtain samples from their (suitably normalised) product f_1 f_2 ⋯ f_n. The so-called Bayesian Fusion problem comes up within various areas of modern Bayesian Machine Learning, for example in the context of big data or privacy constraints, as well as in more traditional statistical areas such as meta-analysis. Many approximate solutions to this problem have been proposed. However, this talk will present an exact solution based on rejection sampling in an extended state space, where the accept/reject decision is carried out by simulating the skeleton of a suitably constructed auxiliary collection of Brownian bridges.

**On 6th November 2023** we had Professor Johannes Schmidt-Hieber visiting us.

**Title:** Statistical learning in biological neural networks

**Abstract:** Compared to artificial neural networks (ANNs), the brain learns faster, generalizes better to new situations and consumes much less energy. ANNs are motivated by the functioning of the brain but differ in several crucial aspects. For instance, ANNs are deterministic while biological neural networks (BNNs) are stochastic. Moreover, it is biologically implausible that the learning of the brain is based on gradient descent. In this talk we look at biological neural networks as a statistical method for supervised learning. We relate the local updating rule of the connection parameters in BNNs to a zero-order optimization method and derive some first statistical risk bounds.
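The zero-order optimization idea mentioned in the abstract can be illustrated with a generic two-point, gradient-free update (a hypothetical sketch of the general technique, not the talk's actual updating rule for BNNs):

```python
import random

def zero_order_step(theta, loss, lr=0.01, h=1e-4, rng=random):
    """One zeroth-order update: estimate the directional derivative of the
    loss along a random direction from two function evaluations only,
    then move against it. No gradients of `loss` are ever computed."""
    d = [rng.gauss(0.0, 1.0) for _ in theta]
    up = [t + h * di for t, di in zip(theta, d)]
    down = [t - h * di for t, di in zip(theta, d)]
    deriv = (loss(up) - loss(down)) / (2.0 * h)  # finite-difference estimate
    return [t - lr * deriv * di for t, di in zip(theta, d)]

# Minimise a simple quadratic using function evaluations alone.
loss = lambda th: sum(t * t for t in th)
theta = [1.0, -2.0]
rng = random.Random(0)
for _ in range(500):
    theta = zero_order_step(theta, loss, rng=rng)
```

Only two loss evaluations are needed per step, which is the sense in which such schemes are plausible without backpropagation-style gradient computation.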

**On 20th November 2023** the talk was given by our very own Professor Jim Smith.

**Title:** Graphical Models of Intelligent Cause

**Abstract:** Graphical models are now widely used to express the underlying mechanisms that drive a process and to explain how those mechanisms work. In particular, Bayesian Networks and, more recently, Chain Event Graphs have been used to produce probabilistic predictive models of processes. Such graphs are chosen to be consistent with elicited natural explanations of how and why things happen the way they do in a given domain. Causal algebras are then specified which use this elicited information to determine predictions of what might happen were the system to be subjected to various controls.

But how could we extend this work so that it might apply to produce predictive models of what might happen when the decision maker believes that his controls might be resisted? In this talk I will argue that standard causal models then need to be generalised to embed a decision maker's beliefs about the intent, capability and information a resistant adversary might have about the intervention after it has been made. After reviewing recent advances in general forms of Bayesian dynamic causal models, I will describe how, using a special form of Adversarial Risk Analysis, we are developing new intelligent algorithms to produce such predictions. The talk will be illustrated throughout by examples of various adversarial threats currently being analysed within the UK.

**On 4th December 2023** we had Professor Geoff Nicholls visiting us from the University of Oxford.

**Title:** Partial order models for rank data

**Abstract:** In rank-order data assessors give preference orders over choice sets. These can be thought of as permutations of the choice sets, ordered by preference from best to worst. We call these permutations "lists". Well-known parametric models for list-data include the Mallows model and the Plackett-Luce model. These models seek a total order which is "central" to the lists provided by the assessors. Extensions model the list-data as realisations of a mixture of distributions, each centred on a total order. We give a model for list-data which is centred on a partial order. We give a prior over partial orders with several nice properties and explain how to carry out Bayesian inference for the unknown true partial order constraining the list-data. Model comparison favours the partial order model in all data sets we have looked at so far. However, evaluation of the likelihood is #P-hard. We give a model which admits scalable inference, and a time-series model for evolving partial orders. The time-series model was motivated by queue-data informing an evolving social hierarchy, which we model as an evolving partial order.

(This is joint work with Kate Lee, Jessie Jiang, Nicholas Karn, David Johnson, Alexis Muir-Watt and Rukuang Huang.)

**On 29th January 2024** we had Professor Cristiano Varin visiting us from Ca' Foscari University of Venice (Italy).

**Title:** Scalable Estimation of Probit Models with Crossed Random Effects

**Abstract:** Crossed random effects structures arise in many scientific contexts. They raise severe computational problems, with likelihood and Bayesian computations scaling like N^(3/2) or worse for N data points. In this paper we develop a composite likelihood approach for crossed random effects probit models. For data arranged in R rows and C columns, the likelihood function includes a very difficult (R + C)-dimensional integral. The composite likelihood we develop uses the marginal distribution of the response along with two hierarchical models. The cost is reduced to O(N), and it can be computed with R + C one-dimensional integrals. We find that the commonly used Laplace approximation has a cost that grows superlinearly. We get consistent estimates of the probit slope and variance components from our composite likelihood algorithm. We also show how to estimate the covariance of the estimated regression coefficients. The algorithm scales readily to a data set of five million observations from Stitch Fix with R + C > 700,000.

This is joint work with Ruggero Bellio (Udine), Swarnadip Ghosh (Stanford) and Art B. Owen (Stanford).

**On 5th February 2024** the talk was given by our very own Professor David Hobson.

**Title:** Martingale optimal transport and applications to finance.

**Abstract:** The optimal transport problem is about how to move n interchangeable objects from one set of locations to another with minimum effort. Put more mathematically, we have mass distributed according to a law μ which we want to transport to the distribution ν, and we want to do so in a way which minimises the total cost. In martingale optimal transport (MOT) we add a further constraint that, under the redistribution or transport, any mass at x on average stays at x. In this talk I'll try to explain why MOT arises very naturally in mathematical finance, and talk about some simple and not-so-simple explicit solutions to some canonical problems.
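In symbols (a sketch using the standard notation μ for the initial law, ν for the target law, and c for the cost; the abstract does not spell these out), the Kantorovich formulation with the added martingale constraint reads:

```latex
% Kantorovich optimal transport: couple mu and nu at minimal expected cost
\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, \pi(\mathrm{d}x, \mathrm{d}y),
\qquad
\Pi(\mu,\nu) = \{\pi : \text{the marginals of } \pi \text{ are } \mu \text{ and } \nu\}.

% Martingale constraint: mass at x stays at x on average, i.e. for the
% disintegration pi(dx, dy) = mu(dx) pi_x(dy),
\int y\, \pi_x(\mathrm{d}y) = x \quad \text{for } \mu\text{-almost every } x.
```

The martingale constraint is what ties MOT to finance: it encodes the no-arbitrage requirement that a price process be a martingale under the pricing measure.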

**On 19th February 2024** we had Professor Terry Lyons visiting us from the University of Oxford.

**Title:** The Mathematics of Complex Streamed Data

**Abstract:** Complex streams of evolving data are better understood by their effects on nonlinear systems than by their values at given times. The question of which nonlinear systems to use would seem to be context-dependent, but it is not. Core to rough path theory is a simple universal nonlinear system that captures all the information needed to predict any response to any nonlinear system. This idealised mathematical feature set is known as the signature of the stream. Its abstract simplicity opens the possibility of understanding and working with streams in the same context-free way that calculators work with numbers. Signature-based techniques offer simple-to-apply universal numerical methods that are robust to irregular data and efficient at representing the order of events and complex oscillatory data. Specific software can be developed and then applied across many contexts. Signatures underpin prize-winning contributions in recognizing Chinese handwriting, in detecting sepsis, in generating financial data, and most recently in the ability to score streams as outliers against a corpus of normal streams. This principled outlier technology has emerged as a powerful unifying technique; it identifies radio-frequency interference in astronomical data and brain injury from MEG data. The underpinning theoretical contributions span a range from abstract algebra and non-commutative analysis to questions of the organisation of efficient numerical calculation. See www.datasig.ac.uk/. New hyperbolic partial differential equations have been developed that compute the "signature kernel" trick without ever having to introduce signatures. Neural controlled differential equations can directly harness approaches such as the log-ODE method and consume the control as a rough path. The current step is the rough transformer. For this, RoughPy needs to be on the GPU.

**Professor Terry Lyons** is Wallis Professor Emeritus and Professor of Mathematics at the University of Oxford. He is currently PI of the DataSig programme (primarily funded by EPSRC) and of the complementary research programme CIMDA-Oxford (under the support of InnoHK and the HKSAR). He was a founding member (2007), and then Director (2011-2015), of the Oxford-Man Institute of Quantitative Finance, and was Director of the Wales Institute of Mathematical and Computational Sciences (WIMCS; 2008-2011). He came to Oxford in 2000, having previously been Professor of Mathematics at Imperial College London (1993-2000); before that he held the Colin Maclaurin Chair at Edinburgh (1985-93). He was also President of the London Mathematical Society (2013-15). Professor Lyons's long-term research interests are focused on the mathematics of streamed data and building strong applications from these mathematical insights. His current goal is to use rough path theory to develop innovative and truly generic tools for working with streamed data and to make these widely accessible through the Python package RoughPy.

**On 26th February 2024** the talk was given by our very own Dr. Emma Horton from the Department of Statistics, University of Warwick.

**Title:** Monte Carlo methods for branching processes

**Abstract:** Branching processes naturally arise as pertinent models in a variety of situations such as cell division, population dynamics and nuclear fission. For a wide class of branching processes, it is common that their first moment exhibits a Perron–Frobenius-type decomposition. That is, the first-order asymptotic behaviour is described by a triple (λ, φ, η), where λ is the leading eigenvalue of the system and φ and η are the corresponding right eigenfunction and left eigenmeasure respectively. Thus, obtaining good estimates of these quantities is imperative for understanding the long-time behaviour of these processes. In this talk, we discuss various Monte Carlo methods for estimating this triple.

This talk is based on joint work with Alex Cox (University of Bath) and Denis Villemonais (Université de Lorraine).

**On 4th March 2024** we had Professor Steve MacEachern from the Department of Statistics at The Ohio State University, USA.

**Title:** Familial inference: Tests for hypotheses on a family of centers

**Abstract:** Many scientific disciplines face a replicability crisis. While such crises have many drivers, we focus on one. Statistical hypotheses are translations of scientific hypotheses into statements about one or more distributions. The most basic tests focus on the centers of the distributions. Such tests implicitly assume a specific center, e.g., the mean or the median. Yet, scientific hypotheses do not always specify a particular center. This ambiguity leaves a gap between scientific theory and statistical practice that can lead to rejection of a true null. The gap is compounded when we consider deficiencies in the formal statistical model. Rather than testing a single center, we propose testing a family of plausible centers, such as those induced by the Huber loss function (the Huber family). Each center in the family generates a point null hypothesis and the resulting family of hypotheses constitutes a familial null hypothesis. A Bayesian nonparametric procedure is devised to test the familial null. Implementation for the Huber family is facilitated by a novel pathwise optimization routine. Along the way, we visit the question of what it means to be the center of a distribution. Surprisingly, we have been unable to find a clear and comprehensive definition of this concept in the literature.

This is joint work with Ryan Thompson (University of New South Wales), Catherine Forbes (Monash University), and Mario Peruggia (The Ohio State University).

**On 11th March 2024** the talk was given by Dr. Fanghui Liu from the Department of Computer Science, University of Warwick.

**Title:** From kernel methods to neural networks: double descent, function spaces, and learnability

**Abstract:** In this talk, I will discuss the relationship between kernel methods and (two-layer) neural networks for generalization, aiming to understand their separation theoretically from the perspective of function spaces. First, I will start with random features models (a typical two-layer neural network, and also a kernel method), moving from the under-parameterized regime to the over-parameterized regime; this recovers double descent and demonstrates the benefits of over-parameterization. Second, I will compare kernel methods and neural networks via random features, from reproducing kernel Hilbert space (RKHS) to Barron space, which leaves an open question: what is the suitable function space for neural networks?
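A random features model of the kind discussed here can be sketched in a few lines (a minimal illustration assuming ReLU features and minimum-norm least squares; the function name and all modelling choices are illustrative, not taken from the talk):

```python
import numpy as np

def fit_random_features(X, y, n_features, rng):
    """Fit a two-layer network whose first layer is random and frozen:
    only the output weights are trained, by (minimum-norm) least squares."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)  # frozen random layer
    features = np.maximum(X @ W, 0.0)                  # ReLU feature map
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return lambda X_new: np.maximum(X_new @ W, 0.0) @ coef

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = np.sin(X[:, 0])
# Over-parameterized regime: more features than samples, training error ~ 0.
model = fit_random_features(X, y, n_features=400, rng=rng)
```

Sweeping `n_features` from below to above the number of samples traces out the double-descent curve: test error typically peaks near the interpolation threshold and then falls again in the over-parameterized regime.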

**On 22nd April 2024** we had Dr. Panos Toulis visiting us from the University of Chicago, Booth School of Business.

**Title:** Experimental Designs for Tax Audit Policies on Large Interfirm Networks

**Abstract:** This talk presents an ongoing large field experiment in a South American country that randomizes tax audit notices (the treatment) to firms connected through a large network of VAT transactions. While the ultimate goal is to optimize tax audit policy, the short-term goal is to estimate causal effects of tax audit notices on firm behavior. Of particular interest are spillovers, that is, the responses of firms that are not treated themselves but are connected to firms that are. First, I will discuss why currently popular approaches to experimenting on networks are limited by the realities of inter-firm networks, such as their size, high interconnectivity and heavy-tailed degree distributions. I will then describe an approach to experimentation that leverages subtle sub-structures in the network. This approach is specifically designed to allow the application of Fisherian-style permutation tests of causal effects. These testing procedures are computationally efficient and finite-sample valid, qualities that are important for robustly testing the parameters of structural economic models.
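The Fisherian permutation logic mentioned at the end can be sketched generically (a textbook version that ignores the network structure, which is precisely the hard part the talk addresses; all names here are illustrative):

```python
import random

def permutation_p_value(treated, outcomes, stat, n_perm=2000, seed=0):
    """Finite-sample-valid p-value for the sharp null of no treatment
    effect: outcomes are held fixed, only treatment labels are re-randomized."""
    rng = random.Random(seed)
    observed = stat(treated, outcomes)
    hits = 1  # count the observed assignment itself
    for _ in range(n_perm):
        relabelled = treated[:]
        rng.shuffle(relabelled)
        if stat(relabelled, outcomes) >= observed:
            hits += 1
    return hits / (n_perm + 1)

def mean_difference(treated, outcomes):
    """Difference in mean outcomes between treated and control units."""
    t = [y for z, y in zip(treated, outcomes) if z == 1]
    c = [y for z, y in zip(treated, outcomes) if z == 0]
    return sum(t) / len(t) - sum(c) / len(c)
```

Validity here comes purely from the randomization, not from a model of the outcomes, which is why such tests remain exact in finite samples.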

**On 13th May 2024** we had Professor Patrick Rebeschini visiting us from the University of Oxford.

**Title:** Designing and Learning Algorithmic Regularisers.

**Abstract:** A major challenge in statistical learning involves developing models that can effectively leverage the structure of a problem and generalise well. Achieving this goal critically depends on the precise selection and application of suitable regularisation methods. Although there have been significant advancements in understanding explicit regularisation techniques, such as Lasso and nuclear norm minimisation, which have profoundly impacted the field in recent years, the development of regularisation approaches—especially those that are implicit or algorithmic—remains a difficult task. In this talk, we will address this challenge by exploring mirror descent, a family of first-order methods that generalise gradient descent. Using tools from probability and optimisation, we will introduce a structured framework for designing mirror maps and Bregman divergences. This framework enables mirror descent to attain optimal statistical rates in some settings in linear and kernel regression. If time permits, we will also briefly showcase the application of mirror descent in reinforcement learning, specifically focusing on the use of neural networks to learn mirror maps for policy optimisation.

**On 20th May 2024** we had Professor Fabrizio Leisen visiting us from King's College London.

**Title:** A probabilistic view on predictive constructions for Bayesian learning.

**Abstract:** Recently, there has been interest in alternative ways to define a Bayesian model. For instance, the RSS read paper of Fong, Holmes and Walker (2023) proposes a novel framework for Bayesian inference where models are specified through the assignment of a class of predictive distributions. In this talk we discuss assigning a model through a predictive approach. First, we will consider the scenario where the predictive distributions lead to an exchangeable sequence. Later, we will consider predictive distributions that go beyond the exchangeable setting. Some recent results will be illustrated.

**References**

Patrizia Berti, Emanuela Dreassi, Fabrizio Leisen, Pietro Rigo and Luca Pratelli. A probabilistic view on predictive constructions for Bayesian learning. *Statistical Science*, forthcoming.

Patrizia Berti, Emanuela Dreassi, Fabrizio Leisen, Pietro Rigo and Luca Pratelli. Bayesian predictive inference without a prior. *Statistica Sinica*, 33, 2405-2429, 2023.

Samuele Garelli, Fabrizio Leisen, Pietro Rigo and Luca Pratelli. Asymptotics of predictive distributions driven by sample means and variances. arXiv:2403.16828.

**On 3rd June 2024** we will have our very own Dr. Sam Olesker-Taylor from the Department of Statistics, University of Warwick.

**Title:** An Analysis of Elo Rating Systems via Markov Chains: MCMC Convergence and Tournament Design

**Abstract:** TBD

**On 10th June 2024** we will have Professor Fabrice Rossi visiting us from Université Paris Dauphine.

**Title:** TBD

**Abstract:** TBD

**On 17th June 2024** we will have Dr. Francis Bach visiting us from the Département d'Informatique de l'École Normale Supérieure, PSL Research University, Centre de Recherche INRIA de Paris.

**Title:** TBD

**Abstract:** TBD.