CRiSM past seminars 2021
CRiSM seminars 2020/21 were organised by Tom Berrett.
The following talks were scheduled for the academic year 2020/2021:
Term 1

Wednesday October 28  François Caron  University of Oxford

Thursday November 12  Stefan Wager  Stanford University

Thursday November 26 (10am)  Wai Kin Wong  Hong Kong Observatory

Thursday 10 December  Sofia Olhede  EPFL Lausanne
Term 2 (on Wednesdays)

January 20  Pragya Sur  Harvard University

February 3  Rina Foygel Barber  University of Chicago

February 17  Alexandra Carpentier  OttovonGuerickeUniversität Magdeburg (cancelled)

March 3  Liza Levina  University of Michigan
 March 17  Peter Bühlmann  ETH Zürich
Term 3

May 5  Robin Ryder  CEREMADE, Université ParisDauphine
 May 19  Adeline Fermanian (3pm)  LPSM, Sorbonne Université

June 2  Zhou Fan  Yale University

Thursday June 17  Stephen Bates  UC Berkeley
 June 30  Ciara PikeBurke  Imperial College London

October 28: François Caron (University of Oxford)
Title: Nonexchangeable random partition models for microclustering
Abstract: Many popular random partition models, such as the Chinese restaurant process and its twoparameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in modelbased clustering, population genetics, ecology or network analysis. While the exchangeability assumption is sensible in many cases, it has some strong implications. In particular, Kingman’s representation theorem implies that the size of the clusters necessarily grows linearly with the sample size; this feature may be undesirable for some applications, as recently pointed out by Miller et al. (2015). We present here a flexible class of nonexchangeable random partition models which are able to generate partitions whose cluster sizes grow sublinearly with the sample size, and where the growth rate is controlled by one parameter. Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size, and show that the model can exhibit a powerlaw behaviour, controlled by another parameter. The construction is based on completely random measures and a Poisson embedding of the random partition, and inference is performed using a Sequential Monte Carlo algorithm. Additionally, we show how the model can also be directly used, by relaxing the exchangeability assumption in edgeexchangeable models, to obtain a class of sparse multigraphs with powerlaw degree distribution and sublinear growth of the node degrees. Finally, experiments on real datasets emphasize the usefulness of the approach compared to a twoparameter Chinese restaurant process.
Joint work with Giuseppe di Benedetto and Yee Whye Teh
November 12: Stefan Wager (Stanford University)
Title: Noiseinduced randomization in regression discontinuity designs
Abstract: Regression discontinuity designs are used to estimate causal effects in settings where treatment is determined by whether an observed running variable crosses a prespecified threshold. While the resulting sampling design is sometimes described as akin to a locally randomized experiment in a neighborhood of the threshold, standard formal analyses do not make reference to probabilistic treatment assignment and instead identify treatment effects via continuity arguments. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that exploits measurement error in the running variable. Under an assumption that the measurement error is exogenous, we show how to consistently estimate causal effects using a class of linear estimators that weight treated and control units so as to balance a latent variable of which the running variable is a noisy measure. We find this approach to facilitate identification of both familiar estimands from the literature, as well as policyrelevant estimands that correspond to the effects of realistic changes to the existing treatment assignment rule. We demonstrate the method with a study of retention of HIV patients and evaluate its performance using simulated data and a regression discontinuity design artificially constructed from test scores in early childhood.
https://arxiv.org/abs/2004.09458

November 26 at 10am: Wai Kin Wong (Hong Kong Observatory)
Title: Machine learning in rainfall nowcasting.
Abstract: Rainfall nowcasting refers to the prediction of precipitation in very high spatial and temporal resolutions for the next 16 hours. Timely and quality rainfall nowcast provides indispensable source of information in support of rainstorm monitoring, alerting or warning systems that are invaluable to weather services, and disaster risk reduction of highimpact weather or rainstorms for protecting people's lives. In Hong Kong Observatory (HKO), artificial intelligence technique based on image processing algorithms have been utilized in the inhouse nowcasting system, namely SWIRLS (Shortrange Warning of Intense Rainstorms in Localised Systems) to track the motion of precipitation systems detected by weather radars, followed by predicting their future location and rainfall using the motion field. However, the intensity is assumed to remain unchanged in computation that results in decreasing skill of precipitation forecast beyond one or two hours. In recent years, novel deep learning (DL) based methods have been developed for precipitation nowcasting that have shown improved performance compared to the above operational algorithm. In this talk, the current progress of DL based methods for precipitation nowcasting will be introduced, including mathematical formulation of precipitation nowcasting as a spatiotemporal sequence forecasting problem, and a couple of general learning strategies. Performance of DL based nowcasting model and a systematic benchmark for performance evaluation will be presented. Finally, future research directions on development of DL in precipitation nowcasting and meteorological forecasting applications are discussed.

December 10: Sofia Olhede (EPFL Lausanne)
Title: Modeling networks and network populations via graph distances
Abstract: Networks have become a key form of data. Networks allow us to dependence between nodes or actors. Understanding the difference between two networks is also challenging unless they share nodes and are of the same size. We shall discuss how we may compare networks and also consider the regime where more than one network is observed.
We shall also discuss how to parametrize a distribution on labelled graphs in terms of a Frechét mean graph (which depends on a userspecified choice of metric or graph distance) and a parameter that controls the concentration of this distribution about its mean. Entropy is the natural parameter for such control, varying from a point mass concentrated on the Frechét mean itself to a uniform distribution over all graphs on a given vertex set.
Networks present many new statistical challenges. We shall discuss how to resolve these challenges respecting the fundamental nonEuclidean nature of network observations.
This is joint work with Simon Lunagomez (Lancaster University) and Patrick Wolfe (Purdue University).

January 20: Pragya Sur (Harvard University)
Title: A precise highdimensional asymptotic theory for AdaBoost
Abstract: This talk will introduce a precise highdimensional asymptotic theory for AdaBoost on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is separable in an asymptotic sense. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the generalization error of AdaBoost, when the algorithm interpolates the training data and maximizes an empirical L1 margin. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting; for instance, the larger the dimensionality ratio p/n, the faster the optimization reaches interpolation. At the heart of our theory lies an indepth study of the maximum L1margin, which can be accurately described by a new system of nonlinear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Time permitting, I will present a new class of boosting algorithms that correspond to Lq geometry, for q>1, together with results on their highdimensional generalization and optimization behavior. This is based on joint work with Tengyuan Liang.

February 3: Rina Foygel Barber (University of Chicago)
Title: Testing goodnessoffit and conditional independence with approximate cosufficient sampling
Abstract: Goodnessoffit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is cosufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an informationtheoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptoticallyefficient estimator; we call our extension “approximate CSS” (aCSS) testing. We quantify the finitesample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finitesample Type I error and power.

February 17: Alexandra Carpentier (OttovonGuerickeUniversität Magdeburg)
Title: Several structured thresholding bandit problems
Abstract: In this talk we will discuss the thresholding bandit problem, i.e. a sequential learning setting where the learner samples sequentially K unknown distributions for T times, and aims at outputting at the end the set of distributions whose means \mu_k are above a threshold \tau. We will study this problem under four structural assumptions, i.e. shape constraints: that the sequence of means is monotone, unimodal, concave, or unstructured (vanilla case). We will provide in each case minimax results on the performance of any strategies, as well as matching algorithms. This will highlight the fact that even more than in batch learning, structural assumptions have a huge impact in sequential learning. This work is based on a joint work with James Cheshire and Pierre Menard (http://proceedings.mlr.press/v125/cheshire20a.html), and with Andrea Locatelli and Maurilio Gutzeit (http://proceedings.mlr.press/v48/locatelli16.html).

March 3: Liza Levina (University of Michigan)
Title: Hierarchical community detection by recursive partitioning
Abstract: Community detection in networks has been extensively studied in the form of finding a single partition into a “correct” number of communities. In large networks, however, a multiscale hierarchy of communities is much more realistic. We show that a hierarchical tree of communities, obviously more interpretable, is also potentially more accurate and more computationally efficient. We construct this tree with a simple topdown recursive algorithm, at each step splitting the nodes into two communities with a noniterative spectral algorithm, until a stopping rule suggests there are no more communities. The algorithm is modelfree, extremely fast, and requires no tuning other than selecting a stopping rule. We propose a natural model for this setting, a binary tree stochastic block model, and prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. As a byproduct, we obtain explicit and intuitive results for fitting the stochastic block model under model misspecification. We illustrate the algorithm on a statistics papers dataset constructing a highly interpretable tree of statistics research communities, and on a network based on gene cooccurrence in research papers on anemia.

March 17: Peter Bühlmann (ETH Zürich)
Title: Deconfounding
Abstract: Hidden confounding is a severe problem when interpreting regression or causal parameters, and it may also lead to poor generalisation performance for prediction. Adjusting for unobserved confounding is important but challenging when based on observational data only. We propose spectral deconfounding, a class of linear data transformations, followed by standard sparse estimation methods such as the Lasso, or the Debiased Lasso when confidence guarantees are required. The proposed methodology has provable (optimality) properties when assuming dense confounding. Without additional assumptions, deconfounding from observational data is impossible. But we argue that even when such assumptions fail to hold, certain methods exhibit partial robustness against hidden confounding.
The talk is based on joint work with Domagoj Cevid, Zijian Guo and Nicolai Meinshausen.

May 5: Robin Ryder (CEREMADE, Université ParisDauphine)
Title: A Bayesian nonparametric methodology for inferring grammar complexity
Abstract: Based on a set of strings from a language, we wish to infer the complexity of the underlying grammar. To this end, we develop a methodology to choose between two classes of formal grammars in the Chomsky hierarchy: simple regular grammars and more complex contextfree grammars. To do so, we introduce a probabilistic contextfree grammar model in the form of a Hierarchical Dirichlet Process over rules expressed in Greibach Normal Form. In comparison to other representations, this has the advantage of nesting the regular class within the contextfree class. We give some theoretical properties of this new stochastic process. We consider model comparison both by exploiting this nesting, and with Bayes’ factors. The model is fit using a Sequential Monte Carlo method, implemented in the Birch probabilistic programming language. We apply this methodology to data collected from primates, for which the complexity of the grammar is a key question.
Coauthors: Lawrence Murray, Judith Rousseau and Achille Thin

May 19: Adeline Fermanian (LPSM, Sorbonne Université)
Title: Learning from data streams with the signature transform
Abstract: Data streams, also known as sequential or functional data, arise in many fields of research, such as quantitative finance, medicine or computer vision. We will be concerned with a novel approach for learning with such data, called the signature transform, and rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. After a general overview of signatures in machine learning, we will focus on one specific problem. In order to combine signatures with machine learning algorithms, it is necessary to truncate these infinite series. Therefore, we define an estimator of the truncation order and provide theoretical guarantees in a linear regression setting. We conclude by showing on both simulated and realworld data the advantage of a signaturebased regression model compared to traditional functional models when the data is highly dimensional.

June 2: Zhou Fan (Yale University)
Title: Empirical Bayes and Approximate Message Passing algorithms for PCA in high dimensions
Abstract: This talk will have two parts. In the first part, I will describe a new empirical Bayes procedure for principal components analysis in high dimensions, which aims to learn a prior distribution for the PCs from the observed data. Its ideas are based around the KieferWolfowitz NPMLE, asymptotic random matrix theory, and Approximate Message Passing (AMP) algorithms for Bayesian inference. I will explain the interplay between these ideas and demonstrate the method on several genetics examples. In the second part, motivated by this application, I will describe a general extension of AMP algorithms to correlated noise, where the bias correction and state evolution in AMP are replaced by forms involving the free cumulants of the spectral law.
This is joint work with Xinyi Zhong and Chang Su.

June 17: Stephen Bates (UC Berkeley)
Title: DistributionFree, RiskControlling Prediction Sets
Abstract: To enable valid statistical inference in prediction tasks, we show how to generate setvalued predictions with blackbox models that control various notions of statistical error. Our approach guarantees that the expected loss on future test points falls below a userspecified level, for any predictive model and underlying distribution. Building on conformal prediction, we use a holdout set to calibrate the size of the prediction sets, generalizing the approach to control error notions such as the false rejection rate. We demonstrate our procedure in four largescale problems: (1) multilabel classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.

June 30: Ciara PikeBurke (Imperial College London)
Title: Multiarmed bandit problems with history dependent rewards
Abstract: The multiarmed bandit problem is a common sequential decision making framework where at each time step a player selects an action and receives some reward from selecting that action. The aim is to select actions to maximize the total reward. Commonly it is assumed that the (expected) reward of each action is unknown, but is constant and does not depend on the actions that the player has previously taken. However, in many practical settings this is not realistic. For example in webadvertising, the benefit from showing an advert is likely to depend on the number of times the user has seen it in the past, and in product recommendation the reward of suggesting an item will depend on the time since it was last suggested to the customer. In this talk we will consider several variants of the multiarmed bandit problem where the reward depends on the history of the players actions. For each problem, we will discuss whether learning is possible, and if so provide algorithms that perform well theoretically.