CRiSM past seminars 20-21

CRiSM seminars 2020/21 were organised by Tom Berrett.

The following talks were scheduled for the academic year 2020/2021:

Term 1

Wednesday October 28 - François Caron - University of Oxford
Thursday November 12 - Stefan Wager - Stanford University
Thursday November 26 (10am) - Wai Kin Wong - Hong Kong Observatory
Thursday 10 December - Sofia Olhede - EPFL Lausanne

Term 2 (on Wednesdays)

January 20 - Pragya Sur - Harvard University
February 3 - Rina Foygel Barber - University of Chicago
February 17 - Alexandra Carpentier - Otto-von-Guericke-Universität Magdeburg (cancelled)
March 3 - Liza Levina - University of Michigan
March 17 - Peter Bühlmann - ETH Zürich

Term 3

May 5 - Robin Ryder - CEREMADE, Université Paris-Dauphine
May 19 - Adeline Fermanian (3pm) - LPSM, Sorbonne Université
June 2 - Zhou Fan - Yale University
Thursday June 17 - Stephen Bates - UC Berkeley
June 30 - Ciara Pike-Burke - Imperial College London

-------------------------------------------------------------------------------------------------------------------------------

October 28: François Caron (University of Oxford)

Title: Non-exchangeable random partition models for microclustering

Abstract: Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions, and have found wide applicability in model-based clustering, population genetics, ecology or network analysis. While the exchangeability assumption is sensible in many cases, it has some strong implications. In particular, Kingman’s representation theorem implies that the size of the clusters necessarily grows linearly with the sample size; this feature may be undesirable for some applications, as recently pointed out by Miller et al. (2015). We present here a flexible class of non-exchangeable random partition models which are able to generate partitions whose cluster sizes grow sublinearly with the sample size, and where the growth rate is controlled by one parameter. Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size, and show that the model can exhibit a power-law behaviour, controlled by another parameter. The construction is based on completely random measures and a Poisson embedding of the random partition, and inference is performed using a Sequential Monte Carlo algorithm. Additionally, we show how the model can also be directly used, by relaxing the exchangeability assumption in edge-exchangeable models, to obtain a class of sparse multigraphs with power-law degree distribution and sublinear growth of the node degrees. Finally, experiments on real datasets emphasize the usefulness of the approach compared to a two-parameter Chinese restaurant process.

Joint work with Giuseppe di Benedetto and Yee Whye Teh

https://arxiv.org/pdf/1711.07287.pdf

-------------------------------------------------------------------------------------------------------------------------------

November 12: Stefan Wager (Stanford University)

Title: Noise-induced randomization in regression discontinuity designs

Abstract: Regression discontinuity designs are used to estimate causal effects in settings where treatment is determined by whether an observed running variable crosses a pre-specified threshold. While the resulting sampling design is sometimes described as akin to a locally randomized experiment in a neighborhood of the threshold, standard formal analyses do not make reference to probabilistic treatment assignment and instead identify treatment effects via continuity arguments. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that exploits measurement error in the running variable. Under an assumption that the measurement error is exogenous, we show how to consistently estimate causal effects using a class of linear estimators that weight treated and control units so as to balance a latent variable of which the running variable is a noisy measure. We find this approach to facilitate identification of both familiar estimands from the literature, as well as policy-relevant estimands that correspond to the effects of realistic changes to the existing treatment assignment rule. We demonstrate the method with a study of retention of HIV patients and evaluate its performance using simulated data and a regression discontinuity design artificially constructed from test scores in early childhood.

https://arxiv.org/abs/2004.09458

-------------------------------------------------------------------------------------------------------------------------------

November 26 at 10am: Wai Kin Wong (Hong Kong Observatory)

Title: Machine learning in rainfall nowcasting.

Abstract: Rainfall nowcasting refers to the prediction of precipitation in very high spatial and temporal resolutions for the next 1-6 hours. Timely and quality rainfall nowcast provides indispensable source of information in support of rainstorm monitoring, alerting or warning systems that are invaluable to weather services, and disaster risk reduction of high-impact weather or rainstorms for protecting people's lives. In Hong Kong Observatory (HKO), artificial intelligence technique based on image processing algorithms have been utilized in the in-house nowcasting system, namely SWIRLS (Short-range Warning of Intense Rainstorms in Localised Systems) to track the motion of precipitation systems detected by weather radars, followed by predicting their future location and rainfall using the motion field. However, the intensity is assumed to remain unchanged in computation that results in decreasing skill of precipitation forecast beyond one or two hours. In recent years, novel deep learning (DL) based methods have been developed for precipitation nowcasting that have shown improved performance compared to the above operational algorithm. In this talk, the current progress of DL based methods for precipitation nowcasting will be introduced, including mathematical formulation of precipitation nowcasting as a spatiotemporal sequence forecasting problem, and a couple of general learning strategies. Performance of DL based nowcasting model and a systematic benchmark for performance evaluation will be presented. Finally, future research directions on development of DL in precipitation nowcasting and meteorological forecasting applications are discussed.

-----------------------------------------------------------------------------------------------------------------------------

December 10: Sofia Olhede (EPFL Lausanne)

Title: Modeling networks and network populations via graph distances

Abstract: Networks have become a key form of data. Networks allow us to dependence between nodes or actors. Understanding the difference between two networks is also challenging unless they share nodes and are of the same size. We shall discuss how we may compare networks and also consider the regime where more than one network is observed.

We shall also discuss how to parametrize a distribution on labelled graphs in terms of a Frechét mean graph (which depends on a user-specified choice of metric or graph distance) and a parameter that controls the concentration of this distribution about its mean. Entropy is the natural parameter for such control, varying from a point mass concentrated on the Frechét mean itself to a uniform distribution over all graphs on a given vertex set.

Networks present many new statistical challenges. We shall discuss how to resolve these challenges respecting the fundamental non-Euclidean nature of network observations.

This is joint work with Simon Lunagomez (Lancaster University) and Patrick Wolfe (Purdue University).

-----------------------------------------------------------------------------------------------------------------------------

January 20: Pragya Sur (Harvard University)

Title: A precise high-dimensional asymptotic theory for AdaBoost

Abstract: This talk will introduce a precise high-dimensional asymptotic theory for AdaBoost on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is separable in an asymptotic sense. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the generalization error of AdaBoost, when the algorithm interpolates the training data and maximizes an empirical L1 margin. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting; for instance, the larger the dimensionality ratio p/n, the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Time permitting, I will present a new class of boosting algorithms that correspond to Lq geometry, for q>1, together with results on their high-dimensional generalization and optimization behavior. This is based on joint work with Tengyuan Liang.

-----------------------------------------------------------------------------------------------------------------------------

February 3: Rina Foygel Barber (University of Chicago)

Title: Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling

Abstract: Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically-efficient estimator; we call our extension “approximate CSS” (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finite-sample Type I error and power.

This work is joint with Lucas Janson.

-----------------------------------------------------------------------------------------------------------------------------

February 17: Alexandra Carpentier (Otto-von-Guericke-Universität Magdeburg)

Title: Several structured thresholding bandit problems

Abstract: In this talk we will discuss the thresholding bandit problem, i.e. a sequential learning setting where the learner samples sequentially K unknown distributions for T times, and aims at outputting at the end the set of distributions whose means \mu_k are above a threshold \tau. We will study this problem under four structural assumptions, i.e. shape constraints: that the sequence of means is monotone, unimodal, concave, or unstructured (vanilla case). We will provide in each case minimax results on the performance of any strategies, as well as matching algorithms. This will highlight the fact that even more than in batch learning, structural assumptions have a huge impact in sequential learning. This work is based on a joint work with James Cheshire and Pierre Menard (http://proceedings.mlr.press/v125/cheshire20a.html), and with Andrea Locatelli and Maurilio Gutzeit (http://proceedings.mlr.press/v48/locatelli16.html).

-----------------------------------------------------------------------------------------------------------------------------

March 3: Liza Levina (University of Michigan)

Title: Hierarchical community detection by recursive partitioning

Abstract: Community detection in networks has been extensively studied in the form of finding a single partition into a “correct” number of communities. In large networks, however, a multi-scale hierarchy of communities is much more realistic. We show that a hierarchical tree of communities, obviously more interpretable, is also potentially more accurate and more computationally efficient. We construct this tree with a simple top-down recursive algorithm, at each step splitting the nodes into two communities with a non-iterative spectral algorithm, until a stopping rule suggests there are no more communities. The algorithm is model-free, extremely fast, and requires no tuning other than selecting a stopping rule. We propose a natural model for this setting, a binary tree stochastic block model, and prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. As a by-product, we obtain explicit and intuitive results for fitting the stochastic block model under model misspecification. We illustrate the algorithm on a statistics papers dataset constructing a highly interpretable tree of statistics research communities, and on a network based on gene co-occurrence in research papers on anemia.

Joint work with Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Koen van de Berge, Purnamrita Sarkar, and Peter Bickel.

-----------------------------------------------------------------------------------------------------------------------------

March 17: Peter Bühlmann (ETH Zürich)

Title: Deconfounding

Abstract: Hidden confounding is a severe problem when interpreting regression or causal parameters, and it may also lead to poor generalisation performance for prediction. Adjusting for unobserved confounding is important but challenging when based on observational data only. We propose spectral deconfounding, a class of linear data transformations, followed by standard sparse estimation methods such as the Lasso, or the Debiased Lasso when confidence guarantees are required. The proposed methodology has provable (optimality) properties when assuming dense confounding. Without additional assumptions, deconfounding from observational data is impossible. But we argue that even when such assumptions fail to hold, certain methods exhibit partial robustness against hidden confounding.

The talk is based on joint work with Domagoj Cevid, Zijian Guo and Nicolai Meinshausen.

-----------------------------------------------------------------------------------------------------------------------------

May 5: Robin Ryder (CEREMADE, Université Paris-Dauphine)

Title: A Bayesian non-parametric methodology for inferring grammar complexity

Abstract: Based on a set of strings from a language, we wish to infer the complexity of the underlying grammar. To this end, we develop a methodology to choose between two classes of formal grammars in the Chomsky hierarchy: simple regular grammars and more complex context-free grammars. To do so, we introduce a probabilistic context-free grammar model in the form of a Hierarchical Dirichlet Process over rules expressed in Greibach Normal Form. In comparison to other representations, this has the advantage of nesting the regular class within the context-free class. We give some theoretical properties of this new stochastic process. We consider model comparison both by exploiting this nesting, and with Bayes’ factors. The model is fit using a Sequential Monte Carlo method, implemented in the Birch probabilistic programming language. We apply this methodology to data collected from primates, for which the complexity of the grammar is a key question.

Co-authors: Lawrence Murray, Judith Rousseau and Achille Thin

-----------------------------------------------------------------------------------------------------------------------------

May 19: Adeline Fermanian (LPSM, Sorbonne Université)

Title: Learning from data streams with the signature transform

Abstract: Data streams, also known as sequential or functional data, arise in many fields of research, such as quantitative finance, medicine or computer vision. We will be concerned with a novel approach for learning with such data, called the signature transform, and rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. After a general overview of signatures in machine learning, we will focus on one specific problem. In order to combine signatures with machine learning algorithms, it is necessary to truncate these infinite series. Therefore, we define an estimator of the truncation order and provide theoretical guarantees in a linear regression setting. We conclude by showing on both simulated and real-world data the advantage of a signature-based regression model compared to traditional functional models when the data is highly dimensional.

-----------------------------------------------------------------------------------------------------------------------------

June 2: Zhou Fan (Yale University)

Title: Empirical Bayes and Approximate Message Passing algorithms for PCA in high dimensions

Abstract: This talk will have two parts. In the first part, I will describe a new empirical Bayes procedure for principal components analysis in high dimensions, which aims to learn a prior distribution for the PCs from the observed data. Its ideas are based around the Kiefer-Wolfowitz NPMLE, asymptotic random matrix theory, and Approximate Message Passing (AMP) algorithms for Bayesian inference. I will explain the interplay between these ideas and demonstrate the method on several genetics examples. In the second part, motivated by this application, I will describe a general extension of AMP algorithms to correlated noise, where the bias correction and state evolution in AMP are replaced by forms involving the free cumulants of the spectral law.

This is joint work with Xinyi Zhong and Chang Su.

-----------------------------------------------------------------------------------------------------------------------------

June 17: Stephen Bates (UC Berkeley)

Title: Distribution-Free, Risk-Controlling Prediction Sets

Abstract: To enable valid statistical inference in prediction tasks, we show how to generate set-valued predictions with black-box models that control various notions of statistical error. Our approach guarantees that the expected loss on future test points falls below a user-specified level, for any predictive model and underlying distribution. Building on conformal prediction, we use a holdout set to calibrate the size of the prediction sets, generalizing the approach to control error notions such as the false rejection rate. We demonstrate our procedure in four large-scale problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.

-----------------------------------------------------------------------------------------------------------------------------

June 30: Ciara Pike-Burke (Imperial College London)

Title: Multi-armed bandit problems with history dependent rewards

Abstract: The multi-armed bandit problem is a common sequential decision making framework where at each time step a player selects an action and receives some reward from selecting that action. The aim is to select actions to maximize the total reward. Commonly it is assumed that the (expected) reward of each action is unknown, but is constant and does not depend on the actions that the player has previously taken. However, in many practical settings this is not realistic. For example in web-advertising, the benefit from showing an advert is likely to depend on the number of times the user has seen it in the past, and in product recommendation the reward of suggesting an item will depend on the time since it was last suggested to the customer. In this talk we will consider several variants of the multi-armed bandit problem where the reward depends on the history of the players actions. For each problem, we will discuss whether learning is possible, and if so provide algorithms that perform well theoretically.