Statistical Learning & Inference Seminars
The seminars usually take place on Tuesdays at 11am.
Term 1, 24-25
Date, Time and Room | Speaker | Title
01/10, 11am, MB2.22 | Dr. Cornelius Fritz (Trinity College Dublin) | A Regression Framework for Studying Relationships among Attributes under Network Interference
Abstract:
To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021.
08/10, 11am, MB2.22 | | Low-rank models for dynamic multiplex graphs and multivariate time series
Abstract:
This talk discusses low-rank models for two different types of data structures: dynamic multiplex graphs and panels of multivariate time series. The first part of the talk will present a doubly unfolded adjacency spectral embedding (DUASE) method for networks evolving over time, with different edge types, commonly known as multiplex networks. Statistical properties of DUASE will be discussed, and links with commonly used statistical models for clustering graphs will be presented. The second part of the talk will cover the case of a panel of multivariate time series where there is co-movement between the panel components, modelled via a vector autoregressive process. A Network Informed Restricted Vector Auto-Regressive (NIRVAR) process is proposed, with an algorithm that gives a low-dimensional latent embedding of each component of the panel. Clustering in this latent space is then used to recover the non-zero entries of the VAR coefficient matrix. The proposed model outperforms alternative approaches in terms of prediction and inference in simulation studies and in real-data applications in finance and transportation systems.
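The unfolding step behind a doubly unfolded spectral embedding can be sketched in a few lines: stack every layer-and-time adjacency matrix side by side and take a truncated SVD of the result. This is only a toy illustration of the unfolding idea, not the speakers' implementation; the simulated network and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, layers, times, d = 30, 2, 4, 3

# Toy dynamic multiplex network: one n x n adjacency matrix per (layer, time).
A = rng.binomial(1, 0.2, size=(layers, times, n, n))

# Doubly unfold: rows indexed by node, columns by (layer, time, node),
# giving an n x (layers * times * n) matrix.
unfolded = A.transpose(2, 0, 1, 3).reshape(n, layers * times * n)

# A rank-d truncated SVD yields a shared d-dimensional embedding per node.
U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
embedding = U[:, :d] * np.sqrt(s[:d])
```

Clustering `embedding` (e.g. with k-means) would then play the same role as the latent-space clustering step described for NIRVAR.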
15/10, 11am, MB2.22 | Dr. Tom Rainforth (Oxford) | Modern Bayesian Experimental Design
Abstract:
Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this talk, I will outline how recent advances have transformed our ability to overcome these challenges and thus utilize BED effectively, before discussing some key areas for future development in the field. Related review paper: https://arxiv.org/abs/2302.14545
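To make the computational challenge concrete, here is a minimal nested Monte Carlo estimator of the expected information gain (EIG) for a toy linear-Gaussian experiment. This is a standard textbook estimator, not a method from the talk, and every name and number in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(y, theta, design, sigma=1.0):
    # Gaussian likelihood: y ~ N(design * theta, sigma^2).
    return -0.5 * ((y - design * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def eig_nmc(design, n_outer=2000, n_inner=200):
    # Nested Monte Carlo EIG: E[log p(y | theta, d)] - E[log p(y | d)],
    # with the marginal p(y | d) approximated from fresh prior draws.
    theta = rng.standard_normal(n_outer)                 # prior draws
    y = design * theta + rng.standard_normal(n_outer)    # simulated outcomes
    inner = rng.standard_normal((n_inner, 1))            # draws for the marginal
    log_marg = np.logaddexp.reduce(log_lik(y, inner, design), axis=0) - np.log(n_inner)
    return float(np.mean(log_lik(y, theta, design) - log_marg))

eig_informative = eig_nmc(3.0)   # strong design: outcome carries much information
eig_weak = eig_nmc(0.1)          # weak design: outcome is mostly noise
```

The estimator's cost grows multiplicatively in the inner and outer sample sizes, which is exactly the kind of burden the recent advances surveyed in the talk aim to remove.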
22/10, 11am, MB2.22 | Dr. Robin Mitra (UCL) | Using saturated count models for user-friendly synthesis of categorical data
Abstract:
Synthetic data methods are being increasingly used to protect data confidentiality. Large sparse categorical data sets pose significant challenges for synthesis, making many traditional methods unsuitable. We explore using saturated count models for synthesis. These are appealing as they allow large categorical data sets to be synthesized quickly and conveniently, as well as permitting risk and utility metrics to be satisfied a priori, that is, prior to synthetic data generation. Most well-known count models for synthesizing categorical data at the tabular level tend to utilise either Poisson or Poisson-mixture distributions. However, the latter are always over-dispersed, with a variance that is an increasing function of the mean. As a result, relatively more noise is applied to larger counts than smaller counts. But this is contrary to the objective of data synthesis, where larger counts are typically lower risk than smaller counts, and therefore require less perturbation. We thus additionally explore the benefits of using the discretized gamma family distribution (DGAF) for synthesis within the saturated model framework. The DGAF provides the synthesizer with control of the variance-mean relationship, allowing smaller counts to be over-dispersed and larger counts to be under-dispersed, which in turn produces synthetic data with greater utility. The benefits of the DGAF are illustrated empirically using a database which can be viewed as a good substitute for the English School Census.
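The simplest saturated-model synthesizer is easy to state: treat each cell count of the contingency table as the mean of a count distribution and sample once per cell. The sketch below uses a plain Poisson for brevity; the talk's point is precisely that the discretized gamma family gives finer control over the variance-mean relationship than this. The table is invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented contingency table: counts per cell of a categorical data set.
observed = np.array([[120,   5,  0],
                     [ 34, 210,  2],
                     [  1,   8, 76]])

# Saturated Poisson synthesis: each synthetic count is drawn from
# Poisson(observed count), so no model fitting is needed and the expected
# synthetic table equals the observed one. Note the variance equals the
# mean, so large cells receive more absolute noise than small ones --
# the behaviour the DGAF is designed to reverse.
synthetic = rng.poisson(observed)
```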
29/10, 11am, MB2.22 | Dr. Song Liu (Bristol) | High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching
Abstract:
This paper addresses differential inference in time-varying parametric probabilistic models, like graphical models with changing structures. Instead of estimating a high-dimensional model at each time and inferring changes later, we directly learn the differential parameter, i.e., the time derivative of the parameter. The main idea is treating the time score function of an exponential family model as a linear model of the differential parameter for direct estimation. We use time score matching to estimate parameter derivatives. We prove the consistency of a regularized score matching objective and demonstrate the finite-sample normality of a debiased estimator in high-dimensional settings. Our methodology effectively infers differential structures in high-dimensional graphical models, verified on simulated and real-world datasets.
05/11, 11am, MB2.22 | | Detecting Changes in Production Frontier
Abstract:
In this talk, we first give a brief review of the nonparametric estimation problem for the production frontier function, which concerns the maximum possible output at given input levels and the efficiency of firms. We then look at how (potentially) multiple changes over time in the production frontier can be detected. By assuming that the frontier always shifts upwards over time, which is plausible thanks to advances in technology, we can detect changes in the frontier at a near-optimal rate under regularity conditions, irrespective of the dimensionality of the input. This can be achieved by modifying and utilising the well-known Free Disposal Hull (FDH) or Data Envelopment Analysis (DEA) algorithms in different ways, depending on whether the shift is global or local. Finally, we also discuss how confidence intervals can be constructed in this setup.
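The FDH estimator mentioned above is simple enough to state in full: the frontier estimate at input level x0 is the largest observed output among firms using no more input than x0. Below is a one-input, one-output sketch with simulated firms (true frontier sqrt(x); all names and numbers invented).

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated firms: inputs in (0, 1), outputs strictly below the frontier sqrt(x).
x = rng.uniform(0.0, 1.0, 200)
y = np.sqrt(x) * rng.uniform(0.5, 1.0, 200)

def fdh(x_obs, y_obs, x0):
    # Free Disposal Hull: max observed output over firms with input <= x0.
    mask = x_obs <= x0
    return y_obs[mask].max() if mask.any() else 0.0

est = fdh(x, y, 0.81)  # true frontier value here is sqrt(0.81) = 0.9
```

Detecting an upward shift, as in the talk, could then compare FDH estimates computed from observations before and after a candidate change point.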
12/11, 11am, MB2.22 | Dr. Ed Cohen (Imperial College London) | Analysing spatial point patterns on the surface of 3D shapes
Abstract:
Statistical methodology for analysing spatial point patterns has traditionally focused on Euclidean data and planar surfaces. However, with recent advances in 3D biological imaging technologies targeting protein molecules on a cell’s plasma membrane, spatial point patterns are now being observed on complex shapes and manifolds whose geometry must be respected for principled inference. Consequently, there is now a demand for tools that can analyse these data for important scientific studies in cell biology and microbiology. Motivated by studying the spatial distribution of LPS proteins on the surface of E. coli, we extend the fundamental functional summary statistics for the analysis of point patterns to general convex bounded shapes and demonstrate how they can be used to test for complete spatial randomness. We then develop their multi-type extensions, together with a test for independence of the component marginal processes. To support these methods, we introduce a plug-in estimator for the intensity of a spatial point process on a manifold. We conclude with a discussion on how these methods can readily be extended to a class of non-convex shapes.
References:
S. Ward, E. A. K. Cohen, N. M. Adams. Testing for complete spatial randomness on 3-dimensional bounded convex shapes. Spatial Statistics, Vol. 41, 2021.
S. Ward, H. S. Battey and E. A. K. Cohen. Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold. Biometrika, Vol. 110, 2023.
S. Ward, E. A. K. Cohen and N. M. Adams. Functional summary statistics and testing for independence in marked point patterns on the surface of three-dimensional convex shapes. arXiv:2410.01063, 2024.
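As a flavour of what a functional summary statistic looks like on a curved surface, here is a toy empirical K-function for points on the unit sphere, compared with its closed form under complete spatial randomness (the area 2*pi*(1 - cos r) of a geodesic cap). This sketch covers the sphere only, not the general convex shapes of the talk, and the simulated pattern is invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Uniform points on the unit sphere: normalised Gaussian vectors.
pts = rng.standard_normal((n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# Pairwise geodesic (great-circle) distances.
d = np.arccos(np.clip(pts @ pts.T, -1.0, 1.0))

def k_empirical(r):
    # Mean number of other points within geodesic distance r,
    # scaled by the intensity lambda = n / (surface area 4*pi).
    counts = (d < r).sum() - n          # drop the n self-pairs
    return counts / n / (n / (4.0 * np.pi))

k_hat = k_empirical(0.5)
k_csr = 2.0 * np.pi * (1.0 - np.cos(0.5))  # CSR value: geodesic cap area
```

A test for complete spatial randomness compares `k_hat` against the CSR curve over a range of r, with a simulation envelope in place of the crude tolerance used here.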
19/11, 11am, MB2.22 | Prof. Thomas Nichols (Oxford) | Scalable Longitudinal Models for Neuroimaging Data
Abstract:
Neuroimaging has mainly depended on cross-sectional data to study the brain through the lifespan, but such studies can only attribute variation to intersubject differences in age. Only longitudinal neuroimaging studies can draw inferences about age-induced changes in the brain, which are crucial for studies of brain development and aging. I will review a range of methods my group has developed for the analysis of longitudinal neuroimaging data, work started when I was Warwick faculty 10+ years ago and continued recently in collaboration with current Warwick Stats faculty. First I'll review a very fast and practical approach using marginal models and robust standard errors, for which we have created a user-friendly implementation (SwE). Next, while we'd ideally use standard linear mixed effects (LME) implementations (e.g. lme4/nlme), they aren't practical with brain data as they can only fit one voxel at a time and thus cannot exploit vectorised computation. We have developed a highly optimised LME implementation that exploits vectorised computation so that all voxels are simultaneously updated at each iteration (BigLMM). Finally, for longitudinal binary images, e.g. lesion masks of white matter hyperintensities, we propose a relative-risk regression to support users' preference for relative risk (RR) units over odds ratios. We use a GEE approach with a log link, identity variance function and unknown dispersion parameter, together with a penalty to avoid infinite parameter estimates (joint work with Ioannis Kosmidis). This suite of work is a small indication of the rich methodological opportunities in the growing body of longitudinal neuroimaging studies.
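The marginal-model-plus-robust-errors idea in the first part can be sketched generically: fit ordinary least squares to the pooled longitudinal data, then compute a cluster-robust sandwich covariance from per-subject score contributions. This is the textbook sandwich estimator on invented data, not the SwE toolbox itself.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented longitudinal data: n subjects, t visits each; subject-level
# random intercepts induce within-subject correlation.
n, t = 50, 4
age = np.tile(np.arange(t, dtype=float), n)
subj = np.repeat(np.arange(n), t)
y = (2.0 + 0.5 * age
     + np.repeat(rng.standard_normal(n), t)    # random intercepts
     + 0.3 * rng.standard_normal(n * t))       # visit-level noise

X = np.column_stack([np.ones(n * t), age])
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # marginal (OLS) fit
resid = y - X @ beta

# Sandwich covariance bread @ meat @ bread: the meat sums outer products
# of per-subject scores, so within-subject correlation is accounted for
# without specifying a random-effects model.
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for s in range(n):
    g = X[subj == s].T @ resid[subj == s]
    meat += np.outer(g, g)
robust_cov = bread @ meat @ bread
robust_se = np.sqrt(np.diag(robust_cov))
```

The appeal for imaging is that the fit is a single closed-form solve, so the same computation can be vectorised across all voxels at once.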
26/11, 11am, MB2.22 | Prof. Qiwei Yao (LSE) | Identification and Estimation for Matrix Time Series CP-factor Models
Abstract:
We investigate the identification and the estimation for matrix time series CP-factor models. Unlike the generalized eigen analysis-based method of Chang et al. (2023) which requires the two factor loading matrices to be full-ranked, the newly proposed estimation can handle rank-deficient factor loading matrices. The estimation procedure consists of the spectral decomposition of several matrices and a matrix joint diagonalization algorithm, resulting in low computational cost. The theoretical guarantee established without the stationarity assumption shows that the proposed estimation exhibits a faster convergence rate than that of Chang et al. (2023). In fact the new estimator is free from the adverse impact of any eigen-gaps, unlike most eigenanalysis-based methods. Furthermore, in terms of the error rates of the estimation, the proposed procedure is equivalent to handling a vector time series of dimension max(p, q) instead of p×q, where (p, q) are the dimensions of the matrix time series concerned. We have achieved this without assuming the “near orthogonality” of the loadings under various incoherence conditions often imposed in the CP-decomposition literature. Illustration with both simulated and real matrix time series data shows the usefulness of the proposed approach.
Joint work with Jianyuan Chang, Yue Du and Guanglin Huang.
03/12, 11am, MB2.22 | Prof. Karthik Bharath (Nottingham) | Rolled Gaussian process models for curves on manifolds
Abstract:
Curves on manifolds arise as data in various applications, but practical probabilistic models for their analysis are presently unavailable. One strategy is to flatten or linearise the manifold $M$ and then exploit the flattened space for modelling. But, in the absence of global coordinates on $M$, how the flattening is done is crucial because it may induce severe distortions. A local flattening strategy based on rolling $M$ without slipping along a Euclidean curve, which is compatible with the intrinsic geometry of $M$, will be discussed. The strategy allows for the prescription of a Gaussian process-type model on $M$. Theoretical and computational challenges in estimation of and inference for the model parameters, and their relationship to Frechet means on $M$, using discretely observed curves will be discussed, aided by an application in robot learning.
12/12 (Thursday), 11am, MB2.22 | Dr. Minwoo Chae (Pohang University of Science and Technology) |
Abstract:
Term 3, 23-24
Date, Time and Room | Speaker | Title
30/04, 11am, MS.02 | Oliver Feng (Bath) |
Abstract:
In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal. This is joint work with Yu-Chun Kao, Min Xu and Richard Samworth.
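The Cauchy example can be illustrated directly: with Cauchy errors, an M-estimator built on a fixed Huber loss (a stand-in for the data-driven convex loss of the talk, which is learned rather than fixed) remains stable where least squares is erratic. Everything below is an invented toy, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

# Simple linear model through the origin with Cauchy errors.
n = 500
x = rng.standard_normal(n)
y = 1.5 * x + rng.standard_cauchy(n)

def huber(r, delta=1.0):
    # Convex loss: quadratic near zero, linear in the tails.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

# Convex M-estimation of the slope under the Huber loss.
beta_huber = minimize(lambda b: huber(y - b[0] * x).sum(), x0=[0.0]).x[0]

# Least squares, by contrast, inherits the errors' lack of finite variance.
beta_ls = float(x @ y / (x @ x))
```

The talk's contribution is to learn the shape of this convex loss from the data so that the resulting estimator is asymptotically optimal among all convex M-estimators.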
14/05, 11am, MS.01 | Gilles Stupfler (University of Angers) | Some new perspectives on extremal regression
Abstract:
The objective of extremal regression is to estimate and infer quantities describing the tail of a conditional distribution. Examples of such quantities include quantiles and expectiles, and the regression version of the Expected Shortfall. Traditional regression estimators typically suffer from instability and inconsistency in the tails due to data sparseness, especially when the underlying conditional distributions are heavy-tailed. Existing approaches to extremal regression in the heavy-tailed case fall into two main categories: linear quantile regression approaches and, at the other end of the spectrum, nonparametric approaches. They are also typically restricted to i.i.d. data-generating processes. I will give an overview of a recent series of papers that discuss extremal regression methods in location-scale regression models (which contain linear quantile regression models) and nonparametric regression models. Some key novel results include a general toolbox for extreme value estimation in the presence of random errors and joint asymptotic normality results for nonparametric extreme conditional quantile estimators constructed from strongly mixing data. Joint work with A. Daouia, S. Girard, M. Oesting and A. Usseglio-Carleve.
04/06, 11am, MB0.07 | Rebecca Lewis (Oxford) | High-dimensional logistic regression with separated data
Abstract:
In a logistic regression model with separated data, the log-likelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient always produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. In a high-dimensional regime, we consider the implications of enforcing a natural constraint on the vector of logistic-transformed probabilities. We derive a consistent estimator of the unknown logistic regression parameter that exists even when the data are separable.
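The non-existence of the MLE under separation is easy to see numerically: for perfectly separated data the log-likelihood increases without bound as the coefficient grows, so no finite maximiser exists. A minimal one-covariate check (data invented):

```python
import numpy as np

# Perfectly separated data: y = 1 exactly when x > 0.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def loglik(beta):
    # Numerically stable Bernoulli log-likelihood for logit(p) = beta * x:
    # log p = -log1p(exp(-z)) and log(1 - p) = -log1p(exp(z)).
    z = beta * x
    return -np.log1p(np.exp(-z[y == 1])).sum() - np.log1p(np.exp(z[y == 0])).sum()

# The likelihood keeps improving as beta grows, approaching 0 from below
# but never attaining it: the "MLE" runs off to infinity.
lls = [loglik(b) for b in (1.0, 10.0, 100.0)]
```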
18/06, 1pm, MB0.07 | Jenny Wadsworth (Lancaster) |
Abstract:
A geometric representation for multivariate extremes, based on the shapes of scaled sample clouds in light-tailed margins and their so-called limit sets, has recently been shown to connect several existing extremal dependence concepts. However, these results are purely probabilistic, and the geometric approach itself has not been fully exploited for statistical inference. We outline a method for parametric estimation of the limit set shape, which includes a useful non-/semi-parametric estimate as a pre-processing step. More fundamentally, our approach provides a new class of asymptotically motivated statistical models for the tails of multivariate distributions, and such models can accommodate any combination of simultaneous or non-simultaneous extremes through appropriate parametric forms for the limit set shape. In this talk we will also present ongoing work moving towards semiparametric methodology for greater flexibility. Extrapolation further into the tail of the distribution is possible via simulation from the fitted model, and probability estimates are possible in regions where other frameworks struggle. Joint work with Ryan Campbell.
25/06, 11am, MB0.07 | Nicola Gnecco (UCL) | Extremal Random Forests
Abstract:
Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only a few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
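The extrapolation step that ERF builds on can be shown in its unweighted form: fit a generalized Pareto distribution to exceedances over a high threshold and read off quantiles beyond the observed data (ERF would replace the equal weights by forest-derived local weights). Toy data, not the paper's experiments:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)

# Heavy-tailed sample (Lomax / Pareto-type tail with index 3).
y = rng.pareto(3.0, 5000)

# Peaks-over-threshold: GPD fit to exceedances of the 90% quantile.
u = np.quantile(y, 0.9)
excess = y[y > u] - u
shape, loc, scale = genpareto.fit(excess, floc=0.0)

# Extrapolated 99.9% quantile: threshold plus the GPD quantile of the
# conditional tail (0.1% / 10% of the tail mass lies beyond it).
p = 0.999
q_hat = u + genpareto.ppf((p - 0.9) / (1.0 - 0.9), shape, loc=0.0, scale=scale)
```

Making `shape` and `scale` functions of a predictor vector, via forest-weighted local likelihood, is exactly the step the talk contributes.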