Events
Thu 17 Jan, '19
CRiSM Seminar, MSB2.23
Prof. Galin Jones, School of Statistics, University of Minnesota (14:00-15:00)
Bayesian Spatiotemporal Modeling Using Hierarchical Spatial Priors, with Applications to Functional Magnetic Resonance Imaging
We propose a spatiotemporal Bayesian variable selection model for detecting activation in functional magnetic resonance imaging (fMRI) settings. Following recent research in this area, we use binary indicator variables for classifying active voxels. We assume that the spatial dependence in the images can be accommodated by applying an areal model to parcels of voxels. The use of parcellation and a spatial hierarchical prior (instead of the popular Ising prior) results in a posterior distribution amenable to exploration with an efficient Markov chain Monte Carlo (MCMC) algorithm. We study the properties of our approach by applying it to simulated data and an fMRI data set.
Dr. Flavio Goncalves, Universidade Federal de Minas Gerais, Brazil (15:00-16:00)
Exact Bayesian Inference in Spatiotemporal Cox Processes Driven by Multivariate Gaussian Processes
In this talk we present a novel methodology to perform exact Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to allow for evolution of the intensity function over discrete time. The novelty of the method lies in the fact that no discretisation error is involved, despite the intractability of the likelihood function and the infinite dimensionality of the problem. The method is based on a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model. The models are defined in a general and flexible way, but they are amenable to direct sampling from the relevant distributions due to careful characterisation of their components. The models also allow for the inclusion of regression covariates and/or temporal components to explain the variability of the intensity function. These components may be subject to relevant interaction with space and/or time. Real and simulated examples illustrate the methodology, followed by concluding remarks.
Thu 31 Jan, '19
CRiSM Seminar, MSB2.23
Prof. Paul Fearnhead, Lancaster University (14:00-15:00)
Efficient Approaches to Changepoint Problems with Dependence Across Segments
Changepoint detection is an increasingly important problem across a range of applications. It is most commonly encountered when analysing time-series data, where changepoints correspond to points in time where some feature of the data, for example its mean, changes abruptly. Often there are important computational constraints when analysing such data, with the number of data sequences and their lengths meaning that only very efficient methods for detecting changepoints are practically feasible. A natural way of estimating the number and location of changepoints is to minimise a cost that trades off a measure of fit to the data against the number of changepoints fitted. There are now efficient algorithms that can exactly solve the resulting optimisation problem, but they are only applicable in situations where there is no dependence of the mean of the data across segments. Using such methods can lead to a loss of statistical efficiency in situations where, for example, it is known that the change in mean must be positive. This talk will present a new class of efficient algorithms that can exactly minimise our cost whilst imposing certain constraints on the relationship of the mean before and after a change. These algorithms have links to recursions that are seen for discrete-state hidden Markov models and within sequential Monte Carlo. We demonstrate the usefulness of these algorithms on problems such as detecting spikes in calcium imaging data. Our algorithm can analyse data of length 100,000 in less than a second, and has been used by the Allen Brain Institute to analyse the spike patterns of over 60,000 neurons. (This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and Daniela Witten.)
Dr. Sandipan Roy, Department of Mathematical Sciences, University of Bath (15:00-16:00)
Network Heterogeneity and Strength of Connections
Abstract: Detecting the strength of connection in a network is a fundamental problem in understanding the relationships among individuals. Often it is more important to understand how strongly two individuals are connected than the mere presence/absence of an edge. This paper introduces a new concept of strength of connection in a network through a nonparametric object called the "Grafield". The "Grafield" is a piecewise-constant bivariate kernel function that compactly represents the affinity or strength of ties (or interactions) between every pair of vertices in the graph. We estimate the "Grafield" function through a spectral analysis of the Laplacian matrix followed by a hard thresholding (Gavish & Donoho, 2014) of the singular values. Our estimation methodology is also valid for asymmetric directed networks. As a by-product we obtain an efficient procedure for edge probability matrix estimation as well. We validate our proposed approach with several synthetic experiments and compare with existing algorithms for edge probability matrix estimation. We also apply our proposed approach to three real datasets: understanding the strength of connection in (a) a social messaging network, (b) a network of political parties in the US Senate, and (c) a neural network of neurons and synapses in C. elegans, a type of worm.
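The penalised-cost formulation in Fearnhead's abstract above can be made concrete with a small sketch: the classic optimal-partitioning dynamic programme minimises a squared-error segment cost plus a penalty beta per changepoint. This is a minimal, unconstrained version for illustration only (function names are ours); the talk's algorithms additionally impose constraints across segments.

```python
def seg_cost(cs, cs2, s, t):
    # Squared-error cost of fitting a single mean to y[s:t],
    # computed from cumulative sums for O(1) evaluation.
    n = t - s
    total = cs[t] - cs[s]
    return (cs2[t] - cs2[s]) - total * total / n

def optimal_partitioning(y, beta):
    """Exactly minimise sum of segment costs + beta per changepoint."""
    n = len(y)
    cs, cs2 = [0.0], [0.0]
    for v in y:
        cs.append(cs[-1] + v)
        cs2.append(cs2[-1] + v * v)
    F = [0.0] * (n + 1)    # F[t]: optimal cost of y[0:t]
    last = [0] * (n + 1)   # last[t]: start of the final segment
    F[0] = -beta           # so k segments incur (k - 1) * beta
    for t in range(1, n + 1):
        best, arg = None, 0
        for s in range(t):
            c = F[s] + seg_cost(cs, cs2, s, t) + beta
            if best is None or c < best:
                best, arg = c, s
        F[t], last[t] = best, arg
    cps, t = [], n         # backtrack the changepoint locations
    while t > 0:
        s = last[t]
        if s > 0:
            cps.append(s)
        t = s
    return sorted(cps)
```

This naive version is O(n^2); the pruned recursions discussed in the talk reduce the practical cost dramatically.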
Thu 14 Feb, '19
CRiSM Seminar, MSB2.23
Philipp Hermann, Institute of Applied Statistics, Johannes Kepler University Linz, Austria (14:00-15:00)
LDJump: Estimating Variable Recombination Rates from Population Genetic Data
Recombination is a process during meiosis which starts with the formation of DNA double-strand breaks and results in an exchange of genetic material between homologous chromosomes. In many species, recombination is concentrated in narrow regions known as hotspots, flanked by large zones with low recombination. As recombination plays an important role in evolution, its estimation and the identification of hotspot positions are of considerable interest. In this talk we introduce LDJump, our method to estimate local population recombination rates with relevant summary statistics as explanatory variables in a regression model. More precisely, we divide the DNA sequence into small segments and estimate the recombination rate per segment via the regression model. In order to obtain change-points in recombination we apply a frequentist segmentation method. This approach controls the type I error and provides confidence bands for the estimator. Overall, LDJump identifies hotspots with high accuracy under different levels of genetic diversity and demography, and is computationally fast even for genomic regions spanning many megabases. We will present a practical application of LDJump to a region of human chromosome 21 and compare our estimated population recombination rates with experimentally measured recombination events. (Joint work with Andreas Futschik, Irene Tiemann-Boege, and Angelika Heissl.)
Prof. Dr. Ingo Scholtes, Data Analytics Group, University of Zürich (15:00-16:00)
Optimal Higher-Order Network Analytics for Time Series Data
Network-based data analysis techniques such as graph mining, social network analysis, link prediction and clustering are an important foundation for data science applications in computer science, computational social science, economics and bioinformatics. They help us to detect patterns in large corpora of data that capture relations between genes, brain regions, species, humans, documents, or financial institutions. While this potential of the network perspective is undisputed, advances in data sensing and collection increasingly provide us with high-dimensional, temporal, and noisy data on real systems. The complex characteristics of such data sources pose fundamental challenges for network analytics. They question the validity of network abstractions of complex systems and pose a threat to interdisciplinary applications of data analytics and machine learning. To address these challenges, I introduce a graphical modelling framework that accounts for the complex characteristics of real-world data on complex systems. I demonstrate this approach on time series data from technical, biological, and social systems. Current methods to analyse the topology of such systems discard information on the timing and ordering of interactions, which, however, determines which elements of a system can influence each other via paths. To solve this issue, I introduce a modelling framework that (i) generalises standard network representations towards multi-order graphical models for causal paths, and (ii) uses statistical learning to achieve an optimal balance between explanatory power and model complexity. The framework advances the theoretical foundation of data science and sheds light on the important question of when network representations of time series data are justified. It is the basis for a new generation of data analytics and machine learning techniques that account for both temporal and topological characteristics in real-world data.
Thu 28 Feb, '19
CRiSM Seminar, MSB2.23
Prof. Valerie Isham, Statistical Science, University College London, UK (15:00-16:00)
Stochastic Epidemic Models: Approximations, Structured Populations and Networks
Thu 14 Mar, '19
CRiSM Seminar, A1.01
Speaker: Spencer Wheatley, ETH Zurich, Switzerland
Title: The "endo-exo" problem in financial market price fluctuations, and the ARMA point process
The "endo-exo" problem -- i.e., decomposing system activity into exogenous and endogenous parts -- lies at the heart of statistical identification in many fields of science. For example, consider the problem of determining whether an earthquake is a mainshock or an aftershock, or whether a surge in the popularity of a YouTube video is because it is "going viral" or simply due to high activity across the platform. Solution of this problem is often plagued by spurious inference (namely, falsely strong interaction) due to neglect of trends, shocks and shifts in the data. The predominant point process model for endo-exo analysis in quantitative finance is the Hawkes process. A comparison of this field with the relatively mature fields of econometrics and time series analysis identifies the need to control more rigorously for trends and shocks. Doing so allows us to test the hypothesis that the market is "critical" -- analogous to a unit root test commonly done on economic time series -- and to challenge earlier results. Continuing the "lessons learned" from the time series field, it is argued that the Hawkes point process is analogous to integer-valued AR time series. Following this analogy, we introduce the ARMA point process, which flexibly combines exogenous background activity (Poisson), shot-noise bursty dynamics, and self-exciting (Hawkes) endogenous activity. We illustrate a connection to ARMA time series models, derive an MCEM (Monte Carlo expectation maximization) algorithm to enable maximum likelihood estimation of this process, and assess consistency by simulation study. Remaining challenges in estimation and model selection, as well as possible solutions, are discussed.
[1] Wheatley, S., Wehrli, A., and Sornette, D. "The endo-exo problem in high frequency financial price fluctuations and rejecting criticality". To appear in Quantitative Finance (2018). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3239443
[2] Wheatley, S., Schatz, M., and Sornette, D. "The ARMA Point Process and its Estimation." arXiv preprint arXiv:1806.09948 (2018).
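The self-exciting Hawkes process in Wheatley's abstract is easy to simulate, which helps build intuition for the endo-exo decomposition. Below is a minimal sketch of Ogata's thinning algorithm for a univariate Hawkes process with exponential kernel, intensity mu + sum_i alpha * exp(-beta * (t - t_i)); function name and parameters are ours, not the speaker's code.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, T, seed=1):
    """Ogata's thinning for a univariate Hawkes process on [0, T].

    Between events the intensity only decays, so the intensity at the
    current time is a valid upper bound for the thinning step.
    """
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < T:
        # Upper bound: current intensity (exogenous mu + endogenous part).
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)   # candidate next event time
        if t >= T:
            break
        # Accept the candidate with probability lambda(t) / lam_bar.
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:
            events.append(t)
    return events
```

With branching ratio alpha / beta < 1 the process is subcritical and the long-run event rate is mu / (1 - alpha / beta); the "critical" regime the talk tests corresponds to alpha / beta approaching 1.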
Wed 20 Mar, '19
CRiSM Day, MS.01
Wed 27 Mar, '19
CRiSM Seminar, MSB2.23
Daniel Rudolf, Institute for Mathematical Stochastics, Georg-August-Universität Göttingen
Title: Quantitative spectral gap estimate and Wasserstein contraction of simple slice sampling
Abstract: By proving Wasserstein contraction of simple slice sampling for approximate sampling from distributions determined by log-concave, rotationally invariant unnormalized densities, we derive an explicit quantitative lower bound on the spectral gap. In particular, the lower bound on the spectral gap carries over to more general distributions, depending only on the volume of the (super-)level sets of the unnormalized density.
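To fix ideas for Rudolf's abstract: simple slice sampling alternates between drawing a height uniformly under the unnormalized density and drawing the next state uniformly from the corresponding super-level set. The sketch below is an idealized toy for the one-dimensional density f(x) = exp(-x^2 / 2), where each level set is an interval that can be sampled exactly (function name is ours; the talk's results concern the general, higher-dimensional setting).

```python
import math
import random

def slice_sample_gaussian(n, x0=0.0, seed=7):
    """Simple slice sampler targeting f(x) = exp(-x^2 / 2) (unnormalized)."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        # Vertical step: u ~ Uniform(0, f(x)]; 1 - random() avoids log(0).
        u = (1.0 - rng.random()) * math.exp(-x * x / 2.0)
        # Horizontal step: the level set {x : f(x) >= u} is (-r, r).
        r = math.sqrt(-2.0 * math.log(u))
        x = rng.uniform(-r, r)
        out.append(x)
    return out
```

For this target the sampler mixes rapidly, which is consistent with the kind of explicit spectral gap bound the talk derives for log-concave, rotationally invariant densities.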
Thu 2 May, '19
CRiSM Seminar, A1.01
Speaker: Dr. Ben Calderhead, Department of Mathematics, Imperial College London
Abstract: Quasi-Monte Carlo (QMC) methods for estimating integrals are attractive since the resulting estimators typically converge at a faster rate than pseudo-random Monte Carlo. However, they can be difficult to set up on arbitrary posterior densities within the Bayesian framework, in particular for inverse problems. We introduce a general parallel Markov chain Monte Carlo (MCMC) framework, for which we prove a law of large numbers and a central limit theorem. In that context, non-reversible transitions are investigated. We then extend this approach to the use of adaptive kernels and state conditions under which ergodicity holds. As a further extension, an importance sampling estimator is derived, for which asymptotic unbiasedness is proven. We consider the use of completely uniformly distributed (CUD) numbers within the above-mentioned algorithms, which leads to a general parallel quasi-MCMC (QMCMC) methodology. We prove consistency of the resulting estimators and demonstrate numerically that this approach scales close to n^{-2} as we increase parallelisation, instead of the usual n^{-1} that is typical of standard MCMC algorithms. In practical statistical models we observe multiple orders of magnitude improvement compared with pseudo-random methods.
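To illustrate why the low-discrepancy inputs in Calderhead's abstract help, the following sketch estimates the simple integral of x^2 over [0, 1] (true value 1/3) using the base-2 van der Corput sequence, the one-dimensional building block of QMC point sets. This is a toy illustration of plain QMC integration, not the talk's parallel QMCMC method; function names are ours.

```python
def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput low-discrepancy sequence,
    obtained by reflecting the base-b digits of i about the radix point."""
    seq = []
    for i in range(1, n + 1):
        q, denom, x = i, 1.0, 0.0
        while q:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        seq.append(x)
    return seq

def qmc_mean(f, n):
    """QMC estimate of the integral of f over [0, 1]."""
    pts = van_der_corput(n)
    return sum(f(x) for x in pts) / n
```

With n = 4096 points the error here is on the order of 1e-4, far below the roughly n^{-1/2} error one expects from the same number of pseudo-random points.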
Mon 13 May, '19
CRiSM Seminar, MB0.07
Prof. Renaud Lambiotte, University of Oxford, UK (15:00-16:00)
Higher-Order Networks
Network science provides powerful analytical and computational methods to describe the behaviour of complex systems. From a networks viewpoint, the system is seen as a collection of elements interacting through pairwise connections. Canonical examples include social networks, neuronal networks and the Web. Importantly, elements often interact directly with a relatively small number of other elements, while they may influence large parts of the system indirectly via chains of direct interactions. In other words, networks allow for a sparse architecture together with global connectivity. Compared with mean-field approaches, network models often have greater explanatory power because they account for the non-random topologies of real-life systems. However, new forms of high-dimensional and time-resolved data have now also shed light on the limitations of these models. In this talk, I will review recent advances in the development of higher-order network models, which account for different types of higher-order dependencies in complex data. These include temporal networks, where the network is itself a dynamical entity, and higher-order Markov models, where chains of interactions are more than a combination of links.
Thu 30 May, '19
CRiSM Seminar, A1.01
Dr. Yoav Zemel, University of Göttingen, Germany (15:00-16:00)
Title: Procrustes Metrics on Covariance Operators and Optimal Transportation of Gaussian Processes
Abstract: Covariance operators are fundamental in functional data analysis, providing the canonical means to analyse functional variation via the celebrated Karhunen-Loève expansion. These operators may themselves be subject to variation, for instance in contexts where multiple functional populations are to be compared. Statistical techniques to analyse such variation are intimately linked with the choice of metric on covariance operators and the intrinsic infinite-dimensionality of these operators. We describe the manifold-like geometry of the space of trace-class infinite-dimensional covariance operators and associated key statistical properties, under the recently proposed infinite-dimensional version of the Procrustes metric (Pigoli et al., Biometrika 101, 409-422, 2014). We identify this space with that of centred Gaussian processes equipped with the Wasserstein metric of optimal transportation. The identification allows us to provide a detailed description of those aspects of this manifold-like geometry that are important in terms of statistical inference; to establish key properties of the Fréchet mean of a random sample of covariances; and to define generative models that are canonical for such metrics and link with the problem of registration of warped functional data.
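The identification in Zemel's abstract rests on the closed-form Wasserstein (Bures) distance between centred Gaussians, d^2(A, B) = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). A minimal sketch for 2x2 covariance matrices follows, using the 2x2 SPD identity tr(sqrt(M)) = sqrt(tr(M) + 2 sqrt(det(M))) together with tr(A^{1/2} B A^{1/2}) = tr(AB); this finite-dimensional toy (function name is ours) only hints at the infinite-dimensional operator setting of the talk.

```python
import math

def bures_wasserstein(A, B):
    """2-Wasserstein distance between centred Gaussians N(0, A) and N(0, B),
    for 2x2 symmetric positive-definite covariance matrices (nested lists)."""
    trA = A[0][0] + A[1][1]
    trB = B[0][0] + B[1][1]
    detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    detB = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    # tr(A B), equal to tr(A^{1/2} B A^{1/2}).
    trAB = sum(A[i][k] * B[k][i] for i in range(2) for k in range(2))
    # tr(sqrt(M)) for M = A^{1/2} B A^{1/2}: det(M) = det(A) det(B).
    tr_sqrt = math.sqrt(trAB + 2.0 * math.sqrt(detA * detB))
    return math.sqrt(max(trA + trB - 2.0 * tr_sqrt, 0.0))
```

For diagonal matrices this reduces to the sum of squared differences of the singular values' square roots, matching the Procrustes interpretation.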
Thu 13 Jun, '19
CRiSM Seminar, MSB2.22
Speaker: Clair Barnes, University College London, UK
Death & the Spider: postprocessing multi-ensemble weather forecasts with uncertainty quantification
Ensemble weather forecasts often under-represent uncertainty, leading to overconfidence in their predictions. Multi-model forecasts combining several individual ensembles have been shown to display greater skill than single-ensemble forecasts in predicting temperatures, but tend to retain some bias in their joint predictions. Established postprocessing techniques are able to correct bias and calibration issues in univariate forecasts, but are generally not designed to handle multivariate forecasts (of several variables or at several locations, say) without separate specification of the structure of the inter-variable dependence. We propose a flexible multivariate Bayesian postprocessing framework, developed around a directed acyclic graph representing the relationships between the ensembles and the observed weather. The posterior forecast is inferred from the ensemble forecasts and an estimate of their shared discrepancy, which is obtained from a collection of past forecast-observation pairs. The approach is illustrated with an application to forecasts of UK surface temperatures during the winter period from 2007-2013.
Speaker: Prof. Karla Hemming, University of Birmingham, UK (15:00-16:00)
The I-squared-CRT statistic to describe treatment effect heterogeneity in cluster randomized trials
Tue 25 Jun, '19
CRiSM Seminar, MS.05
Prof. Malgorzata Bogdan, University of Wroclaw, Poland (15:00-16:00)
Abstract: The Sorted L-One Penalized Estimator is a relatively new convex optimization procedure for identifying predictors in large databases.
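For context on Bogdan's talk: the Sorted L-One Penalized Estimator (SLOPE) replaces the single lasso penalty level with a decreasing sequence of levels, pairing the largest coefficients in magnitude with the largest penalties. A minimal sketch of the resulting sorted-L1 penalty (function name is ours):

```python
def slope_penalty(beta, lambdas):
    """Sorted-L1 (SLOPE) penalty: sum_i lambdas[i] * |beta|_(i),
    where |beta|_(1) >= |beta|_(2) >= ... are the sorted magnitudes
    and lambdas is a non-increasing sequence of penalty levels."""
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lambdas, mags))
```

When all the lambdas are equal the penalty reduces to the ordinary lasso (L1) penalty; the decreasing sequence is what gives SLOPE its false discovery rate control properties.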
Fri 28 Jun, '19
CRiSM Seminar, MB2.23
Dr. Pauline O'Shaughnessy, University of Wollongong, Australia
Title: Bootstrap inference for longitudinal data with multiple sources of variation
Abstract: Linear mixed models allow us to model the dependence among responses by incorporating random effects. Such dependence, inherent in longitudinal data from a complex design, can arise from the clustering between subjects and the repeated measurements within each subject. When the underlying distribution is not fully specified, we consider a class of estimators defined by the Gaussian quasi-likelihood for normal-like response variables. Historically, it has been challenging to make inference about the variance components in the framework of mixed models. We propose a new weighted estimating equation bootstrap, which varies the weighting scheme for different parameter estimators. The performance of the weighted estimating equation bootstrap is evaluated empirically in simulation studies, showing improved coverage and variance estimation for the variance component estimators under models with normal and non-normal distributions for the random effects. The asymptotic properties will also be addressed, and we apply this new bootstrap method to a longitudinal dataset in biology. (This is joint work with Professor Alan Welsh from the Australian National University.)
Thu 24 Oct, '19
CRiSM Seminar - Localizing Changes in High-Dimensional Vector Autoregressive Processes, MB0.07

Thu 7 Nov, '19
CRiSM Seminar - High-dimensional principal component analysis with heterogeneous missingness, MB0.07

Thu 21 Nov, '19
CRiSM Seminar - Modelling Networks and Network Populations via Graph Distances, MB0.07 Mathematical Sciences Building
Speaker: Sofia Olhede

Thu 5 Dec, '19
CRiSM Seminar, MB0.07
Wed 15 Jan, '20
CRiSM Seminar - Deep learning in genomics, and a topic model for single cell analysis - Gerton Lunter, MB0.07

Wed 29 Jan, '20
CRiSM Seminar - Modelling spatially correlated binary data - Professor Jianxin Pan, MB0.07

Wed 12 Feb, '20
CRiSM Seminar - Model Property-Based and Structure-Preserving ABC for complex stochastic models, MB0.07

Wed 26 Feb, '20
CRiSM Seminar - Sequential learning via a combined reinforcement learning and data assimilation ansatz for decision support, MB0.07
Wed 4 Mar, '20
CRiSM Seminar - Scaling Optimal Transport for High-dimensional Learning, MB0.07 Mathematical Sciences Building
Speaker: Gabriel Peyré, CNRS and Ecole Normale Supérieure
Thu 30 Apr, '20
CRiSM Seminar - Simon French, Online
Thu 14 May, '20
CRiSM Seminar - Jane Hutton: I know I don't know: Covid-19 patients' journeys through hospital (Online)
I was asked to consider the available data on Covid-19 patients' paths into hospital, and then to intensive care, death, transfer or discharge. Of course, once in intensive care, patients can move to the states death, discharge home, discharge to a nursing home, or discharge to a hospital ward. I was invited by those who think I know about the analysis of times to events with messy data. The data are messy, and there are other challenges. I benefited from conversations with medical friends and colleagues, particularly a respiratory physician. Depending on permissions, I will either illustrate the issues with artificial data or present actual results.
Thu 11 Jun, '20
CRiSM Seminar - Olivier Renaud, Online

Thu 25 Jun, '20
CRiSM Seminar, MB0.08
Wed 28 Oct, '20
CRiSM Seminar, via Teams

Thu 12 Nov, '20
CRiSM Seminar, via Teams

Thu 26 Nov, '20
CRiSM Seminar, via Teams

Thu 10 Dec, '20
CRiSM Seminar, via Teams