Events
Fri 20 Jan, '17
CRiSM Seminar, MA_B1.01. Yi Yu (University of Bristol). Title: Estimating whole brain dynamics using spectral clustering. Abstract: The estimation of time-varying networks for functional Magnetic Resonance Imaging (fMRI) data sets is of increasing importance and interest. In this work, we formulate the problem in a high-dimensional time series framework and introduce a data-driven method, namely Network Change Points Detection (NCPD), which detects change points in the network structure of a multivariate time series, with each component of the time series represented by a node in the network. NCPD is applied to various simulated data and a resting-state fMRI data set. This new methodology also allows us to identify common functional states within and across subjects. Finally, NCPD promises to offer a deep insight into the large-scale characterisations and dynamics of the brain. This is joint work with Ivor Cribben (Alberta School of Business).

Fri 3 Feb, '17
CRiSM Seminar, MA_B1.01. Liz Ryan (KCL). Title: Simulation-based Fully Bayesian Experimental Design. Abstract: Bayesian experimental design is a fast growing area of research with many real-world applications. As computational power has increased over the years, so has the development of simulation-based design methods, which involve a number of Bayesian algorithms, such as Markov chain Monte Carlo (MCMC) algorithms. However, many of the proposed algorithms have been found to be computationally intensive for complex or nonstandard design problems, such as those which require a large number of design points to be found and/or those for which the observed data likelihood has no analytic expression. In this work, we develop novel extensions of existing algorithms which have been used for Bayesian experimental design, and also incorporate methodologies which have been used for Bayesian inference into the design framework, so that solutions to more complex design problems can be found.

Fri 17 Feb, '17
CRiSM Seminar, MA_B1.01. Ioannis Kosmidis. Title: Reduced-bias inference for regression models with tractable and intractable likelihoods. Abstract: This talk focuses on a unified theoretical and algorithmic framework for reducing bias in the estimation of statistical models from a practitioner's point of view. We will briefly discuss how shortcomings of classical estimators, and of inferential procedures depending on those, can be overcome via reduction of bias, and provide a few demonstrations stemming from current and past research on widely used statistical models with tractable likelihoods, including beta regression for bounded-domain responses, and the typically small-sample setting of meta-analysis and meta-regression in the presence of heterogeneity. The large impact that bias in the estimation of the variance components can have on inference motivates delivering higher-order corrective methods for generalised linear mixed models. The challenges in doing that will be presented along with resolutions stemming from current research.

Fri 3 Mar, '17
CRiSM Seminar, MA_B1.01. Marcelo Pereyra. Title: Bayesian inference by convex optimisation: theory, methods, and algorithms. Abstract: Convex optimisation has become the main Bayesian computation methodology in many areas of data science such as mathematical imaging and machine learning, where high dimensionality is often addressed by using models that are log-concave and where maximum-a-posteriori (MAP) estimation can be performed efficiently by optimisation. The first part of this talk presents a new decision-theoretic derivation of MAP estimation and shows that, contrary to common belief, under log-concavity MAP estimators are proper Bayesian estimators. A main novelty is that the derivation is based on differential geometry. Following on from this, we establish universal theoretical guarantees for the estimation error involved and show estimation stability in high dimensions. The second part of the talk describes a new general methodology for approximating Bayesian high-posterior-density regions in log-concave models. The approximations are derived by using recent concentration of measure results related to information theory, and can be computed very efficiently, even in large-scale problems, by using convex optimisation techniques. The approximations also have favourable theoretical properties, namely they outer-bound the true high-posterior-density credibility regions, and they are stable with respect to model dimension. The proposed methodology is finally illustrated on two high-dimensional imaging inverse problems related to tomographic reconstruction and sparse deconvolution, where the approximations are used to explore the uncertainty about the solutions, and where convex-optimisation-empowered proximal Markov chain Monte Carlo algorithms are used as a benchmark to compute exact credible regions and measure the approximation error.

Fri 17 Mar, '17
CRiSM Seminar, MA_B1.01. Paul Birrell (MRC Biostatistics Unit, Cambridge). Title: Towards Computationally Efficient Epidemic Inference.

Fri 5 May, '17
CRiSM Seminar, D1.07. Title: "Adaptive MCMC For Everyone".

Fri 19 May, '17
CRiSM Seminar, D1.07. Korbinian Strimmer (Imperial). Title: An entropy approach for integrative genomics and network modeling. Abstract: Multivariate regression approaches such as Seemingly Unrelated Regression (SUR) or Partial Least Squares (PLS) are commonly used in vertical data integration to jointly analyse different types of omics data measured on the same samples, such as SNP and gene expression data (eQTL) or proteomic and transcriptomic data. However, these approaches may be difficult to apply and to interpret for computational and conceptual reasons. Here we present a simple alternative approach to integrative genomics based on using relative entropy to characterise the overall association between two (or more) sets of omic data, and to infer the underlying corresponding association network among the individual covariates. This approach is computationally inexpensive and can be applied to large-dimensional data sets. A key and novel feature of our method is the decomposition of the total association strength between two or more groups of variables, based on optimal whitening of the individual data sets. Correspondingly, it may also be viewed as a special form of a latent-variable multivariate regression model. We illustrate this approach by analysing metabolomic and transcriptomic data from the DILGOM study.
References:
A. Kessy, A. Lewin, and K. Strimmer (2017). Optimal whitening and decorrelation. The American Statistician, to appear. http://dx.doi.org/10.1080/00031305.2016.1277159
T. Jendoubi and K. Strimmer (2017). Data integration and network modeling: an entropy approach. In preparation.

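The optimal whitening underlying the decomposition above can be illustrated with a short generic sketch. This is a ZCA-style whitening transform, W equal to the inverse symmetric square root of the covariance, written from scratch for illustration (function and variable names are hypothetical, not the authors' code):

```python
import numpy as np

def zca_whiten(X):
    """ZCA whitening: W = Sigma^{-1/2} (symmetric square root),
    which keeps the whitened variables maximally similar to the
    original ones while making their covariance the identity."""
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)          # Sigma is symmetric PSD
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # inverse symmetric sqrt
    return Xc @ W, W

rng = np.random.default_rng(0)
# correlated toy data: 500 observations of 3 variables
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.7]])
Z, W = zca_whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))     # ≈ 3x3 identity matrix
```

Because W is computed from the same sample covariance that is being whitened, the sample covariance of Z is the identity up to numerical error.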
Fri 30 Jun, '17
CRiSM Seminar, C1.06, Zeeman Building. Paul Kirk (BSU, Cambridge). Title: Semi-supervised multiview clustering for high-dimensional data.

Fri 27 Oct, '17
CRiSM Seminar, A1.01. Speaker: Davide Pigoli (King's College London). Title: Functional data analysis of biological growth processes. Abstract: Functional data are examples of high-dimensional data in which the observed variables have a natural ordering and are generated by an underlying smooth process. These additional properties allow us to develop methods that go beyond what would be possible with classical multivariate techniques. In this talk, I will demonstrate the potential of functional data analysis for biological growth processes in two different applications. The first one is in forensic entomology, where time-dependent growth curves need to be estimated from experiments in which larvae have been exposed to a relatively small number of constant temperature profiles. The second one is in quantitative genetics, where the growth curve is a function-valued phenotypic trait from which the continuous genetic variation needs to be estimated.

Thu 9 Nov, '17
CRiSM Seminar, C0.08. Speaker: Jonathan Keith (Monash University). Title: Markov chain Monte Carlo in discrete spaces, with applications in bioinformatics and ecology. Abstract: Efficient sampling of probability distributions over large discrete spaces is a challenging problem that arises in many contexts in bioinformatics and ecology. For example, segmentation of genomes to identify putative functional elements can be cast as a multiple change-point problem involving thousands or even millions of change-points. Another example involves reconstructing the invasion history of an introduced species by embedding a phylogenetic tree in a landscape. A third example involves inferring networks of molecular interactions in cellular systems. In this talk I describe a generalisation of the Gibbs sampler that allows this well-known strategy for sampling probability distributions in R^n to be adapted for sampling discrete spaces. The technique has been successfully applied to each of the problems mentioned above. However, these problems remain highly computationally intensive. I will discuss a number of alternatives for efficient sampling of such spaces, and will be seeking collaborations to develop these and other new approaches.

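As background for the abstract above, single-site Gibbs sampling on a discrete space can be sketched in a few lines. This is the textbook sampler (each coordinate resampled from its full conditional by enumerating its support), not Keith's generalisation; the toy target and all names are illustrative assumptions:

```python
import numpy as np

def gibbs_discrete(log_joint, x0, support, n_sweeps, rng):
    """Single-site Gibbs sampler over a discrete state vector.
    log_joint(x) returns the unnormalised log probability of the
    full configuration x."""
    x = np.array(x0)
    samples = np.empty((n_sweeps, len(x)), dtype=int)
    for s in range(n_sweeps):
        for i in range(len(x)):
            xi = x.copy()
            logp = []
            for v in support:          # enumerate the full conditional
                xi[i] = v
                logp.append(log_joint(xi))
            logp = np.array(logp)
            p = np.exp(logp - logp.max())
            x[i] = support[rng.choice(len(support), p=p / p.sum())]
        samples[s] = x
    return samples

# Toy target on {0,1}^2 with an interaction term: p(x) ∝ exp(x0 * x1)
log_joint = lambda x: float(x[0] * x[1])
rng = np.random.default_rng(0)
samples = gibbs_discrete(log_joint, [0, 0], [0, 1], 4000, rng)
# exact marginal: P(x0 = 1) = (1 + e) / (3 + e) ≈ 0.650
```

The empirical marginal frequency of x0 = 1 should match the exact value closely after a few thousand sweeps.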
Fri 24 Nov, '17
CRiSM Seminar, A1.01, 3-4pm. Song Liu. Title: Trimmed Density Ratio Estimation. Abstract: Density ratio estimation has recently become a versatile tool in the machine learning community. However, due to its unbounded nature, density ratio estimation is vulnerable to corrupted data points, which often push the estimated ratio toward infinity. In this paper, we present a robust estimator which automatically identifies and trims outliers. The proposed estimator has a convex formulation, and the global optimum can be obtained via subgradient descent. We analyze the parameter estimation error of this estimator under high-dimensional settings. Experiments are conducted to verify the effectiveness of the estimator.

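Density ratio estimation itself can be illustrated with the standard probabilistic-classification trick: fit a classifier separating numerator from denominator samples and read the ratio off its odds. The sketch below is a plain, untrimmed estimator under a well-specified logistic model, not the robust estimator of the talk; all names are hypothetical:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Plain gradient-descent logistic regression; y = 1 marks the
    numerator sample, y = 0 the denominator sample."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def ratio(x, w, n_nu, n_de):
    """Estimated p_nu(x)/p_de(x) from the classifier's odds."""
    xb = np.append(np.atleast_1d(x), 1.0)
    return float(np.exp(xb @ w)) * n_de / n_nu

rng = np.random.default_rng(0)
x_nu = rng.normal(0.0, 1.0, size=(2000, 1))   # numerator: N(0, 1)
x_de = rng.normal(1.0, 1.0, size=(2000, 1))   # denominator: N(1, 1)
X = np.vstack([x_nu, x_de])
y = np.concatenate([np.ones(2000), np.zeros(2000)])
w = fit_logistic(X, y)
# true ratio is exp(0.5 - x); at x = 0.5 it equals 1
```

For two Gaussians with equal variance the log-ratio is linear in x, so the logistic model is exact and the estimate at x = 0.5 should be close to 1.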
Fri 8 Dec, '17
CRiSM Seminar, A1.01, 3-4pm. Richard Samworth. Title: High-dimensional changepoint estimation via sparse projection. Abstract: Changepoints are a very common feature of big data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate changepoint estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
4-5pm. Simon R. White (MRC Biostatistics Unit, University of Cambridge). Title: Spatio-temporal modelling and heterogeneity in neuroimaging. Abstract: Neuroimaging allows us to gain insight into the structure and activity of the brain. Clearly, there is significant spatial structure that leads to dependencies across measurements that must be accounted for. Further, the brain as an organ is never idle, thus the local temporal behaviour is important when characterising long-term functional connectivity. In this talk we will discuss several approaches to modelling neuroimaging data that account for these key features, namely spatio-temporal heterogeneity: a novel approach to spatial modelling as an extension to the commonly used dimension-reduction technique independent component analysis (ICA) for task-based functional magnetic resonance imaging (fMRI); propagating subject-level heterogeneity through multi-stage analyses of dynamic functional connectivity (dFC) using resting-state fMRI (rs-fMRI); and structural development using structural MRI.

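The two-stage projection idea in Samworth's abstract can be sketched briefly. This simplified version takes the leading left singular vector of the raw CUSUM matrix (skipping the sparsity-inducing convex optimisation step of inspect) and uses a simple argmax in place of a full univariate changepoint algorithm; the data-generating setup is an assumed toy example:

```python
import numpy as np

def cusum_transform(X):
    """CUSUM transform of a p x n data matrix (one row per series)."""
    p, n = X.shape
    S = np.cumsum(X, axis=1)
    t = np.arange(1, n)
    scale = np.sqrt(n / (t * (n - t)))
    return (S[:, :-1] - S[:, -1:] * t / n) * scale

def single_changepoint(X):
    """Project onto the leading left singular vector of the CUSUM
    matrix, then locate the peak of the projected CUSUM series."""
    T = cusum_transform(X)
    U, _, _ = np.linalg.svd(T, full_matrices=False)
    stat = np.abs(U[:, 0] @ T)          # projected univariate series
    return int(np.argmax(stat)) + 1

rng = np.random.default_rng(1)
p, n, tau = 50, 200, 120
X = rng.normal(size=(p, n))
X[:5, tau:] += 1.0          # sparse mean shift in 5 of 50 coordinates
est = single_changepoint(X)  # expected close to tau = 120
```

Even though only 5 of the 50 series change, aggregating through the projection makes the shift easy to locate.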
Fri 19 Jan, '18
CRiSM Seminar, MA_B1.01. Jonas Peters (Department of Mathematical Sciences, University of Copenhagen). Title: Invariant Causal Prediction. Abstract: Why are we interested in the causal structure of a process? In classical prediction tasks such as regression, for example, it seems that no causal knowledge is required. In many situations, however, we want to understand how a system reacts under interventions, e.g., in gene knock-out experiments. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction uses only direct causes of the target variable as predictors; it remains valid even if we intervene on predictor variables or change the whole experimental setting. In this talk, we show how we can exploit this invariance principle to estimate causal structure from data. We apply the methodology to data sets from biology, epidemiology, and finance. The talk does not require any knowledge about causal concepts.
David Ginsbourger (Idiap Research Institute and University of Bern, http://www.ginsbourger.ch). Abstract: Gaussian process models have been used in a number of problems where an objective function f needs to be studied based on a drastically limited number of evaluations. Global optimization algorithms based on Gaussian process models have been investigated for several decades, and have become quite popular, notably in the design of computer experiments. Further classes of problems involving the estimation of sets implicitly defined by f, e.g. sets of excursion above a given threshold, have also inspired multiple research developments. In this talk, we will give an overview of recent results and challenges pertaining to the estimation of sets under Gaussian process priors, with particular interest in the quantification and sequential reduction of associated uncertainties. Based on a series of joint works primarily with Dario Azzimonti, François Bachoc, Julien Bect, Mickaël Binois, Clément Chevalier, Ilya Molchanov, Victor Picheny, Yann Richet and Emmanuel Vazquez.

Fri 2 Feb, '18
CRiSM Seminar, MA_B1.01, 2-3pm. Robin Evans (University of Oxford). Title: Geometry and statistical model selection. Abstract: TBA.

Fri 2 Feb, '18
CRiSM Seminar, A1.01, 3-4pm. Azadeh Khaleghi (Lancaster University). Title: Approximations of the Restless Bandit Problem. Abstract: In this talk I will discuss our recent paper on the multi-armed restless bandit problem. My focus will be on an instance of the bandit problem where the pay-off distributions are stationary $\phi$-mixing. This version of the problem provides a more realistic model for most real-world applications, but cannot be solved optimally in practice since it is known to be PSPACE-hard. The objective is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. I show that under some conditions on the $\phi$-mixing coefficients, a modified version of the UCB algorithm proves effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms. In particular, the $\phi$-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of a sampling policy on the pay-off distributions. Some of the proof techniques developed can be used more generally in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis. The talk will be accessible to non-experts.

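For readers unfamiliar with UCB, the classical i.i.d. baseline that the talk's modified algorithm builds on looks like the sketch below (UCB1 on Bernoulli arms; the means, horizon, and names are illustrative assumptions, not the talk's setting):

```python
import numpy as np

def ucb1(means, horizon, rng):
    """UCB1 on independent Bernoulli arms: play the arm with the
    highest empirical mean plus an exploration bonus."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                                  # play each arm once
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        r = float(rng.random() < means[arm])             # Bernoulli reward
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts

rng = np.random.default_rng(0)
reward, counts = ucb1([0.3, 0.5, 0.7], 5000, rng)
# the best arm (index 2) should receive the large majority of pulls
```

Suboptimal arms are pulled only O(log T) times, which is the source of UCB1's logarithmic regret; the talk's contribution is extending this kind of guarantee beyond the i.i.d. assumption.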
Fri 4 May, '18
CRiSM Seminar, B3.02. Wenyang Zhang (University of York). Title: Homogeneity Pursuit in Single Index Models based Panel Data Analysis.

Fri 18 May, '18
CRiSM Seminar, B3.02. Sergio Bacallado (University of Cambridge). Title: Three stories on clinical trial design. Abstract: The design of randomised clinical trials is one of the most classical applications of modern Statistics. The first part of this talk has to do with adaptive trial designs, which aim to minimise the harm to study participants by biasing randomisation toward arms that are performing well, or by closing experimental arms when there is early evidence of futility. We first propose a class of Bayesian uncertainty-directed trial designs, which aim to maximise information gain at the trial's conclusion, and we show in applications to various types of trial that it has superior operating characteristics when compared to simpler adaptive policies. In a second section, I will discuss the use of reinforcement learning algorithms to approximate Bayes-optimal policies given a prior for the treatment effects and a utility function combining outcomes for participants and the uncertainty of treatment effects. The last part of the talk will consider the possibility of sharing preliminary data from trials with patients and physicians who are making enrollment decisions. This practice may be in line with a trend toward patient-centred clinical research, but it presents many challenges and potential pitfalls. Through a simulation study, modelled on the landscape of Glioblastoma trials in the last 15 years, we explore how such 'permeable' designs could affect operating characteristics and the statistical validity of trial conclusions. Joint work with Lorenzo Trippa, Steffen Ventz, and Brian Alexander.

Fri 18 May, '18
CRiSM Seminar, A1.01. Caitlin Buck (University of Sheffield). Title: A dilemma in Bayesian chronology construction. Abstract: Chronology construction was one of the first applications used to showcase the value of MCMC methods for Bayesian inference (Naylor and Smith, 1988; Buck et al., 1992). As a result, Bayesian chronology construction is now ubiquitous in archaeology and is becoming increasingly popular in palaeoenvironmental research. Currently available software requires users to construct the statistical models and input prior knowledge by hand, requiring considerable expertise and patience. As a result, the published chronologies for most sites are based on a single model which is assumed to be correct. Recent research has, however, led to a proposal to automate production of Bayesian chronological models from field records. The approach uses directed acyclic graphs (DAGs) to represent the site stratigraphy and, from these, construct priors for the Bayesian hierarchical models (Dye and Buck, 2015). The related software is in the developmental stage but, before it can be released, we need to decide what advice to offer users about working with the large number of potential models that the new software will construct. In this seminar I will outline how and why Bayesian methods are so widely used in chronology construction, showcase the new DAG-based approach, explain the nature of the dilemma we face and hope to start a discussion about potential practical solutions.
References:
C.E. Buck, C.D. Litton & A.F.M. Smith (1992). Calibration of radiocarbon results pertaining to related archaeological events. Journal of Archaeological Science, 19(5), 497-512.
T.S. Dye & C.E. Buck (2015). Archaeological sequence diagrams and Bayesian chronological models. Journal of Archaeological Science, 63, 84-93.
J.C. Naylor & A.F.M. Smith (1988). An Archaeological Inference Problem. Journal of the American Statistical Association, 83(403), 588-595.

Fri 1 Jun, '18
CRiSM Seminar, B3.02. Victor Panaretos (EPFL). Title: What is the dimension of a stochastic process? Abstract: How can we determine whether a mean-square continuous stochastic process is, in fact, finite-dimensional, and if so, what its actual dimension is? And how can we do so at a given level of confidence? This question is central to many methods of functional data analysis that require low-dimensional representations, whether obtained by functional PCA or other means. The difficulty is that the determination is to be made on the basis of iid replications of the process observed discretely and with measurement error contamination. This adds a ridge to the empirical covariance, obfuscating the underlying dimension. We build a matrix-completion-inspired test procedure that circumvents this issue by measuring the best possible least square fit of the empirical covariance's off-diagonal elements, optimised over covariances of given finite rank. For a fixed grid of sufficient size, we determine the statistic's asymptotic null distribution as the number of replications grows. We then use it to construct a bootstrap implementation of a stepwise testing procedure controlling the family-wise error rate corresponding to the collection of hypotheses formalising the question at hand. The procedure involves no tuning parameters or pre-smoothing, is indifferent to the homoskedasticity or lack of it in the measurement errors, and does not assume a low-noise regime. Based on joint work with Anirvan Chakraborty (EPFL).

Fri 15 Jun, '18
CRiSM Seminar, B3.02, 2-3pm. Sarah Heaps (Newcastle University). Title: Identifying the effect of public holidays on daily demand for gas. Abstract: Gas distribution networks need to ensure the supply and demand for gas are balanced at all times. In practice, this is supported by a number of forecasting exercises which, if performed accurately, can substantially lower operational costs, for example through more informed preparation for severe winters. Amongst domestic and commercial customers, the demand for gas is strongly related to the weather and patterns of life and work. In regard to the latter, public holidays have a pronounced effect, which often extends into neighbouring days. In the literature, the days over which this protracted effect is felt are typically pre-specified as fixed windows around each public holiday. This approach fails to allow for any uncertainty surrounding the existence, duration and location of the protracted holiday effects. We introduce a novel model for daily gas demand which does not fix the days on which the proximity effect is felt. Our approach is based on a four-state, non-homogeneous hidden Markov model with cyclic dynamics. In this model the classification of days as public holidays is observed, but the assignment of days as “pre-holiday”, “post-holiday” or “normal” is unknown. Explanatory variables recording the number of days to the preceding and succeeding public holidays guide the evolution of the hidden states and allow smooth transitions between normal and holiday periods. To allow for temporal autocorrelation, we model the logarithm of gas demand at multiple locations, conditional on the states, using a first-order vector autoregression (VAR(1)). We take a Bayesian approach to inference and consider briefly the problem of specifying a prior distribution for the autoregressive coefficient matrix of a VAR(1) process which is constrained to lie in the stationary region.
We summarise the results of an application to data from Northern Gas Networks (NGN), the regional network serving the North of England, a preliminary version of which is already being used by NGN in its annual medium-term forecasting exercise.

Thu 25 Oct, '18
CRiSM Seminar, A1.01. Speaker: Professor Martyn Plummer (Department of Statistics, Warwick University). Abstract: We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing sub-models. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.

Thu 8 Nov, '18
CRiSM Seminar, A1.01. Dr. Martin Tegner (University of Oxford). Title: A probabilistic … Abstract: The local volatility model is a celebrated model widely used for pricing and hedging financial derivatives. While the model's main appeal is its capability of reproducing any given surface of observed option prices (it provides a perfect fit), the essential component of the model is a latent function which can only be unambiguously determined in the limit of infinite data. To (re)construct this function, numerous calibration methods have been suggested involving steps of interpolation and extrapolation, most often of parametric form and with point-estimates as result. We seek to look at the calibration problem in a probabilistic framework with a fully nonparametric approach based on Gaussian process priors. This immediately gives a way of encoding prior beliefs about the local volatility function and a hypothesis model which is highly flexible whilst being prone to overfitting. Besides providing a method for calibrating a (range of) point-estimate(s), we seek to draw posterior inference on the distribution over local volatility. This is to better understand the uncertainty attached to the calibration in particular, and to the model in general. Further, we seek to understand dynamical properties of local volatility by augmenting the hypothesis space with a time dimension. Ideally, this gives us means of inferring predictive distributions not only locally, but also for entire surfaces forward in time.

Tue 20 Nov, '18
CRiSM Seminar, A1.01. Dr. Kayvan Sadeghi (University College London). Title: Probabilistic Independence, Graphs, and Random Networks.

Thu 6 Dec, '18
CRiSM Seminar, A1.01. Dr. Carlo Albert (EAWAG, Switzerland). Title: Bayesian Inference for Stochastic Differential Equation Models through Hamiltonian Scale Separation. Abstract: Bayesian parameter inference is a fundamental problem in model-based data science. Given observed data, which is believed to be a realization of some parameterized model, the aim is to find a distribution of likely parameter values that are able to explain the observed data. This so-called posterior distribution expresses the probability of a given parameter being the "true" one, and can be used for making probabilistic predictions. For truly stochastic models this posterior distribution is typically extremely expensive to evaluate. We propose a novel approach for generating posterior parameter distributions, for stochastic differential equation models calibrated to measured time-series. The algorithm is inspired by re-interpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, whose dynamics is confined by both the model and the measurements. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for 1D problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.

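A minimal Hamiltonian Monte Carlo step with leapfrog integration may help fix ideas; the sketch below uses a single time scale and a standard bivariate Gaussian toy target, and does not reproduce the talk's multiple-time-scale integration or polymer interpretation:

```python
import numpy as np

def hmc_step(logp, grad, x, eps, L, rng):
    """One HMC step: draw a momentum, simulate Hamiltonian dynamics
    with L leapfrog steps of size eps, then Metropolis accept/reject."""
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * eps * grad(x_new)          # initial half momentum step
    for _ in range(L - 1):
        x_new += eps * p_new
        p_new += eps * grad(x_new)
    x_new += eps * p_new
    p_new += 0.5 * eps * grad(x_new)          # final half momentum step
    H_old = -logp(x) + 0.5 * p @ p            # Hamiltonian = U + K
    H_new = -logp(x_new) + 0.5 * p_new @ p_new
    if rng.random() < np.exp(min(0.0, H_old - H_new)):
        return x_new
    return x

# Toy target: standard bivariate normal
logp = lambda x: -0.5 * x @ x
grad = lambda x: -x
rng = np.random.default_rng(0)
x = np.zeros(2)
draws = []
for _ in range(2000):
    x = hmc_step(logp, grad, x, eps=0.2, L=10, rng=rng)
    draws.append(x.copy())
draws = np.array(draws)
```

After discarding a burn-in, the draws should have mean near 0 and standard deviation near 1 in each coordinate.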
Thu 17 Jan, '19
CRiSM Seminar, MSB2.23. Prof. Galin Jones (School of Statistics, University of Minnesota), 14:00-15:00. Title: Bayesian Spatiotemporal Modeling Using Hierarchical Spatial Priors, with Applications to Functional Magnetic Resonance Imaging. Abstract: We propose a spatiotemporal Bayesian variable selection model for detecting activation in functional magnetic resonance imaging (fMRI) settings. Following recent research in this area, we use binary indicator variables for classifying active voxels. We assume that the spatial dependence in the images can be accommodated by applying an areal model to parcels of voxels. The use of parcellation and a spatial hierarchical prior (instead of the popular Ising prior) results in a posterior distribution amenable to exploration with an efficient Markov chain Monte Carlo (MCMC) algorithm. We study the properties of our approach by applying it to simulated data and an fMRI data set.
Dr. Flavio Goncalves (Universidade Federal de Minas Gerais, Brazil), 15:00-16:00. Title: Exact Bayesian inference in spatiotemporal Cox processes driven by multivariate Gaussian processes. Abstract: In this talk we present a novel inference methodology to perform Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to allow for evolution of the intensity function over discrete time. The novelty of the method lies in the fact that no discretisation error is involved despite the non-tractability of the likelihood function and infinite dimensionality of the problem. The method is based on a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model. The models are defined in a general and flexible way but they are amenable to direct sampling from the relevant distributions, due to careful characterisation of their components. The models also allow for the inclusion of regression covariates and/or temporal components to explain the variability of the intensity function. These components may be subject to relevant interaction with space and/or time. Real and simulated examples illustrate the methodology, followed by concluding remarks.

Thu 31 Jan, '19- |
CRiSM SeminarMSB2.23Professor Paul Fearnhead, Lancaster University - 14:00-1500 Efficient Approaches to Changepoint Problems with Dependence Across Segments Changepoint detection is an increasingly important problem across a range of applications. It is most commonly encountered when analysing time-series data, where changepoints correspond to points in time where some feature of the data, for example its mean, changes abruptly. Often there are important computational constraints when analysing such data, with the number of data sequences and their lengths meaning that only very efficient methods for detecting changepoints are practically feasible. A natural way of estimating the number and location of changepoints is to minimise a cost that trades-off a measure of fit to the data with the number of changepoints fitted. There are now some efficient algorithms that can exactly solve the resulting optimisation problem, but they are only applicable in situations where there is no dependence of the mean of the data across segments. Using such methods can lead to a loss of statistical efficiency in situations where e.g. it is known that the change in mean must be positive. This talk will present a new class of efficient algorithms that can exactly minimise our cost whilst imposing certain constraints on the relationship of the mean before and after a change. These algorithms have links to recursions that are seen for discrete-state hidden Markov Models, and within sequential Monte Carlo. We demonstrate the usefulness of these algorithms on problems such as detecting spikes in calcium imaging data. Our algorithm can analyse data of length 100,000 in less than a second, and has been used by the Allen Brain Institute to analyse the spike patterns of over 60,000 neurons. (This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and Daniela Witten.) Dr. 
Sandipan Roy, Department of Mathematical Science, University of Bath (15:00-16:00) Network Heterogeneity and Strength of Connections Abstract: Detecting strength of connection in a network is a fundamental problem in understanding the relationship among individuals. Often it is more important to understand how strongly two individuals are connected than the mere presence/absence of the edge. This paper introduces a new concept of strength of connection in a network through a nonparametric object called the "Grafield". The "Grafield" is a piecewise constant bivariate kernel function that compactly represents the affinity or strength of ties (or interactions) between every pair of vertices in the graph. We estimate the "Grafield" function through a spectral analysis of the Laplacian matrix followed by a hard thresholding (Gavish & Donoho, 2014) of the singular values. Our estimation methodology is also valid for asymmetric directed networks. As a by-product, we obtain an efficient procedure for edge probability matrix estimation as well. We validate our proposed approach with several synthetic experiments and compare with existing algorithms for edge probability matrix estimation. We also apply our proposed approach to three real datasets: understanding the strength of connection in (a) a social messaging network, (b) a network of political parties in the US Senate, and (c) a neural network of neurons and synapses in the nematode worm C. elegans. |
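The penalised-cost formulation in Professor Fearnhead's talk — minimise a measure of fit plus a penalty per changepoint — can be sketched with the standard optimal-partitioning dynamic programme. This is a generic illustration of the cost being minimised, not the constrained algorithms presented in the talk; the squared-error segment cost and the toy data are assumptions.

```python
def seg_cost(y):
    """Cost of one segment: sum of squared deviations from its mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def optimal_partitioning(y, beta):
    """Exactly minimise (sum of segment costs + beta per segment) by
    dynamic programming, O(n^2); PELT-style pruning would speed this up."""
    n = len(y)
    F = [0.0] * (n + 1)   # F[t] = optimal cost of the prefix y[:t]
    last = [0] * (n + 1)  # last[t] = start of the final segment in that optimum
    for t in range(1, n + 1):
        F[t], last[t] = min(
            (F[s] + seg_cost(y[s:t]) + beta, s) for s in range(t)
        )
    cps, t = [], n  # backtrack the changepoint locations
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

y = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
print(optimal_partitioning(y, beta=1.0))  # → [3]
```

The talk's contribution is to minimise this kind of cost exactly while also constraining the relationship between segment means (e.g. changes that must be positive), which the unconstrained recursion above cannot express.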
|
Thu 14 Feb, '19- |
CRiSM SeminarMSB2.23Philipp Hermann, Institute of Applied Statistics, Johannes Kepler University Linz, Austria Time: 14:00-15:00 LDJump: Estimating Variable Recombination Rates from Population Genetic Data Recombination is a process during meiosis which starts with the formation of DNA double-strand breaks and results in an exchange of genetic material between homologous chromosomes. In many species, recombination is concentrated in narrow regions known as hotspots, flanked by large zones with low recombination. As recombination plays an important role in evolution, its estimation and the identification of hotspot positions are of considerable interest. In this talk we introduce LDJump, our method to estimate local population recombination rates with relevant summary statistics as explanatory variables in a regression model. More precisely, we divide the DNA sequence into small segments and estimate the recombination rate per segment via the regression model. In order to obtain change-points in recombination we apply a frequentist segmentation method. This approach controls the type I error and provides confidence bands for the estimator. Overall, LDJump identifies hotspots with high accuracy under different levels of genetic diversity and demography, and is computationally fast even for genomic regions spanning many megabases. We will present a practical application of LDJump on a region of the human chromosome 21 and compare our estimated population recombination rates with experimentally measured recombination events. (Joint work with Andreas Futschik, Irene Tiemann-Boege, and Angelika Heissl) Professor Dr. 
Ingo Scholtes, Data Analytics Group, University of Zürich Time: 15:00-16:00 Optimal Higher-Order Network Analytics for Time Series Data Network-based data analysis techniques such as graph mining, social network analysis, link prediction and clustering are an important foundation for data science applications in computer science, computational social science, economics and bioinformatics. They help us to detect patterns in large corpora of data that capture relations between genes, brain regions, species, humans, documents, or financial institutions. While this potential of the network perspective is undisputed, advances in data sensing and collection increasingly provide us with high-dimensional, temporal, and noisy data on real systems. The complex characteristics of such data sources pose fundamental challenges for network analytics. They question the validity of network abstractions of complex systems and pose a threat to interdisciplinary applications of data analytics and machine learning. To address these challenges, I introduce a graphical modelling framework that accounts for the complex characteristics of real-world data on complex systems. I demonstrate this approach on time series data from technical, biological, and social systems. Current methods to analyse the topology of such systems discard information on the timing and ordering of interactions, which, however, determines which elements of a system can influence each other via paths. To solve this issue, I introduce a modelling framework that (i) generalises standard network representations towards multi-order graphical models for causal paths, and (ii) uses statistical learning to achieve an optimal balance between explanatory power and model complexity. The framework advances the theoretical foundation of data science and sheds light on the important question of when network representations of time series data are justified. 
It is the basis for a new generation of data analytics and machine learning techniques that account both for temporal and topological characteristics in real-world data. |
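The central observation in Professor Scholtes's abstract — that the timing and ordering of interactions determines which elements can influence each other via paths — can be made concrete with a toy example (the three-node system and the earliest-arrival computation below are illustrative assumptions, not the talk's multi-order models):

```python
def time_respecting_reachable(edges, src, dst):
    """Edges are (u, v, t) with timestamp t. dst is reachable from src
    only via a path whose edge timestamps strictly increase."""
    earliest = {src: float("-inf")}  # earliest arrival time at each node
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in earliest and earliest[u] < t:
            earliest[v] = min(earliest.get(v, float("inf")), t)
    return dst in earliest

# The edge b -> c fires BEFORE a -> b, so a can never influence c,
# even though the static graph a -> b -> c contains that path.
edges = [("b", "c", 1), ("a", "b", 2)]
print(time_respecting_reachable(edges, "a", "c"))  # → False
print(time_respecting_reachable(edges, "a", "b"))  # → True
```

A static network abstraction of these two edges would report a path from a to c; accounting for edge ordering shows that no causal path exists, which is exactly the kind of discrepancy higher-order models are designed to capture.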
|
Thu 28 Feb, '19- |
CRiSM SeminarMSB2.23Prof. Valerie Isham, Statistical Science, University College London, UK (15:00-16:00) Stochastic Epidemic Models: Approximations, structured populations and networks |
|
Thu 14 Mar, '19- |
CRiSM SeminarA1.01Speaker: Spencer Wheatley, ETH Zurich, Switzerland Title: The "endo-exo" problem in financial market price fluctuations, & the ARMA point process The "endo-exo" problem -- i.e., decomposing system activity into exogenous and endogenous parts -- lies at the heart of statistical identification in many fields of science. E.g., consider the problem of determining whether an earthquake is a mainshock or an aftershock, or whether a surge in the popularity of a YouTube video is because it is "going viral", or simply due to high activity across the platform. Solutions to this problem are often plagued by spurious inference (namely, falsely detected strong interactions) due to neglect of trends, shocks and shifts in the data. The predominant point process model for endo-exo analysis in the field of quantitative finance is the Hawkes process. A comparison of this field with the relatively mature fields of econometrics and time series identifies the need to more rigorously control for trends and shocks. Doing so allows us to test the hypothesis that the market is "critical" -- analogous to a unit root test commonly done on economic time series -- and challenge earlier results. Continuing "lessons learned" from the time series field, it is argued that the Hawkes point process is analogous to integer-valued AR time series. Following this analogy, we introduce the ARMA point process, which flexibly combines exogenous background activity (Poisson), shot-noise bursty dynamics, and self-exciting (Hawkes) endogenous activity. We illustrate a connection to ARMA time series models, derive an MCEM (Monte Carlo Expectation Maximization) algorithm to enable MLE of this process, and assess consistency by simulation study. Remaining challenges in estimation and model selection, as well as possible solutions, are discussed.
[1] Wheatley, S., Wehrli, A., and Sornette, D. "The endo-exo problem in high frequency financial price fluctuations and rejecting criticality". To appear in Quantitative Finance (2018). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3239443 [2] Wheatley, S., Schatz, M., and Sornette, D. "The ARMA Point Process and its Estimation." arXiv preprint arXiv:1806.09948 (2018). |
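The exogenous/endogenous decomposition at the heart of the Hawkes model can be made concrete: the conditional intensity is a constant background rate (exo) plus a sum of decaying kicks from past events (endo). A minimal sketch, assuming the common exponential kernel and illustrative parameter values (neither is specified in the abstract):

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with exponential kernel:
    exogenous background mu, plus each past event at t_i contributing
    an endogenous kick alpha * exp(-beta * (t - t_i))."""
    endo = sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + endo

mu, alpha, beta = 0.5, 0.8, 1.0
# Branching ratio alpha / beta = expected direct offspring per event;
# a value approaching 1 corresponds to the "critical" regime the talk tests.
print(alpha / beta)                                   # → 0.8
print(hawkes_intensity(1.0, [0.0], mu, alpha, beta))  # 0.5 + 0.8 * exp(-1)
```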