Events
Fri 22 Jan, '16
CRiSM Seminar, B1.01
Speaker: Li Su with Michael J. Daniels (MRC Biostatistics Unit)

Fri 5 Feb, '16
CRiSM Seminar, B1.01
Speaker: Ewan Cameron (Oxford, Dept of Zoology)
Title: Progress and (Statistical) Challenges in Malariology
Abstract: In this talk I will describe some key statistical challenges faced by researchers aiming to quantify the burden of disease arising from Plasmodium falciparum malaria at the population level. These include covariate selection in the 'big data' setting, handling spatially-correlated residuals at scale, calibration of individual simulation models of disease transmission, and the embedding of continuous-time, discrete-state Markov chain solutions within hierarchical Bayesian models. In each case I will describe the pragmatic solutions we've implemented to date within the Malaria Atlas Project, and highlight more sophisticated solutions we'd like to have in the near future if the right statistical methodology and computational tools can be identified and/or developed to this end.
References:
http://www.nature.com/nature/journal/v526/n7572/abs/nature15535.html
http://www.nature.com/ncomms/2015/150907/ncomms9170/full/ncomms9170.html
http://www.ncbi.nlm.nih.gov/pubmed/25890035
http://link.springer.com/article/10.1186/s12936-015-0984-9

Fri 19 Feb, '16
CRiSM Seminar, B1.01
Speaker: Theresa Smith (CHICAS, Lancaster Medical School)
Title: Modelling geo-located health data using spatio-temporal log-Gaussian Cox processes
Abstract: Health data with high spatial and temporal resolution are becoming more common, but there are several practical and computational challenges to using such data to study the relationships between disease risk and possible predictors. These difficulties include the lack of measurements on individual-level covariates/exposures, integrating data measured on different spatial and temporal units, and computational complexity. In this talk, I outline strategies for jointly estimating systematic (i.e., parametric) trends in disease risk and assessing residual risk with spatio-temporal log-Gaussian Cox processes (LGCPs). In particular, I will present Bayesian methods and MCMC tools for using spatio-temporal LGCPs to investigate the roles of environmental and socio-economic risk factors in the incidence of Campylobacter in England.

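As a hedged side note to the Cox-process abstracts above (this sketch is not from either talk): the basic building block, an inhomogeneous Poisson process on an interval, can be simulated by Lewis-Shedler thinning. In a full LGCP the intensity would be the exponential of a Gaussian-process draw; here a fixed intensity function stands in for it.

```python
import math
import random

def thin_poisson(intensity, lam_max, t_max=1.0, seed=0):
    """Simulate an inhomogeneous Poisson process on [0, t_max] by
    Lewis-Shedler thinning: propose arrivals from a homogeneous process
    with rate lam_max, keep each with probability intensity(t) / lam_max.
    Requires intensity(t) <= lam_max for all t."""
    rng = random.Random(seed)
    points, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)  # next candidate arrival
        if t > t_max:
            break
        if rng.random() < intensity(t) / lam_max:
            points.append(t)
    return points

# A smoothly varying intensity, peaking mid-interval; its integral over
# [0, 1] is 25, so we expect about 25 points on average.
lam = lambda t: 50.0 * (1.0 + math.sin(2 * math.pi * t)) / 2.0
pts = thin_poisson(lam, lam_max=50.0)
print(len(pts))
```

Replacing `lam` with the exponential of a simulated Gaussian process would turn this into a draw from an LGCP.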
Fri 4 Mar, '16
CRiSM Seminar, B1.01
Speaker: Alan Gelfand (Duke, Dept of Statistical Science)
Title: Space and circular time log Gaussian Cox processes with application to crime event data
Abstract: We view the locations and times of a collection of crime events as a space-time point pattern. So, with either a nonhomogeneous Poisson process or a more general Cox process, we need to specify a space-time intensity. For the latter, we need a random intensity, which we model as a realization of a spatio-temporal log Gaussian process. In fact, we view time as circular, necessitating valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. In addition, crimes are classified by crime type. Furthermore, each crime event is marked by day of the year, which we convert to day of the week. We present models to accommodate such data. Then, we extend the modeling to include the marks. Our specifications naturally take the form of hierarchical models, which we fit within a Bayesian framework. In this regard, we consider model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox process. We also compare separable vs. nonseparable covariance specifications. Our motivating dataset is a collection of crime events for the city of San Francisco during the year 2012; for each event we have location, hour, day of the year, and crime type. We investigate a rich range of models to enhance our understanding of the set of incidences.

Fri 18 Mar, '16
CRiSM Seminar, B1.01
Speaker: Petros Dellaportas (UCL)
Title: Scalable inference for a full multivariate stochastic volatility model
Abstract: We introduce a multivariate stochastic volatility model for asset returns that imposes no restrictions on the structure of the volatility matrix and treats all its elements as functions of latent stochastic processes. When the number of assets is prohibitively large, we propose a factor multivariate stochastic volatility model in which the variances and correlations of the factors evolve stochastically over time. Inference is achieved via a carefully designed, feasible and scalable Markov chain Monte Carlo algorithm that combines two computationally important ingredients: it uses Metropolis proposal densities that are invariant to the prior for simultaneously updating all latent paths, and it has quadratic, rather than cubic, computational complexity when evaluating the required multivariate normal densities. We apply our modelling and computational methodology to daily returns of 571 Euro STOXX index stocks over a period of 10 years.

Mon 4 Apr, '16 - Fri 8 Apr, '16 (all day)
CRiSM Master Class: Non-Parametric Bayes, MS.01. Runs from Monday, April 04 to Friday, April 08.

Fri 6 May, '16
CRiSM Seminar, MS.03
Speaker: Mikhail Malyutov (Northeastern University)
Title: Context-free and Grammar-free Statistical Testing of Identity of Styles
Abstract: Our theory justifies our thorough statistical modification (CCC) of D. Khmelev's conditional-compression-based classification idea of 2001, and seven years of intensive applied statistical implementation of CCC for authorship attribution of literary works. Homogeneity testing based on SCOT training, with applications to financial modelling and statistical quality control, is also in progress. Both approaches are described in a Springer monograph which will appear shortly. A stochastic context tree (abbreviated SCOT) is an m-Markov chain in which every state of a string is independent of the symbols in its more remote past than a context whose length is determined by the preceding symbols of that state. In all of our applications we uncover a complex sparse structure of memory in SCOT models that allows excellent discrimination power. In addition, a straightforward estimation of the stationary distribution of SCOT gives insight into contexts crucial for discrimination between, say, different regimes of financial data or between the styles of different authors of literary texts.

Fri 13 May, '16
CRiSM Seminar, B1.01
Speaker: Michael Newton (University of Wisconsin-Madison)
Title: Ranking and selection revisited
Abstract: In large-scale inference, the precision with which individual parameters are estimated may vary greatly among parameters, complicating the task of rank-ordering them. I present a framework for evaluating different ranking/selection schemes as well as an empirical Bayesian methodology showing theoretical and empirical advantages over available approaches. Examples from genomics and sports will help to illustrate the issues.

Fri 20 May, '16
CRiSM Seminar, D1.07
Speaker: Jon Forster (Southampton)
Title: Model integration for mortality estimation and forecasting
Abstract: The decennial English Life Tables have been produced after every UK decennial census since 1841. They are based on graduated (smoothed) estimates of central mortality rates, or related functions. For UK mortality, over the majority of the age range, a GAM can provide a smooth function which adheres acceptably well to the crude mortality rates. At the very highest ages, the sparsity of the data means that the uncertainty about mortality rates is much greater. A further issue is that life table estimation requires us to extrapolate the estimate of the mortality rate function to ages beyond the extremes of the observed data. Our approach integrates a GAM at lower ages with a low-dimensional parametric model at higher ages. Uncertainty about the threshold age, at which the transition to the simpler model occurs, is integrated into the analysis. This base structure can then be extended into a model for the evolution of mortality rates over time, allowing the forecasting of mortality rates, a key input into the demographic projections necessary for planning.

Fri 3 Jun, '16
CRiSM Seminar, D1.07
Speaker: Degui Li (University of York)
Title: Panel Data Models with Interactive Fixed Effects and Multiple Structural Breaks
Abstract: In this paper we consider estimation of common structural breaks in panel data models with unobservable interactive fixed effects. We introduce a penalized principal component (PPC) estimation procedure with an adaptive group fused LASSO to detect the multiple structural breaks in the models. Under some mild conditions, we show that with probability approaching one the proposed method can correctly determine the unknown number of breaks and consistently estimate the common break dates. Furthermore, we estimate the regression coefficients through the post-LASSO method and establish the asymptotic distribution theory for the resulting estimators. The developed methodology and theory are applicable to the case of dynamic panel data models. Monte Carlo simulation results demonstrate that the proposed method works well in finite samples, with a low false detection probability when there is no structural break and a high probability of correctly estimating the number of breaks when structural breaks exist. We finally apply our method to study the environmental Kuznets curve for 74 countries over 40 years and detect two breaks in the data.

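As a hedged toy companion to this abstract (mine, not the paper's PPC/group-fused-LASSO procedure): the simplest version of break detection is a least-squares scan for a single mean shift, choosing the split that minimises the within-segment sum of squares.

```python
def best_break(y):
    """Least-squares scan for a single mean-shift break point:
    return the split index k minimising SSE(y[:k]) + SSE(y[k:])."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return min(range(1, len(y)), key=lambda k: sse(y[:k]) + sse(y[k:]))

# A toy series whose mean shifts from about 0 to about 2 at index 5.
y = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.9, 2.2, 1.8, 2.0]
print(best_break(y))  # 5
```

The paper's setting differs in essential ways: multiple breaks, panel structure, and unobserved interactive fixed effects, handled there by penalised estimation rather than exhaustive scanning.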
Fri 10 Jun, '16
CRiSM Seminar
Speaker: Claire Gormley (University College Dublin)
Title: Clustering High Dimensional Mixed Data: Joint Analysis of Phenotypic and Genotypic Data
Abstract: The LIPGENE-SU.VI.MAX study, like many others, recorded high dimensional continuous phenotypic data and categorical genotypic data. Interest lies in clustering the study participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model which elegantly accommodates high dimensional, mixed data is developed to cluster participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm, and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes (‘healthy’ and ‘at risk’) are uncovered. A small subset of variables is deemed discriminatory, which notably includes both phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, seven years after the data were collected, participants underwent further analysis to diagnose the presence or absence of the metabolic syndrome (MetS). The two uncovered sub-phenotypes correspond strongly to the seven-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to quantify the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition.

Fri 17 Jun, '16
CRiSM Seminar, D1.07

Fri 1 Jul, '16
CRiSM Seminar, D1.07
Speaker: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha)
Title: Criteria for Bayesian model choice
Abstract: In model choice (or model selection), several statistical models are postulated as legitimate explanations for a response variable, and this uncertainty is to be propagated through the inferential process. The questions one aims to answer are varied, ranging from, e.g., identifying the 'true' model to producing more reliable estimates that take this extra source of variability into account. Particularly important model-choice problems are hypothesis testing, model averaging and variable selection. The Bayesian paradigm provides a conceptually simple and unified solution to the model selection problem: the posterior probabilities of the competing models. This is also called the posterior distribution over the model space and is a simple function of Bayes factors. Answering any question of interest reduces to summarising this posterior distribution appropriately. Unfortunately, the posterior distribution may depend dramatically on the prior inputs and, unlike in estimation problems (where the model is fixed), such sensitivity does not vanish with large sample sizes. Additionally, it is well known that standard solutions like improper or vague priors cannot be used in general, as they result in arbitrary Bayes factors. Bayarri et al. (2012) propose tackling these difficulties by basing the assignment of prior distributions in objective contexts on a number of sensible statistical criteria. This approach takes a step beyond the way of analysing the problem that Jeffreys inaugurated fifty years ago. In this talk the criteria will be presented, with emphasis on those aspects that serve to characterise features of the priors that, until today, have been popularly used without a clear justification.
Originally the criteria were accompanied by an application to variable selection in regression models, and here we will see how they can be useful for tackling other important scenarios such as high-dimensional settings or survival problems.

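The remark that the posterior over the model space "is a simple function of Bayes factors" can be made concrete with a short sketch (mine, not the speaker's code): given Bayes factors B_k of each model against a common reference model and prior model probabilities P(M_k), the posterior probabilities are P(M_k | y) proportional to B_k * P(M_k).

```python
def posterior_model_probs(bayes_factors, prior_probs=None):
    """Posterior probabilities over K competing models from Bayes
    factors B_k of each model against a common reference (so the
    reference itself has B = 1). Defaults to a uniform model prior."""
    k = len(bayes_factors)
    prior = prior_probs or [1.0 / k] * k
    weights = [b * p for b, p in zip(bayes_factors, prior)]
    total = sum(weights)
    return [w / total for w in weights]

# Three models compared against a common reference, uniform prior:
print(posterior_model_probs([1.0, 4.0, 5.0]))  # approx. [0.1, 0.4, 0.5]
```

The talk's point is precisely that the B_k themselves can be arbitrary under improper or vague priors, which is what the criteria of Bayarri et al. (2012) are designed to address.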
Tue 30 Aug, '16 - Thu 1 Sep, '16 (all day)
CRiSM Master Class on Sparse Regression, MS.01. Runs from Tuesday, August 30 to Thursday, September 01.

Fri 14 Oct, '16
CRiSM Seminar, A1.01
Speaker: Daniel Rudolf
Title: Perturbation theory for Markov chains
Abstract: Perturbation theory for Markov chains addresses the question of how small differences in the transition probabilities of Markov chains are reflected in differences between their distributions. Under a convergence condition, we present an estimate of the Wasserstein distance between the nth-step distributions of an ideal, unperturbed Markov chain and an approximating, perturbed one. We illustrate the result with an example of an autoregressive process.

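A toy numerical companion to this abstract (my sketch, not the speaker's material): two AR(1) chains with slightly different autoregressive coefficients play the roles of the ideal and perturbed chains, and the empirical 1-Wasserstein distance between their n-step distributions stays small.

```python
import random

def ar1_samples(a, n_steps, n_chains, seed):
    """Draw n_chains independent realisations of the n-step distribution
    of the AR(1) chain X_{k+1} = a * X_k + N(0, 1) noise, started at 0."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_chains):
        x = 0.0
        for _ in range(n_steps):
            x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def w1(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size samples
    on the real line: mean absolute difference of the sorted values."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(u - v) for u, v in zip(xs, ys)) / len(xs)

ideal = ar1_samples(0.50, n_steps=50, n_chains=4000, seed=1)
perturbed = ar1_samples(0.52, n_steps=50, n_chains=4000, seed=2)
print(w1(ideal, perturbed))  # small: the n-step laws remain close
```

The theory in the talk gives quantitative bounds of this kind in terms of the perturbation size and a convergence condition on the ideal chain.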
Fri 28 Oct, '16
CRiSM Seminar, A1.01
Speaker: Peter Orbanz

Fri 11 Nov, '16
CRiSM Seminar, A1.01
Speaker: Mingli Chen

Fri 25 Nov, '16
CRiSM Seminar, A1.01

Fri 9 Dec, '16
CRiSM Seminar, A1.01
Speaker: Satish Iyengar
Title: Big Data Challenges in Psychiatry
Abstract: Current psychiatric diagnoses are based primarily on self-reported experiences. Unfortunately, treatments for the diagnoses are not effective for all patients. One hypothesized reason is the “artificial grouping of heterogeneous syndromes with different pathophysiological mechanisms into one disorder.” To address this problem, the US National Institute of Mental Health instituted the Research Domain Criteria framework in 2009. This research framework calls for integrating data from many levels of information: genes, cells, molecules, circuits, physiology, behavior, and self-report. Clustering comes to the forefront as a key tool in this big-data effort. In this talk, we present a case study of the use of mixture models to cluster older adults based on measures of sleep from three domains: diary, actigraphy, and polysomnography. Challenges in this effort include the use of mixtures of asymmetric (skewed) distributions, a large number of potential clustering variables, and seeking clinically meaningful solutions. We present novel variable selection algorithms, study them by simulation, and demonstrate our methods on the sleep data. This work is joint with Dr. Meredith Wallace.

Fri 20 Jan, '17
CRiSM Seminar, MA_B1.01
Speaker: Yi Yu (University of Bristol)
Title: Estimating whole brain dynamics using spectral clustering
Abstract: The estimation of time-varying networks for functional Magnetic Resonance Imaging (fMRI) data sets is of increasing importance and interest. In this work, we formulate the problem in a high-dimensional time series framework and introduce a data-driven method, namely Network Change Points Detection (NCPD), which detects change points in the network structure of a multivariate time series, with each component of the time series represented by a node in the network. NCPD is applied to various simulated data and a resting-state fMRI data set. This new methodology also allows us to identify common functional states within and across subjects. Finally, NCPD promises to offer a deep insight into the large-scale characterisations and dynamics of the brain. This is joint work with Ivor Cribben (Alberta School of Business).

Fri 3 Feb, '17
CRiSM Seminar, MA_B1.01
Speaker: Liz Ryan (KCL)
Title: Simulation-based Fully Bayesian Experimental Design
Abstract: Bayesian experimental design is a fast growing area of research with many real-world applications. As computational power has increased over the years, so has the development of simulation-based design methods, which involve a number of Bayesian algorithms, such as Markov chain Monte Carlo (MCMC) algorithms. However, many of the proposed algorithms have been found to be computationally intensive for complex or nonstandard design problems, such as those which require a large number of design points to be found and/or those for which the observed data likelihood has no analytic expression. In this work, we develop novel extensions of existing algorithms which have been used for Bayesian experimental design, and also incorporate methodologies which have been used for Bayesian inference into the design framework, so that solutions to more complex design problems can be found.

Fri 17 Feb, '17
CRiSM Seminar, MA_B1.01
Speaker: Ioannis Kosmidis
Title: Reduced-bias inference for regression models with tractable and intractable likelihoods
Abstract: This talk focuses on a unified theoretical and algorithmic framework for reducing bias in the estimation of statistical models from a practitioner's point of view. We will briefly discuss how shortcomings of classical estimators, and of inferential procedures depending on them, can be overcome via reduction of bias, and provide a few demonstrations stemming from current and past research on widely used statistical models with tractable likelihoods, including beta regression for bounded-domain responses and the typically small-sample setting of meta-analysis and meta-regression in the presence of heterogeneity. The large impact that bias in the estimation of the variance components can have on inference motivates delivering higher-order corrective methods for generalised linear mixed models. The challenges in doing so will be presented, along with resolutions stemming from current research.

Fri 3 Mar, '17
CRiSM Seminar, MA_B1.01
Speaker: Marcelo Pereyra
Title: Bayesian inference by convex optimisation: theory, methods, and algorithms
Abstract: Convex optimisation has become the main Bayesian computation methodology in many areas of data science such as mathematical imaging and machine learning, where high dimensionality is often addressed by using models that are log-concave and where maximum-a-posteriori (MAP) estimation can be performed efficiently by optimisation. The first part of this talk presents a new decision-theoretic derivation of MAP estimation and shows that, contrary to common belief, under log-concavity MAP estimators are proper Bayesian estimators. A main novelty is that the derivation is based on differential geometry. Following on from this, we establish universal theoretical guarantees for the estimation error involved and show estimation stability in high dimensions. Moreover, the second part of the talk describes a new general methodology for approximating Bayesian high-posterior-density regions in log-concave models. The approximations are derived by using recent concentration of measure results related to information theory, and can be computed very efficiently, even in large-scale problems, by using convex optimisation techniques. The approximations also have favourable theoretical properties, namely they outer-bound the true high-posterior-density credibility regions, and they are stable with respect to model dimension. The proposed methodology is finally illustrated on two high-dimensional imaging inverse problems related to tomographic reconstruction and sparse deconvolution, where they are used to explore the uncertainty about the solutions, and where convex-optimisation-empowered proximal Markov chain Monte Carlo algorithms are used as benchmark to compute exact credible regions and measure the approximation error.

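A minimal hedged illustration of MAP estimation by convex optimisation (my sketch, not from the talk): under a Laplace prior and a unit-variance Gaussian likelihood centred at each observation, the coordinatewise MAP estimate is the proximal operator of the L1 norm, i.e. soft-thresholding.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * |x|: the closed-form minimiser of
    0.5 * (x - v)**2 + lam * |x|, which is the MAP estimate under a
    Laplace prior with a unit-variance Gaussian likelihood centred at v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

# Denoising a toy signal: small entries are set exactly to zero,
# large ones are shrunk toward zero by lam.
noisy = [0.05, -2.3, 0.1, 1.7, -0.02]
map_est = [soft_threshold(v, lam=0.5) for v in noisy]
print(map_est)
```

This separable case is the simplest instance of the log-concave models in the talk; for coupled problems such as sparse deconvolution, the same proximal operator is used inside iterative schemes like proximal gradient descent.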
Fri 17 Mar, '17
CRiSM Seminar, MA_B1.01
Speaker: Paul Birrell (MRC Biostatistics Unit, Cambridge)
Title: Towards Computationally Efficient Epidemic Inference

Fri 5 May, '17
CRiSM Seminar, D1.07
Title: "Adaptive MCMC For Everyone"

Fri 19 May, '17
CRiSM Seminar, D1.07
Speaker: Korbinian Strimmer (Imperial)
Title: An entropy approach for integrative genomics and network modeling
Abstract: Multivariate regression approaches such as Seemingly Unrelated Regression (SUR) or Partial Least Squares (PLS) are commonly used in vertical data integration to jointly analyse different types of omics data measured on the same samples, such as SNP and gene expression data (eQTL) or proteomic and transcriptomic data. However, these approaches may be difficult to apply and to interpret for computational and conceptual reasons. Here we present a simple alternative approach to integrative genomics based on using relative entropy to characterise the overall association between two (or more) sets of omic data, and to infer the underlying association network among the individual covariates. This approach is computationally inexpensive and can be applied to large-dimensional data sets. A key and novel feature of our method is the decomposition of the total association strength between two or more groups of variables based on optimal whitening of the individual data sets. Correspondingly, it may also be viewed as a special form of a latent-variable multivariate regression model. We illustrate this approach by analysing metabolomic and transcriptomic data from the DILGOM study.
References:
A. Kessy, A. Lewin, and K. Strimmer. 2017. Optimal whitening and decorrelation. The American Statistician, to appear. http://dx.doi.org/10.1080/00031305.2016.1277159
T. Jendoubi and K. Strimmer. 2017. Data integration and network modeling: an entropy approach. In prep.

Fri 30 Jun, '17
CRiSM Seminar, C1.06, Zeeman Building
Speaker: Paul Kirk (BSU, Cambridge)
Title: Semi-supervised multiview clustering for high-dimensional data

Fri 27 Oct, '17
CRiSM Seminar, A1.01
Speaker: Davide Pigoli (King's College London)
Title: Functional data analysis of biological growth processes
Abstract: Functional data are examples of high-dimensional data where the observed variables have a natural ordering and are generated by an underlying smooth process. These additional properties allow us to develop methods that go beyond what would be possible with classical multivariate techniques. In this talk, I will demonstrate the potential of functional data analysis for biological growth processes in two different applications. The first is in forensic entomology, where time-dependent growth curves need to be estimated from experiments in which larvae have been exposed to a relatively small number of constant temperature profiles. The second is in quantitative genetics, where the growth curve is a function-valued phenotypic trait from which the continuous genetic variation needs to be estimated.

Thu 9 Nov, '17
CRiSM Seminar, C0.08
Speaker: Jonathan Keith (Monash University)
Title: Markov chain Monte Carlo in discrete spaces, with applications in bioinformatics and ecology
Abstract: Efficient sampling of probability distributions over large discrete spaces is a challenging problem that arises in many contexts in bioinformatics and ecology. For example, segmentation of genomes to identify putative functional elements can be cast as a multiple change-point problem involving thousands or even millions of change-points. Another example involves reconstructing the invasion history of an introduced species by embedding a phylogenetic tree in a landscape. A third example involves inferring networks of molecular interactions in cellular systems. In this talk I describe a generalisation of the Gibbs sampler that allows this well-known strategy for sampling probability distributions in R^n to be adapted for sampling discrete spaces. The technique has been successfully applied to each of the problems mentioned above. However, these problems remain highly computationally intensive. I will discuss a number of alternatives for efficient sampling of such spaces, and will be seeking collaborations to develop these and other new approaches.

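A minimal sketch of the classical Gibbs sampler on a tiny discrete space (my illustration; the talk's generalisation goes well beyond this): alternately resampling each coordinate from its exact conditional yields a chain whose visit frequencies recover the target joint distribution.

```python
import random

def gibbs_binary(joint, n_iter=20000, seed=0):
    """Gibbs sampler for a joint distribution over (x, y) in {0,1}^2,
    given as a dict {(x, y): prob}. Each sweep resamples x from
    p(x | y) and then y from p(y | x), both computed exactly."""
    rng = random.Random(seed)
    x, y = 0, 0
    counts = {state: 0 for state in joint}
    for _ in range(n_iter):
        # resample x given the current y
        p_x1 = joint[(1, y)] / (joint[(0, y)] + joint[(1, y)])
        x = 1 if rng.random() < p_x1 else 0
        # resample y given the new x
        p_y1 = joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])
        y = 1 if rng.random() < p_y1 else 0
        counts[(x, y)] += 1
    return {state: c / n_iter for state, c in counts.items()}

joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
est = gibbs_binary(joint)
print(est)  # visit frequencies close to the target probabilities
```

The difficulty the talk addresses is that in spaces like genome segmentations or embedded trees, the coordinates and their conditionals are no longer this simple, which is where the generalised sampler comes in.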
Fri 24 Nov, '17
CRiSM Seminar, A1.01, 3-4pm
Speaker: Song Liu
Title: Trimmed Density Ratio Estimation
Abstract: Density ratio estimation has recently become a versatile tool in the machine learning community. However, due to its unbounded nature, density ratio estimation is vulnerable to corrupted data points, which often push the estimated ratio toward infinity. In this paper, we present a robust estimator which automatically identifies and trims outliers. The proposed estimator has a convex formulation, and the global optimum can be obtained via subgradient descent. We analyze the parameter estimation error of this estimator under high-dimensional settings. Experiments are conducted to verify the effectiveness of the estimator.
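
A hedged toy illustration of the vulnerability the abstract describes, and of the trimming idea (this is my sketch of the phenomenon, not the paper's convex estimator): the ratio of two Gaussian densities explodes in one tail, and capping (trimming) it bounds the influence any single point can exert.

```python
import math

def gauss_pdf(x, mu, sd):
    """Density of N(mu, sd**2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def density_ratio(x, trim=None):
    """True ratio p(x)/q(x) for p = N(0, 1) and q = N(0.5, 1).
    With trim set, the ratio is capped at that value, so an outlier
    far in the tail cannot dominate a ratio-based estimate."""
    r = gauss_pdf(x, 0.0, 1.0) / gauss_pdf(x, 0.5, 1.0)
    return min(r, trim) if trim is not None else r

print(density_ratio(-8.0))            # explodes in the left tail
print(density_ratio(-8.0, trim=10.0)) # capped at 10 by trimming
```

The paper's estimator does this in a principled, data-driven way: the trimming level and the trimmed points are determined inside a convex optimisation, rather than by a hand-picked cap on a known ratio.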