Events

Fri 23 Jan, '15
-
CRiSM Seminar - Rebecca Killick (Lancaster), Peter Green (Bristol)
B1.01 (Maths)

Rebecca Killick (Lancaster)
Forecasting locally stationary time series
Within many fields forecasting is an important statistical tool. Traditional statistical techniques often assume stationarity of the past in order to produce accurate forecasts. For data arising from the energy sector and others, this stationarity assumption is often violated but forecasts still need to be produced. This talk will highlight the potential issues when moving from forecasting stationary to nonstationary data and propose a new estimator, the local partial autocorrelation function, which will aid us in forecasting locally stationary data. We introduce the lpacf alongside associated theory and examples demonstrating its use as a modelling tool. Following this we illustrate the new estimator embedded within a forecasting method and show improved forecasting performance using this new technique.
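
For readers who want a feel for what a time-localised partial autocorrelation looks like, the sketch below computes an ordinary lag-1 partial autocorrelation on a sliding window of a piecewise-stationary AR(1) series. This is a crude rolling-window stand-in, not the wavelet-based lpacf of the talk; the window length, step size and toy data are arbitrary choices.

    import numpy as np
    from statsmodels.tsa.stattools import pacf

    rng = np.random.default_rng(8)

    # Piecewise-stationary toy series: AR(1) whose coefficient changes halfway.
    n = 2000
    x = np.zeros(n)
    for t in range(1, n):
        phi = 0.3 if t < n // 2 else 0.8
        x[t] = phi * x[t - 1] + rng.standard_normal()

    # Crude "local" lag-1 partial autocorrelation on a sliding window
    # (a rolling-window stand-in, not the wavelet-based lpacf of the talk).
    window, step = 300, 100
    local_pacf1 = [pacf(x[t - window:t], nlags=1)[1] for t in range(window, n + 1, step)]
    print(np.round(local_pacf1, 2))    # drifts from about 0.3 towards 0.8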

Peter Green (Bristol)
Inference on decomposable graphs: priors and sampling
The structure in a multivariate distribution is largely captured by the conditional independence relationships that hold among the variables, often represented graphically, and inferring these from data is an important step in understanding a complex stochastic system. We would like to make simultaneous inference about the conditional independence graph and parameters of the model; this is known as joint structural and quantitative learning in the machine learning literature. The Bayesian paradigm allows a principled approach to this simultaneous inference task. There are tremendous computational and interpretational advantages in assuming the conditional independence graph is decomposable, and not too many disadvantages. I will present a new structural Markov property for decomposable graphs, show its consequences for prior modelling, and discuss a new MCMC algorithm for sampling graphs that enables Bayesian structural and quantitative learning on a much bigger scale than previously possible. This is joint work with Alun Thomas (Utah).

Fri 6 Feb, '15
-
CRiSM Seminar - Gareth Peters (UCL), Leonhard Held (University of Zurich)
B1.01 (Maths)

Gareth Peters (UCL)
Sequential Monte Carlo Samplers for capital allocation under copula-dependent risk models
In this talk we assume a multivariate risk model has been developed for a portfolio and its capital derived as a homogeneous risk measure. The Euler (or gradient) principle then states that the capital to be allocated to each component of the portfolio has to be calculated as an expectation conditional on a rare event, which can be challenging to evaluate in practice. We exploit the copula dependence within the portfolio risks to design a Sequential Monte Carlo Sampler-based estimator of the marginal conditional expectations involved in the problem, showing its efficiency through a series of computational examples.
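
As a rough illustration of why the conditional expectation in the Euler principle is awkward to estimate, the sketch below allocates Value-at-Risk capital by crude Monte Carlo, conditioning on a thin slab around the quantile. The Gaussian copula with lognormal margins, the portfolio size and the slab width are arbitrary toy assumptions; the Sequential Monte Carlo sampler of the talk is designed precisely to avoid the inefficiency of this naive conditioning.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy portfolio: three risks with lognormal margins coupled by a Gaussian copula
    # (an illustrative assumption, not the risk model of the talk).
    corr = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])
    n = 1_000_000
    z = rng.multivariate_normal(np.zeros(3), corr, size=n)
    x = np.exp(0.5 * z)              # marginal losses X_i
    s = x.sum(axis=1)                # portfolio loss S

    alpha = 0.99
    var_alpha = np.quantile(s, alpha)        # capital taken as Value-at-Risk

    # Euler allocation A_i = E[X_i | S = VaR_alpha(S)], approximated here by
    # conditioning on a thin slab around the quantile -- the rare event that
    # makes naive Monte Carlo inefficient and motivates an SMC sampler.
    eps = 0.01 * var_alpha
    in_slab = np.abs(s - var_alpha) < eps
    alloc = x[in_slab].mean(axis=0)

    print("VaR:", round(var_alpha, 3))
    print("allocations:", np.round(alloc, 3), " sum:", round(alloc.sum(), 3))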

Leonhard Held (University of Zurich)
Adaptive prior weighting in generalized linear models
The prior distribution is a key ingredient in Bayesian inference. Prior information in generalized linear models may come from different sources and may or may not be in conflict with the observed data. Various methods have been proposed to quantify a potential prior-data conflict, such as Box's p-value. However, the literature is sparse on what to do if the prior is not compatible with the observed data. To this end, we review and extend methods to adaptively weight the prior distribution. We relate empirical Bayes estimates of the prior weight to Box's p-value and propose alternative fully Bayesian approaches. Prior weighting can be done for the joint prior distribution of the regression coefficients or - under prior independence - separately for each regression coefficient or for pre-specified blocks of regression coefficients. We outline how the proposed methodology can be implemented using integrated nested Laplace approximations (INLA) and illustrate the applicability with a logistic and a log-linear Poisson multiple regression model. This is joint work with Rafael Sauter.
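
Schematically, prior down-weighting of this power-prior type can be written as follows; the notation is generic and the exact parameterisation used in the talk may differ:

    \pi_\delta(\beta) \;\propto\; \pi(\beta)^{\delta}, \qquad \delta \in (0, 1],
    \qquad
    \hat{\delta}_{\mathrm{EB}} \;=\; \arg\max_{\delta}\; \int f(y \mid \beta)\, \pi_\delta(\beta)\, \mathrm{d}\beta,

with the empirical Bayes estimate of the weight delta obtained by maximising the marginal likelihood, or with delta given its own prior in a fully Bayesian treatment.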

Fri 20 Feb, '15
-
CRiSM Seminar - Marina Knight (York)
B1.01 (Maths)

Marina Knight (York)

Hurst exponent estimation for long-memory processes using wavelet lifting
Reliable estimation of long-range dependence parameters, such as the Hurst exponent, is a well-studied problem in the statistical literature. However, many time series observed in practice present missingness or are naturally irregularly sampled. In these settings, the current literature is sparse, with most approaches requiring heavy modifications in order to deal with the time irregularity. In this talk we present a technique for estimating the Hurst exponent of time series with long memory. The method is based on a flexible wavelet transform built by means of the lifting scheme, and is naturally suitable for series exhibiting time domain irregularity. We shall demonstrate the performance of this new method and illustrate the technique through time series applications in climatology.
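
A minimal, regularly sampled illustration of Hurst estimation (not the wavelet-lifting estimator of the talk, which is built for irregular sampling) is the classical aggregated-variance regression sketched below; the block sizes and the white-noise sanity check are arbitrary choices.

    import numpy as np

    def hurst_aggvar(x, n_scales=20):
        """Aggregated-variance estimate of the Hurst exponent H.

        For a long-memory series the variance of block means of size m scales
        like m^(2H - 2), so H is read off the slope of a log-log regression.
        """
        x = np.asarray(x, dtype=float)
        n = len(x)
        block_sizes = np.unique(np.logspace(1, np.log10(n // 10), n_scales).astype(int))
        log_m, log_v = [], []
        for m in block_sizes:
            k = n // m
            means = x[:k * m].reshape(k, m).mean(axis=1)
            log_m.append(np.log(m))
            log_v.append(np.log(means.var()))
        slope = np.polyfit(log_m, log_v, 1)[0]
        return 1.0 + slope / 2.0

    # Sanity check on white noise, for which the true H is 0.5.
    rng = np.random.default_rng(1)
    print(hurst_aggvar(rng.standard_normal(20_000)))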

Fri 1 May, '15
-
CRiSM Seminar - Marcelo Pereyra (Bristol), Magnus Rattray (Manchester)
D1.07 (Complexity)
Marcelo Pereyra (Bristol)
Proximal Markov chain Monte Carlo: stochastic simulation meets convex optimisation
Convex optimisation and stochastic simulation are two powerful computational methodologies for performing statistical inference in high-dimensional inverse problems. It is widely acknowledged that these methodologies can complement each other very well, yet they are generally studied and used separately. This talk presents a new Langevin Markov chain Monte Carlo method that uses elements of convex analysis and proximal optimisation to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis. The method is based on a new first-order approximation for Langevin diffusions that uses Moreau-Yoshida approximations and proximity mappings to capture the log-concavity of the target density and construct Markov chains with favourable convergence properties. This approximation is closely related to Moreau-Yoshida regularisations for convex functions and uses proximity mappings instead of gradient mappings to approximate the continuous-time process. The proposed method complements existing Langevin algorithms in two ways. First, the method is shown to have very robust stability properties and to converge geometrically for many target densities for which other algorithms are not geometric, or only if the time step is sufficiently small. Second, the method can be applied to high-dimensional target densities that are not continuously differentiable, a class of distributions that is increasingly used in image processing and machine learning and that is beyond the scope of existing Langevin and Hamiltonian Monte Carlo algorithms. The proposed methodology is demonstrated on two challenging models related to image resolution enhancement and low-rank matrix estimation, which are not well addressed by existing MCMC methodology.
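
The following sketch shows a Moreau-Yoshida-smoothed Langevin update of the kind described, applied to a toy one-dimensional target exp(-x^2/2 - tau*|x|) whose non-differentiable part has a soft-thresholding proximity mapping. The step size, smoothing parameter and target are illustrative assumptions, and the unadjusted update below omits any Metropolis correction used in the talk's algorithms.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy non-differentiable target: pi(x) proportional to exp(-x^2/2 - tau*|x|).
    tau = 2.0
    grad_f = lambda x: x                                                # gradient of smooth part f(x) = x^2/2
    prox_g = lambda v, lam: np.sign(v) * max(abs(v) - lam * tau, 0.0)   # prox of lam*tau*|.| (soft threshold)

    lam = 0.1      # Moreau-Yoshida smoothing parameter (assumed)
    delta = 0.05   # Langevin step size (assumed)
    n_iter, burn = 50_000, 5_000
    x = 0.0
    samples = np.empty(n_iter)

    for k in range(n_iter):
        # Unadjusted Langevin step on the smoothed target exp(-f - g^lam):
        # the gradient of the Moreau-Yoshida envelope of g is (x - prox_{lam g}(x)) / lam.
        drift = -grad_f(x) - (x - prox_g(x, lam)) / lam
        x = x + delta * drift + np.sqrt(2 * delta) * rng.standard_normal()
        samples[k] = x

    print("mean ~", samples[burn:].mean(), " sd ~", samples[burn:].std())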


Magnus Rattray (Manchester)
Gaussian process modelling for omic time course data
We are developing methods based on Gaussian process inference for analysing high-throughput biological time course data. Applications range from classical statistical problems such as clustering and differential expression through to systems biology models of cellular processes such as transcription and its regulation. Our focus is on developing tractable Bayesian methods which scale to genome-wide applications. I will describe our approach to a number of problems: (1) non-parametric clustering of replicated time course data; (2) inferring the full posterior of the perturbation time point from two-sample time course data; (3) inferring the pre-mRNA elongation rate from RNA polymerase ChIP-Seq time course data; (4) uncovering transcriptional delays by integrating pol-II and RNA time course data through a simple differential equation model.

Fri 15 May, '15
-
CRiSM Seminar - Carlos Carvalho (UT Austin), Andrea Riebler (Norwegian University of Science & Technology)
D1.07 (Complexity)

Carlos Carvalho (The University of Texas at Austin)

Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective
Selecting a subset of variables for linear models remains an active area of research. This article reviews many of the recent contributions to the Bayesian model selection and shrinkage prior literature. A posterior variable selection summary is proposed, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.
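
A minimal sketch of the decoupling idea: obtain a posterior-mean fit from a shrinkage model (ridge regression stands in here for the posterior mean under a genuine Bayesian shrinkage prior), then summarise that fit with a sequence of sparse linear predictors from a lasso path. The toy data and the use of Ridge/lars_path from scikit-learn are assumptions, not the paper's exact procedure.

    import numpy as np
    from sklearn.linear_model import Ridge, lars_path

    rng = np.random.default_rng(3)

    # Toy data: only the first three of ten coefficients are nonzero.
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]
    y = X @ beta + rng.standard_normal(n)

    # Stage 1: a "posterior mean" fit under shrinkage (ridge is a crude stand-in
    # for the posterior mean under a genuine shrinkage prior).
    beta_bar = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
    fit_bar = X @ beta_bar

    # Stage 2: sparse summaries gamma(lambda) that minimise
    # ||X beta_bar - X gamma||^2 + lambda * ||gamma||_1,
    # i.e. a lasso path targeting the posterior fitted values rather than y.
    _, _, coefs = lars_path(X, fit_bar, method="lasso")
    print(np.round(coefs.T, 2))    # each row is one sparse linear predictor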

Andrea Riebler (Norwegian University of Science and Technology)
Projecting cancer incidence and mortality: Bayesian age-period-cohort models ready for routine use
Projections of age-specific cancer data are of strong interest due to demographic changes, but also to advances in medical diagnosis and treatment. Although Bayesian age-period-cohort (APC) models have been shown to be beneficial compared to simpler statistical models in this context, they are not yet used in routine practice. Reasons might be two-fold. First, Bayesian APC models have been criticised for producing too wide credible intervals. Second, there might be a lack of sound and at the same time easy-to-use software. Here, we address both concerns by introducing efficient MCMC-free software and showing that probabilistic forecasts obtained by the Bayesian APC model are well calibrated. We use annual lung cancer data for females in five different countries and omit the observations from the last 10 years. We then compare the yearly predictions with the actual observed data based on the absolute error and the continuous ranked probability score. Further, we assess calibration of one-step-ahead predictive distributions.

Fri 29 May, '15
-
CRiSM Seminar - Clifford Lam (LSE), Zoltan Szabo (UCL)
D1.07 (Complexity)

Zoltán Szabó (UCL)

Regression on Probability Measures: A Simple and Consistent Algorithm

We address the distribution regression problem: we regress from probability measures to Hilbert-space valued outputs, where only samples are available from the input distributions. Many important statistical and machine learning problems can be phrased within this framework, including point estimation tasks without analytical solution and multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge the only existing method with performance guarantees requires density estimation (which often performs poorly in practice) and the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative to solve the distribution regression problem: we embed the distributions into a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. We prove that this scheme is consistent under mild conditions (for distributions on separable topological domains endowed with kernels), and construct explicit finite sample bounds on the excess risk as a function of the sample numbers and the problem difficulty, which hold with high probability. Specifically, we establish the consistency of set kernels in regression, which was a 15-year-old open question, and also present new kernels on embedded distributions. The practical efficiency of the studied technique is illustrated in supervised entropy learning and aerosol prediction using multispectral satellite images. [Joint work with Bharath Sriperumbudur, Barnabas Poczos and Arthur Gretton.]
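
The embed-then-ridge-regress scheme can be sketched in a few lines: the inner product of two empirical mean embeddings is the average pairwise kernel between the two bags, and prediction is kernel ridge regression on those bag-level kernel values. The Gaussian kernel bandwidth, bag sizes and the toy label (the squared mean of the generating distribution) below are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(4)

    def set_kernel(bag_a, bag_b, gamma=1.0):
        """Inner product of empirical mean embeddings:
        the average Gaussian kernel over all pairs of points."""
        d2 = ((bag_a[:, None, :] - bag_b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2).mean()

    # Each input is a bag of 50 samples from N(mu_i, 1); the label is mu_i^2.
    mus = rng.uniform(-2, 2, size=60)
    bags = [rng.normal(mu, 1.0, size=(50, 1)) for mu in mus]
    y = mus ** 2

    K = np.array([[set_kernel(a, b) for b in bags] for a in bags])
    lam = 1e-3
    alpha = np.linalg.solve(K + lam * np.eye(len(bags)), y)   # kernel ridge weights

    # Predict for a fresh bag drawn from N(1, 1); the true label is 1.
    new_bag = rng.normal(1.0, 1.0, size=(50, 1))
    k_new = np.array([set_kernel(new_bag, b) for b in bags])
    print("prediction:", k_new @ alpha)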

 

Clifford Lam (LSE)

Nonparametric Eigenvalue-Regularized Precision or COvariance Matrix Estimator for Low and High Frequency Data Analysis

We introduce nonparametric regularization of the eigenvalues of a sample covariance matrix through splitting of the data (NERCOME), and prove that NERCOME enjoys asymptotically optimal nonlinear shrinkage of eigenvalues with respect to the Frobenius norm. One advantage of NERCOME is its computational speed when the dimension is not too large. We prove that NERCOME is positive definite almost surely, as long as the true covariance matrix is so, even when the dimension is larger than the sample size. With respect to the inverse Stein's loss function, the inverse of our estimator is asymptotically the optimal precision matrix estimator. Asymptotic efficiency loss is defined through comparison with an ideal estimator which assumes knowledge of the true covariance matrix. We show that the asymptotic efficiency loss of NERCOME is almost surely 0 with a suitable split location of the data. We also show that all the aforementioned optimality holds for data with a factor structure. Our method avoids the need to first estimate any unknowns from a factor model, and directly gives the covariance or precision matrix estimator. Extension to estimating the integrated volatility matrix for high frequency data is presented as well. Real data analysis and simulation experiments on portfolio allocation are presented for both low and high frequency data.
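
A minimal sketch of the split-data construction described above, under the simplifying assumptions of a single fixed split and no averaging over data permutations: eigenvectors come from one half of the sample, eigenvalues are recalibrated on the other half, which keeps the estimator positive definite.

    import numpy as np

    def split_eigenvalue_estimator(X, split=0.5):
        """Split-data eigenvalue-regularised covariance estimate (simplified).

        Eigenvectors are taken from the first block of observations and the
        eigenvalues are re-estimated on the held-out block, which keeps the
        estimate positive definite.
        """
        n, p = X.shape
        n1 = int(n * split)
        S1 = np.cov(X[:n1], rowvar=False)
        S2 = np.cov(X[n1:], rowvar=False)
        _, U1 = np.linalg.eigh(S1)
        d = np.diag(U1.T @ S2 @ U1)          # recalibrated eigenvalues
        return U1 @ np.diag(d) @ U1.T

    rng = np.random.default_rng(5)
    true_cov = np.diag(np.linspace(1, 5, 40))
    X = rng.multivariate_normal(np.zeros(40), true_cov, size=100)
    est = split_eigenvalue_estimator(X)
    print("smallest eigenvalue:", np.linalg.eigvalsh(est).min())    # strictly positive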

Fri 12 Jun, '15
-
CRiSM Seminar - Sara van de Geer (ETH Zurich), Daniel Simpson (Warwick)
D1.07 (Complexity)

Daniel Simpson (University of Warwick)

Penalising model component complexity: A principled practical approach to constructing priors

Setting prior distributions on model parameters is the act of characterising the nature of our uncertainty and has proven a critical issue in applied Bayesian statistics. Although the prior distribution should ideally encode the users’ uncertainty about the parameters, this level of knowledge transfer seems to be unattainable in practice and applied statisticians are forced to search for a “default” prior.

Despite the development of objective priors, which are only available explicitly for a small number of highly restricted model classes, the applied statistician has few practical guidelines to follow when choosing the priors. An easy way out of this dilemma is to re-use prior choices of others, with an appropriate reference.

In this talk, I will introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys' priors, are designed to support Occam's razor and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions.
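
Schematically (following the general penalised-complexity recipe rather than any specific example from the talk), the prior is an exponential on the distance from the base model, transformed back to the parameter:

    d(\xi) \;=\; \sqrt{2\,\mathrm{KLD}\big(f(\cdot \mid \xi)\,\|\,f(\cdot \mid \xi = 0)\big)}, \qquad
    d \sim \mathrm{Exp}(\lambda)
    \;\Longrightarrow\;
    \pi(\xi) \;=\; \lambda \exp\{-\lambda\, d(\xi)\}\,\Big|\frac{\partial d(\xi)}{\partial \xi}\Big|,

with the rate lambda set through a user-defined scaling statement of the form P(Q(xi) > U) = alpha.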

Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations, like random effect models, spline smoothing, disease mapping, Cox proportional hazard models with time-varying frailty, spatial Gaussian fields and multivariate probit models. Further, we show how to control the overall variance arising from many model components in hierarchical models.

This is joint work with Håvard Rue, Thiago G. Martins, Andrea Riebler, Geir-Arne Fuglstad (NTNU) and Sigrunn H. Sørbye (Univ. of Tromsø).

Sara van de Geer (ETH Zurich)

Norm-regularized Empirical Risk Minimization


Fri 26 Jun, '15
-
CRiSM Seminar - Thomas Hamelryck (University of Copenhagen), Anjali Mazumder (Warwick)
D1.07 (Complexity)

Thomas Hamelryck (Bioinformatics Center, University of Copenhagen)

Inference of protein structure and ensembles using Bayesian statistics and probability kinematics

The so-called protein folding problem is the loose designation for an amalgam of closely related, unsolved problems that include protein structure prediction, protein design and the simulation of the protein folding process. We adopt a unique Bayesian approach to modelling bio-molecular structure, based on graphical models, directional statistics and probability kinematics. Notably, we developed a generative probabilistic model of protein structure in full atomic detail. I will give an overview of how rigorous probabilistic models of something as complicated as a protein's atomic structure can be formulated, focusing on the use of graphical models and directional statistics to model angular degrees of freedom. I will also discuss the reference ratio method, which is needed to "glue" several probabilistic models of protein structure together in a consistent way. The reference ratio method is based on "probability kinematics", a little known method to perform Bayesian inference proposed by the philosopher Richard C. Jeffrey at the end of the fifties. Probability kinematics might find widespread application in statistics and machine learning as a way to formulate complex, high dimensional probabilistic models for multi-scale problems by combining several simpler models.
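
In its basic form, probability kinematics (Jeffrey conditioning) keeps the conditional probabilities given a partition fixed while revising the probabilities of the partition itself. The small numerical illustration below is just that basic rule on a made-up joint table, not the reference ratio method itself.

    import numpy as np

    # Joint P(A, B) over A in {0, 1} (rows) and a partition B in {0, 1, 2} (columns).
    P = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
    q_B = np.array([0.5, 0.3, 0.2])      # revised marginal probabilities for B

    # Jeffrey's rule: P_new(A) = sum_i P(A | B_i) * q(B_i),
    # i.e. conditionals given the partition are held fixed.
    cond_A_given_B = P / P.sum(axis=0, keepdims=True)
    P_new_A = cond_A_given_B @ q_B
    print(P_new_A, P_new_A.sum())        # the revised P(A) sums to one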


Anjali Mazumder (University of Warwick)

Probabilistic Graphical Models for planning and reasoning of scientific evidence in the courts

The use of probabilistic graphical models (PGMs) has gained prominence in the forensic science and legal literature when evaluating evidence under uncertainty. The graph-theoretic and modular nature of PGMs provides a flexible and graphical representation of the inference problem, and propagation algorithms facilitate the calculation of laborious marginal and conditional probabilities of interest. In giving expert testimony regarding, for example, the source of a DNA sample, forensic scientists, under much scrutiny, are often asked to justify their decision-making process. Using information-theoretic concepts and a decision-theoretic framework, we define a value of evidence criterion as a general measure of informativeness for a forensic query and collection of evidence, to determine which and how much evidence contributes to the reduction of uncertainty. In this talk, we demonstrate how this approach can be used for a variety of planning problems and the utility of PGMs for scientific and legal reasoning.

 

Mon 12 Oct, '15
-
CRiSM Seminar - Dan Roy (University of Toronto)
A1.01

Dan Roy (University of Toronto)
Nonstandard complete class theorems

For finite parameter spaces under finite loss, there is a close link between optimal frequentist decision procedures and Bayesian procedures: every Bayesian procedure derived from a prior with full support is admissible, and every admissible procedure is Bayes. This relationship breaks down as we move beyond finite parameter spaces. There is a long line of work relating admissible procedures to Bayesian ones in more general settings. Under some regularity conditions, admissible procedures can be shown to be the limit of Bayesian procedures. Under additional regularity, they are generalized Bayesian, i.e., they minimize the average loss with respect to an improper prior. In both these cases, one must venture beyond the strict confines of Bayesian analysis.

Using methods from mathematical logic and nonstandard analysis, we introduce the notion of a hyperfinite statistical decision problem defined on a hyperfinite probability space and study the class of nonstandard Bayesian decision procedures---namely, those whose average risk with respect to some prior is within an infinitesimal of the optimal Bayes risk. We show that if there is a suitable hyperfinite approximation to a standard statistical decision problem, then every admissible decision procedure is nonstandard Bayes, and so the nonstandard Bayesian procedures form a complete class. We give sufficient regularity conditions on standard statistical decision problems admitting hyperfinite approximations. Joint work with Haosui (Kevin) Duanmu.

Mon 26 Oct, '15
-
CRiSM Seminar - Hernando Ombao (UC Irvine, Dept of Statistics)
A1.01

Hernando Ombao (UC Irvine, Dept of Statistics)
Problems in Non-Stationary Multivariate Time Series With Applications in Brain Signals

We present new tools for analyzing complex multichannel signals using spectral methods. The key challenges are the high dimensionality of brain signals, their massive size and the complex nature of the underlying physiological process – in particular, non-stationarity. In this talk, I will highlight some of the current projects. The first is a tool that identifies changes in the structure of a multivariate time series. This is motivated by problems in characterizing changes in brain signals during an epileptic seizure, where a localized population of neurons exhibits abnormal firing behavior which then spreads to other subpopulations of neurons. This abnormal firing behavior is captured by increases in signal amplitudes (which can be easily spotted by visual inspection) and changes in the decomposition of the waveforms and in the strength of dependence between different regions (which are more subtle). The proposed frequency-specific change-point detection method (FreSpeD) uses a cumulative sum test statistic within a binary segmentation algorithm. Theoretical optimal properties of the FreSpeD method will be developed. We demonstrate that, when applied to epileptic seizure EEG data, FreSpeD identifies the correct brain region as the focal point of seizure, the time of seizure onset and the very subtle changes in cross-coherence immediately preceding seizure onset.
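
As a point of reference, the sketch below computes a standard CUSUM statistic for a single change in mean of a univariate toy series; FreSpeD applies statistics of this kind to frequency-specific quantities of a multivariate series inside a binary segmentation loop, which is not reproduced here.

    import numpy as np

    def cusum_stat(x):
        """CUSUM statistics |C_k| for a single change in mean at each candidate k."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        k = np.arange(1, n)
        left = np.cumsum(x)[:-1]                     # sums of the first k observations
        total = x.sum()
        # Standard scaling: sqrt(k(n-k)/n) * |mean of left segment - mean of right segment|
        return np.sqrt(k * (n - k) / n) * np.abs(left / k - (total - left) / (n - k))

    rng = np.random.default_rng(6)
    x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(1.5, 1.0, 200)])
    stats = cusum_stat(x)
    print("estimated change point:", stats.argmax() + 1)    # close to 300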

The goal of the second project is to track changes in spatial boundaries (or more generally spatial sets or clusters) as the seizure process unfolds. A pair of channels (or a pair of sets of channels) are merged into one cluster if they exhibit synchronicity as measured by, for example, similarities in their spectra or by the strength of their coherence. We will highlight some open problems including developing a model for the evolutionary clustering of non-stationary time series.

The first project is in collaboration with Anna Louise Schröder (London School of Economics); the second is with Carolina Euan (CIMAT, Mexico), Joaquin Ortega (CIMAT, Mexico) and Ying Sun (KAUST, Saudi Arabia).

Thu 12 Nov, '15
-
CRiSM Seminar - Patrick Wolfe (UCL, Dept of Statistical Science)
A1.01

Patrick Wolfe (UCL, Dept of Statistical Science)
Network Analysis and Nonparametric Statistics

Networks are ubiquitous in today's world. Any time we make observations about people, places, or things and the interactions between them, we have a network. Yet a quantitative understanding of real-world networks is in its infancy, and must be based on strong theoretical and methodological foundations. The goal of this talk is to provide some insight into these foundations from the perspective of nonparametric statistics, in particular how trade-offs between model complexity and parsimony can be balanced to yield practical algorithms with provable properties.

Thu 26 Nov, '15
-
CRiSM Seminar - Ismael Castillo (Université Paris 6, Laboratoire de Probabilités et Modèles Aléatoires)
A1.01

Ismael Castillo (Université Paris 6, Laboratoire de Probabilités et Modèles Aléatoires)
On some properties of Polya trees posterior distributions

In Bayesian nonparametrics, Polya tree distributions form a popular and flexible class of priors on distributions or density functions. In the problem of density estimation, for certain choices of parameters, Polya trees have been shown to produce asymptotically consistent posterior distributions in a Hellinger sense. In this talk, after reviewing some general properties of Polya trees, I will show that the previous consistency result can be made much more precise in two directions: 1) rates of convergence can be derived; 2) it is possible to characterise the limiting shape of the posterior distribution in a functional sense. We will discuss a few applications to Donsker-type results on the cumulative distribution function and to the study of some functionals of the density.
 

Thu 10 Dec, '15
-
CRiSM Seminar - Martin Lindquist (Johns Hopkins University, Dept of Biostatistics)
A1.01

Martin Lindquist (Johns Hopkins University, Dept of Biostatistics)

New Approaches towards High-dimensional Mediation

Mediation analysis is often used in the behavioral sciences to investigate the role of intermediate variables that lie on the path between a randomized treatment and an outcome variable. The influence of the intermediate variable (mediator) on the outcome is often determined using structural equation models (SEMs). While there has been significant research on the topic in recent years, little is known about mediation analysis when the mediator is high dimensional. Here we discuss two approaches towards addressing this problem. The first is an extension of SEMs to the functional data analysis (FDA) setting that allows the mediating variable to be a continuous function rather than a single scalar measure. The second finds the linear combination of a high-dimensional vector of potential mediators that maximizes the likelihood of the SEM. Both methods are applied to data from a functional magnetic resonance imaging (fMRI) study of thermal pain that sought to determine whether brain activation mediated the effect of applied temperature on self-reported pain.

Fri 22 Jan, '16
-
CRiSM Seminar
B1.01

Li Su with Michael J. Daniels (MRC Biostatistics Unit)
Bayesian modeling of the covariance structure for irregular longitudinal data using the partial autocorrelation function
Abstract: In long-term follow-up studies, irregular longitudinal data are observed when individuals are assessed repeatedly over time but at uncommon and irregularly spaced time points. Modeling the covariance structure for this type of data is challenging, as it requires specification of a covariance function that is positive definite. Moreover, in certain settings, careful modeling of the covariance structure for irregular longitudinal data can be crucial in order to ensure no bias arises in the mean structure. Two common settings where this occurs are studies with ‘outcome-dependent follow-up’ and studies with ‘ignorable missing data’. ‘Outcome-dependent follow-up’ occurs when individuals with a history of poor health outcomes had more follow-up measurements, and the intervals between the repeated measurements were shorter. When the follow-up time process only depends on previous outcomes, likelihood-based methods can still provide consistent estimates of the regression parameters, given that both the mean and covariance structures of the irregular longitudinal data are correctly specified and no model for the follow-up time process is required. For ‘ignorable missing data’, the missing data mechanism does not need to be specified, but valid likelihood-based inference requires correct specification of the covariance structure. In both cases, flexible modeling approaches for the covariance structure are essential. In this work*, we develop a flexible approach to modeling the covariance structure for irregular continuous longitudinal data using the partial autocorrelation function and the variance function. In particular, we propose semiparametric non-stationary partial autocorrelation function models, which do not suffer from complex positive definiteness restrictions like the autocorrelation function. We describe a Bayesian approach, discuss computational issues, and apply the proposed methods to CD4 count data from a pediatric AIDS clinical trial.
*Details can be found in the paper published in Statistics in Medicine 2015, 34, 2004–2018.

Fri 5 Feb, '16
-
CRiSM Seminar
B1.01

Ewan Cameron (Oxford, Dept of Zoology)

Progress and (Statistical) Challenges in Malariology

Abstract: In this talk I will describe some key statistical challenges faced by researchers aiming to quantify the burden of disease arising from Plasmodium falciparum malaria at the population level. These include covariate selection in the 'big data' setting, handling spatially-correlated residuals at scale, calibration of individual simulation models of disease transmission, and the embedding of continuous-time, discrete-state Markov chain solutions within hierarchical Bayesian models. In each case I will describe the pragmatic solutions we've implemented to date within the Malaria Atlas Project, and highlight more sophisticated solutions we'd like to have in the near future if the right statistical methodology and computational tools can be identified and/or developed to this end.

References:

http://www.nature.com/nature/journal/v526/n7572/abs/nature15535.html

http://www.nature.com/ncomms/2015/150907/ncomms9170/full/ncomms9170.html

http://www.ncbi.nlm.nih.gov/pubmed/25890035

http://link.springer.com/article/10.1186/s12936-015-0984-9

 

Fri 19 Feb, '16
-
CRiSM Seminar
B1.01

Theresa Smith (CHICAS, Lancaster Medical School)

Modelling geo-located health data using spatio-temporal log-Gaussian Cox processes

Abstract: Health data with high spatial and temporal resolution are becoming more common, but there are several practical and computational challenges to using such data to study the relationships between disease risk and possible predictors. These difficulties include a lack of measurements on individual-level covariates/exposures, integrating data measured on different spatial and temporal units, and computational complexity.

In this talk, I outline strategies for jointly estimating systematic (i.e., parametric) trends in disease risk and assessing residual risk with spatio-temporal log-Gaussian Cox processes (LGCPs). In particular, I will present Bayesian methods and MCMC tools for using spatio-temporal LGCPs to investigate the roles of environmental and socio-economic risk-factors in the incidence of Campylobacter in England.

 

 

Fri 4 Mar, '16
-
CRiSM Seminar
B1.01

Alan Gelfand (Duke, Dept of Statistical Science)

Title: Space and circular time log Gaussian Cox processes with application to crime event data

Abstract: We view the locations and times of a collection of crime events as a space-time point pattern. So, with either a nonhomogeneous Poisson process or with a more general Cox process, we need to specify a space-time intensity. For the latter, we need a random intensity which we model as a realization of a spatio-temporal log Gaussian process. In fact, we view time as circular, necessitating valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. In addition, crimes are classified by crime type.

Furthermore, each crime event is marked by day of the year, which we convert to day of the week. We present models to accommodate such data. Then, we extend the modeling to include the marks. Our specifications naturally take the form of hierarchical models which we fit within a Bayesian framework. In this regard, we consider model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox process. We also compare separable vs. nonseparable covariance specifications.

Our motivating dataset is a collection of crime events for the city of San Francisco during the year 2012. Again, we have location, hour, day of the year, and crime type for each event. We investigate a rich range of models to enhance our understanding of the set of incidences.

Fri 18 Mar, '16
-
CRiSM Seminar
B1.01

Petros Dellaportas (UCL)

Scalable inference for a full multivariate stochastic volatility model

Abstract: We introduce a multivariate stochastic volatility model for asset returns that imposes no restrictions on the structure of the volatility matrix and treats all its elements as functions of latent stochastic processes. When the number of assets is prohibitively large, we propose a factor multivariate stochastic volatility model in which the variances and correlations of the factors evolve stochastically over time. Inference is achieved via a carefully designed, feasible and scalable Markov chain Monte Carlo algorithm that combines two computationally important ingredients: it utilizes Metropolis proposal densities that are invariant to the prior for simultaneously updating all latent paths, and it has quadratic, rather than cubic, computational complexity when evaluating the multivariate normal densities required. We apply our modelling and computational methodology to the daily returns of 571 stocks of the Euro STOXX index over a period of 10 years.

Mon 4 Apr, '16 - Fri 8 Apr, '16
All-day
CRiSM Master Class: Non-Parametric Bayes
MS.01

Runs from Monday, April 04 to Friday, April 08.

Fri 6 May, '16
-
CRiSM Seminar
MS.03

Mikhail Malyutov (Northeastern University)

Context-free and Grammar-free Statistical Testing of Identity of Styles

Our theory justifies our thorough statistical modification CCC of D. Khmelev's conditional compression-based classification idea of 2001 and the 7 years of intensive applied statistical implementation of CCC for authorship attribution of literary works.

Homogeneity testing based on SCOT training, with applications to financial modeling and statistical quality control, is also in progress. Both approaches are described in a Springer monograph which appears shortly. A stochastic context tree (abbreviated as SCOT) is an m-Markov chain in which every state of a string is independent of the symbols in its more remote past than the context, whose length is determined by the preceding symbols of this state.

In all of our applications we uncover a complex sparse structure of memory in SCOT models that allows excellent discrimination power. In addition, a straightforward estimation of the stationary distribution of SCOT gives insight into contexts crucial for discrimination between, say, different regimes of financial data or between styles of different authors of literary texts.

Fri 13 May, '16
-
CRiSM Seminar
B1.01

Michael Newton (University of Wisconsin-Madison)

Ranking and selection revisited

In large-scale inference the precision with which individual parameters are estimated may vary greatly among parameters, thus complicating the task of ranking the parameters. I present a framework for evaluating different ranking/selection schemes as well as an empirical Bayesian methodology showing theoretical and empirical advantages over available approaches. Examples from genomics and sports will help to illustrate the issues.

Fri 20 May, '16
-
CRiSM Seminar
D1.07

Jon Forster (Southampton)

Model integration for mortality estimation and forecasting

The decennial English Life Tables have been produced after every UK decennial census since 1841. They are based on graduated (smoothed) estimates of central mortality rates, or related functions. For UK mortality, over the majority of the age range, a GAM can provide a smooth function which adheres acceptably well to the crude mortality rates. At the very highest ages, the sparsity of the data means that the uncertainty about mortality rates is much greater. A further issue is that life table estimation requires us to extrapolate the estimate of the mortality rate function to ages beyond the extremes of the observed data. Our approach integrates a GAM at lower ages with a low-dimensional parametric model at higher ages. Uncertainty about the threshold age, at which the transition to the simpler model occurs, is integrated into the analysis.

This base structure can then be extended into a model for the evolution of mortality rates over time, allowing the forecasting of mortality rates, a key input into demographic projections necessary for planning.

Fri 3 Jun, '16
-
CRiSM Seminar
D1.07

Degui Li (University of York)

Panel Data Models with Interactive Fixed Effects and Multiple Structural Breaks

In this paper we consider estimation of common structural breaks in panel data models with interactive fixed effects which are unobservable. We introduce a penalized principal component (PPC) estimation procedure with an adaptive group fused LASSO to detect the multiple structural breaks in the models. Under some mild conditions, we show that with probability approaching one the proposed method can correctly determine the unknown number of breaks and consistently estimate the common break dates. Furthermore, we estimate the regression coefficients through the post-LASSO method and establish the asymptotic distribution theory for the resulting estimators. The developed methodology and theory are applicable to the case of dynamic panel data models. The Monte Carlo simulation results demonstrate that the proposed method works well in finite samples, with low false detection probability when there is no structural break and high probability of correctly estimating the break numbers when structural breaks exist. We finally apply our method to study the environmental Kuznets curve for 74 countries over 40 years and detect two breaks in the data.

Fri 10 Jun, '16
-
CRiSM Seminar

Claire Gormley (University College Dublin)

Clustering High Dimensional Mixed Data: Joint Analysis of Phenotypic and Genotypic Data

The LIPGENE-SU.VI.MAX study, like many others, recorded high dimensional continuous phenotypic data and categorical genotypic data. Interest lies in clustering the study participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory.

A novel latent variable model which elegantly accommodates high dimensional, mixed data is developed to cluster participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model.

Two clusters or sub-phenotypes (‘healthy’ and ‘at risk’) are uncovered. A small subset of variables is deemed discriminatory which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, seven years after the data were collected, participants underwent further analysis to diagnose presence or absence of the metabolic syndrome (MetS). The two uncovered sub-phenotypes strongly correspond to the seven year follow up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS, and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition.

Fri 17 Jun, '16
-
CRiSM Seminar
D1.07

 

Fri 1 Jul, '16
-
CRiSM Seminar
D1.07

Gonzalo Garcia Donato (Universidad Castilla La Mancha)

Criteria for Bayesian model choice

In model choice (or model selection) several statistical models are postulated as legitimate explanations for a response variable, and this uncertainty is to be propagated through the inferential process. The questions one aims to answer are varied, ranging from identifying the 'true' model to producing more reliable estimates that take this extra source of variability into account. Particularly important problems of model choice are hypothesis testing, model averaging and variable selection. The Bayesian paradigm provides a conceptually simple and unified solution to the model selection problem: the posterior probabilities of the competing models. This is also named the posterior distribution over the model space and is a simple function of Bayes factors. Answering any question of interest just reduces to summarizing this posterior distribution properly.
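
In symbols, with m_j(y) denoting the marginal likelihood of model M_j, the posterior distribution over the model space and its expression through Bayes factors B_{j0} (taken against a reference model M_0) are the standard identities:

    \Pr(M_j \mid y) \;=\; \frac{m_j(y)\,\Pr(M_j)}{\sum_k m_k(y)\,\Pr(M_k)}
                    \;=\; \frac{B_{j0}\,\Pr(M_j)}{\sum_k B_{k0}\,\Pr(M_k)},
    \qquad
    B_{j0} \;=\; \frac{m_j(y)}{m_0(y)}, \quad
    m_j(y) \;=\; \int f_j(y \mid \theta_j)\,\pi_j(\theta_j)\, \mathrm{d}\theta_j .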

Unfortunately, the posterior distribution may depend dramatically on the prior inputs and, unlike in estimation problems (where the model is fixed), such sensitivity does not vanish with large sample sizes. Additionally, it is well known that standard solutions like improper or vague priors cannot be used in general as they result in arbitrary Bayes factors. Bayarri et al (2012) propose tackling these difficulties by basing the assignment of prior distributions in objective contexts on a number of sensible statistical criteria. This approach takes a step beyond the way of analyzing the problem that Jeffreys inaugurated fifty years ago.

In this talk the criteria will be presented, with emphasis on those aspects that serve to characterize features of the priors that, until today, have been popularly used without a clear justification.

Originally the criteria were accompanied by an application to variable selection in regression models, and here we will see how they can be useful for tackling other important scenarios, like high dimensional settings or survival problems.

Tue 30 Aug, '16 - Thu 1 Sep, '16
All-day
CRiSM Master Class on Sparse Regression
MS.01

Runs from Tuesday, August 30 to Thursday, September 01.

Fri 14 Oct, '16
-
CRiSM Seminar
A1.01

Daniel Rudolf - Perturbation theory for Markov chains

Perturbation theory for Markov chains addresses the question of how small differences in the transition probabilities of Markov chains are reflected in differences between their distributions. Under a convergence condition we present an estimate of the Wasserstein distance of the nth step distributions between an ideal, unperturbed and an approximating, perturbed Markov chain. We illustrate the result with an example of an autoregressive process.
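
A small numerical illustration in the spirit of the autoregressive example: simulate the n-step distributions of an ideal AR(1) chain and of a perturbed version (here the perturbation is an inflated noise scale, an arbitrary choice), and compare them with the one-dimensional Wasserstein distance. The theoretical bound discussed in the talk is not computed here.

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(7)

    def ar1_final_states(rho, sigma, n_steps, n_chains, x0=0.0):
        """Run n_chains independent AR(1) chains for n_steps and return the final states."""
        x = np.full(n_chains, x0)
        for _ in range(n_steps):
            x = rho * x + sigma * rng.standard_normal(n_chains)
        return x

    n_steps, n_chains = 50, 100_000
    ideal = ar1_final_states(rho=0.9, sigma=1.00, n_steps=n_steps, n_chains=n_chains)
    perturbed = ar1_final_states(rho=0.9, sigma=1.05, n_steps=n_steps, n_chains=n_chains)

    # Empirical 1-d Wasserstein distance between the two n-step distributions.
    print(wasserstein_distance(ideal, perturbed))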

 

Fri 28 Oct, '16
-
CRiSM Seminar
A1.01

Peter Orbanz

Fri 11 Nov, '16
-
CRiSM Seminar
A1.01

Mingli Chen
