Events
Thu 16 Jan, '14
-
CRiSM Seminar - Chenlei Leng (Warwick), John Fox (Oxford & UCL/Royal Free Hospital)
A1.01

John Fox (Oxford & UCL/Royal Free Hospital)
Arguing logically about risks: strengths, limitations and a request for assistance
Abstract: The standard mathematical treatment of risk combines numerical measures of uncertainty (usually probabilistic) and loss (money and other natural estimators of utility). There are significant practical and theoretical problems with this interpretation. A particular concern is that the estimation of quantitative parameters is frequently problematic, particularly when dealing with one-off events such as political, economic or environmental disasters.
Consequently, practical decision-making under risk often requires extensions to the standard treatment.
An intuitive approach to reasoning under uncertainty has recently become established in computer science and cognitive science based on argumentation theory. On this approach theories about an application domain (formalised in a non-classical first-order logic) are applied to propositional facts about specific situations, and arguments are constructed for and/or against claims about what might happen in those situations. Arguments can also attack or support other arguments. Collections of arguments can be aggregated to characterize the type or degree of risk, based on the grounds of the arguments. The grounds and form of an argument can also be used to explain the supporting evidence for competing claims and assess their relative credibility. This approach has led to a novel framework for developing versatile risk management systems and has been validated in a number of domains, including clinical medicine and toxicology (e.g. www.infermed.com; www.lhasa.com). Argumentation frameworks are also being used to support open discussion and debates about important issues (e.g. see debate on "planet under pressure" at http://debategraph.org/Stream.aspx?nid=145319&vt=bubble&dc=focus).

Despite the practical success of argumentation methods in risk management and other kinds of decision making, the main theories ignore quantitative measurement of uncertainty, or they combine qualitative reasoning with quantitative uncertainty in ad hoc ways. After a brief introduction to argumentation theory I will demonstrate some medical applications and invite suggestions for ways of incorporating uncertainty probabilistically that are mathematically satisfactory.

Chenlei Leng (Warwick)

High dimensional influence measure


Influence diagnosis is important since the presence of influential observations can lead to distorted analyses and misleading interpretations. This is particularly true for high-dimensional data, as the increased dimensionality and complexity may amplify both the chance of an observation being influential and its potential impact on the analysis. In this article, we propose a novel high-dimensional influence measure for regressions with the number of predictors far exceeding the sample size. Our proposal can be viewed as a high-dimensional counterpart to the classical Cook's distance. However, whereas Cook's distance quantifies an individual observation's influence on the least squares regression coefficient estimate, our new diagnostic measure captures the influence on the marginal correlations, which in turn exert serious influence on downstream analyses including coefficient estimation, variable selection and screening. Moreover, we establish the asymptotic distribution of the proposed influence measure by letting the predictor dimension go to infinity. Availability of this asymptotic distribution leads to a principled rule for determining the critical value for influential observation detection. Both simulations and real data analysis demonstrate the usefulness of the new influence diagnostic measure. This is joint work with Junlong Zhao, Lexin Li, and Hansheng Wang.

A copy of the paper is downloadable from http://arxiv.org/abs/1311.6636.
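For readers who want the classical baseline being generalised here, the textbook form of Cook's distance for observation i in a least-squares fit with p coefficients and residual variance estimate s^2 is (standard background, not quoted from the paper above):

\[
D_i \;=\; \frac{(\hat\beta - \hat\beta_{(i)})^{\top} X^{\top} X \,(\hat\beta - \hat\beta_{(i)})}{p\, s^{2}},
\]

where \hat\beta_{(i)} denotes the estimate recomputed with the i-th observation deleted. The high-dimensional measure discussed in the talk plays the analogous role, but tracks the effect of deleting an observation on the marginal correlations between predictors and response rather than on \hat\beta itself.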

Thu 30 Jan, '14
-
CRiSM Seminar - Judith Rousseau (Paris Dauphine), Jean-Michel Marin (Université Montpellier)
A1.01

Jean-Michel Marin

Consistency of the Adaptive Multiple Importance Sampling (joint work with Pierre Pudlo and Mohammed Sedki)

Among Monte Carlo techniques, importance sampling requires fine tuning of a proposal distribution, which is now commonly addressed through iterative schemes. The Adaptive Multiple Importance Sampling (AMIS) of Cornuet et al. (2012) provides a significant improvement in stability and Effective Sample Size due to the introduction of a recycling procedure. However, the consistency of the AMIS estimator remains largely open. In this work, we prove the convergence of the AMIS, at the cost of a slight modification in the learning process. Numerical experiments show that this modification might even improve the original scheme.
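As a rough illustration of the recycling idea (this is a schematic sketch, not the exact AMIS algorithm of Cornuet et al. nor the modified learning process whose consistency is proved in the talk), the following toy code reweights all past samples under the mixture of all proposals used so far and adapts a Gaussian proposal from the weighted sample; the target, proposal family and all tuning constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # toy target: unnormalised standard normal log-density
    return -0.5 * x**2

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

T, N = 10, 1000                 # adaptation iterations, samples per iteration
mus, sigmas = [0.0], [5.0]      # initial (deliberately poor) proposal parameters
samples = []

for t in range(T):
    x = rng.normal(mus[-1], sigmas[-1], size=N)
    samples.append(x)
    # recycling: reweight *all* past samples under the equal-weight mixture
    # of every proposal used so far (deterministic-mixture weighting)
    all_x = np.concatenate(samples)
    log_mix = np.logaddexp.reduce(
        np.stack([log_normal_pdf(all_x, m, s) for m, s in zip(mus, sigmas)]), axis=0
    ) - np.log(len(mus))
    logw = log_target(all_x) - log_mix
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # adapt the next proposal from the weighted pooled sample
    mu_new = np.sum(w * all_x)
    sd_new = np.sqrt(np.sum(w * (all_x - mu_new) ** 2))
    mus.append(mu_new)
    sigmas.append(max(sd_new, 1e-3))

print("final proposal mean and sd:", mus[-1], sigmas[-1])
```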

Judith Rousseau

Asymptotic properties of Empirical Bayes procedures in parametric and non-parametric models

 

In this work we investigate frequentist properties of Empirical Bayes procedures. Empirical Bayes procedures are widely used in practice, in more or less formalised ways, since it is common to replace some hyperparameter in the prior by a data-dependent quantity. There are typically two ways of constructing these data-dependent quantities: using some kind of moment estimator, or another quantity whose behaviour is well understood, or using a maximum marginal likelihood estimator. In this work we first give some general results on how to determine posterior concentration rates in the former setting, which we apply in particular to two types of Dirichlet process mixtures. We shall then discuss parametric models in the context of maximum marginal likelihood estimation, and explain in particular why some pathological behaviour can be expected in this case.

Thu 13 Feb, '14
-
CRiSM Seminar - Amanda Turner (Lancaster)
A1.01

Amanda Turner (Lancaster)

Small particle limits in a regularized Laplacian random growth model
In 1998 Hastings and Levitov proposed a one-parameter family of models for planar random growth in which clusters are represented as compositions of conformal mappings. This family includes physically occurring processes such as diffusion-limited aggregation (DLA), dielectric breakdown and the Eden model for biological cell growth. In the simplest case of the model (corresponding to the parameter alpha=0), James Norris and I showed how the Brownian web arises in the limit resulting from small particle size and rapid aggregation. In particular this implies that beyond a certain time, all newly aggregating particles share a single common ancestor. I shall show how small changes in alpha result in the emergence of branching structures within the model so that, beyond a certain time, the number of common ancestors is a random number whose distribution can be obtained. This is based on joint work with Fredrik Johansson Viklund (Columbia) and Alan Sola (Cambridge).

 

Thu 13 Feb, '14
-
CRiSM Seminar - Vasileios Maroulas (Bath/Tennessee)
A1.01

Vasileios Maroulas (Bath/Tennessee)

Filtering, drift homotopy and target tracking

Abstract:
Target tracking is a problem of paramount importance arising in Biology, Defense, Ecology and other scientific fields. We attack this problem by employing particle filtering. Particle filtering is an importance sampling method which may fail in several situations, e.g. with high-dimensional data. In this talk, we present a novel approach for improving particle filters suited to target tracking with a nonlinear observation model. The suggested approach is based on what I will call drift homotopy for stochastic differential equations which describe the dynamics of the moving target. Based on drift homotopy, we design a Markov chain Monte Carlo step which is appended to the particle filter and aims to bring the particles closer to the observations while at the same time respecting the dynamics. The talk is based on joint works with Kai Kang and Panos Stinis.
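The drift-homotopy construction itself is specific to the talk, but the place where such an MCMC step enters a particle filter can be sketched generically. Below is a minimal resample-move bootstrap particle filter for a standard toy nonlinear state-space model; the model, noise levels and the random-walk Metropolis move are illustrative assumptions, not the authors' algorithm, and the MCMC block at the end of each time step is where a drift-homotopy kernel would be substituted.

```python
import numpy as np

rng = np.random.default_rng(1)
q, r, T, N = 1.0, 1.0, 50, 500          # state/obs noise variances, time steps, particles

def f(x, t):                            # state transition mean
    return 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * t)

def log_obs(y, x):                      # log p(y_t | x_t) up to a constant
    return -0.5 * (y - x**2 / 20) ** 2 / r

# simulate data from the model
x_true, ys = 0.0, []
for t in range(T):
    x_true = f(x_true, t) + rng.normal(0, np.sqrt(q))
    ys.append(x_true**2 / 20 + rng.normal(0, np.sqrt(r)))

x = rng.normal(0, 2, N)                 # initial particles
for t, y in enumerate(ys):
    xnew = f(x, t) + rng.normal(0, np.sqrt(q), N)      # propagate
    logw = log_obs(y, xnew)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                        # resample
    x_prev, x = x[idx], xnew[idx]                      # keep resampled ancestors
    # MCMC move appended to the filter, targeting p(x_t | x_{t-1}, y_t)
    # for each particle (a drift-homotopy kernel would replace this block)
    for _ in range(5):
        prop = x + rng.normal(0, 0.5, N)
        log_acc = (log_obs(y, prop) - 0.5 * (prop - f(x_prev, t)) ** 2 / q) \
                - (log_obs(y, x)    - 0.5 * (x    - f(x_prev, t)) ** 2 / q)
        accept = np.log(rng.uniform(size=N)) < log_acc
        x = np.where(accept, prop, x)

print("filtered mean at the final time step:", x.mean())
```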

 

Thu 13 Mar, '14
-
CRiSM Seminar - Darren Wilkinson (Newcastle), Richard Everitt (Reading)
A1.01

Darren Wilkinson (Newcastle)
Stochastic Modelling of Genetic Interaction in Budding Yeast

Saccharomyces cerevisiae (often known as budding yeast, or brewer's yeast) is a single-celled micro-organism that is easy to grow and genetically manipulate. As it has a cellular organisation that has much in common with the cells of humans, it is often used as a model organism for studying genetics. High-throughput robotic genetic technologies can be used to study the fitness of many thousands of genetic mutant strains of yeast, and the resulting data can be used to identify novel genetic interactions relevant to a target area of biology. The processed data consist of tens of thousands of growth curves with a complex hierarchical structure requiring sophisticated statistical modelling of genetic independence, genetic interaction (epistasis), and variation at multiple levels of the hierarchy. Starting from simple stochastic differential equation (SDE) modelling of individual growth curves, a Bayesian hierarchical model can be built with variable selection indicators for inferring genetic interaction. The methods will be applied to data from experiments designed to highlight genetic interactions relevant to telomere biology.

Richard Everitt (Reading)

Inexact approximations for doubly and triply intractable problems

Markov random field models are used widely in computer science, statistical physics, spatial statistics and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to an intractable likelihood function. Several methods have been developed that permit exact, or close to exact, simulation from the posterior distribution. However, estimating the marginal likelihood and Bayes' factors for these models remains challenging in general. This talk will describe new methods for estimating Bayes' factors that use simulation to circumvent the evaluation of the intractable likelihood, and compare them to approximate Bayesian computation. We will also discuss more generally the idea of "inexact approximations".

Thu 27 Mar, '14
-
CRiSM Seminar - Professor Adrian Raftery (Washington)
A1.01

Professor Adrian Raftery (Washington)

Bayesian Reconstruction of Past Populations for Developing and Developed Countries

I will describe Bayesian population reconstruction, a new method for estimating past populations by age and sex, with fully probabilistic statements of uncertainty. It simultaneously estimates age-specific population counts, vital rates and net migration from fragmentary data while formally accounting for measurement error. As inputs, it takes initial bias-corrected estimates of age-specific population counts, vital rates and net migration. The output is a joint posterior probability distribution which yields fully probabilistic interval estimates of past vital rates and population numbers by age and sex. It is designed for the kind of data commonly collected in demographic surveys and censuses and can be applied to countries with widely varying levels of data quality. This is joint work with Mark Wheldon, Patrick Gerland and Samuel Clark.

Thu 1 May, '14
-
Oxford-Warwick Seminar: David Dunson (Duke) and Eric Moulines (Télécom ParisTech)
MS.03

David Dunson (Duke University)

Robust and scalable Bayes via the median posterior

Bayesian methods have great promise in big data sets, but this promise has not been fully realized due to the lack of scalable computational methods. Usual MCMC and SMC algorithms bog down as the size of the data and number of parameters increase. For massive data sets, it has become routine to rely on penalized optimization approaches implemented on distributed computing systems. The most popular scalable approximation algorithms rely on variational Bayes, which lacks theoretical guarantees and badly under-estimates posterior covariance. Another problem with Bayesian inference is the lack of robustness; data contamination and corruption is particularly common in large data applications and cannot easily be dealt with using traditional methods. We propose to solve both the robustness and the scalability problem using a new alternative to exact Bayesian inference we refer to as the median posterior. Data are divided into subsets and stored on different computers prior to analysis. For each subset, we obtain a stochastic approximation to the full data posterior, and run MCMC to generate samples from this approximation. The median posterior is defined as the geometric median of the subset-specific approximations, and can be rapidly approximated. We show several strong theoretical results for the median posterior, including general theorems on concentration rates and robustness. The methods are illustrated through simple examples, including Gaussian process regression with outliers.
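The geometric-median idea can be illustrated with a deliberately simplified sketch: here each subset posterior is summarised only by its posterior mean vector and the geometric median is computed with Weiszfeld's algorithm, whereas the median posterior of the talk takes the geometric median of the subset posterior distributions themselves in a suitable metric space. All numbers below are made up purely for illustration.

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld iterations for the geometric median of a set of vectors."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)  # avoid divide-by-zero
        y_new = (points / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

rng = np.random.default_rng(0)
# 'subset posteriors' summarised by their posterior means, one of which comes
# from a badly contaminated data shard
subset_means = rng.normal(loc=1.0, scale=0.1, size=(10, 3))
subset_means[0] += 50.0
print("average of subset means:    ", subset_means.mean(axis=0))
print("geometric median of means:  ", geometric_median(subset_means))
```

The point of the toy example is the robustness mechanism: the average is dragged far away by the corrupted subset, while the geometric median stays near the bulk of the subset summaries.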

Eric Moulines (Télécom ParisTech)

Proximal Metropolis adjusted Langevin algorithm for sampling sparse distributions over high-dimensional spaces

This talk introduces a new Markov chain Monte Carlo method for sampling sparse distributions or performing Bayesian model choice in high-dimensional settings. The algorithm is a Hastings-Metropolis sampler with a proposal mechanism which combines (i) a Metropolis adjusted Langevin step to propose local moves associated with the differentiable part of the target density with (ii) a proximal step based on the non-differentiable part of the target density which provides sparse solutions such that small components are shrunk toward zero. Several implementations of the proximal step will be investigated, adapted to different sparsity priors or allowing variable selection to be performed in high-dimensional settings. The performance of these new procedures is illustrated on both simulated and real data sets. Preliminary convergence results will also be presented.
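One plausible way of assembling the two ingredients described above is sketched below for a Gaussian likelihood with a Laplace (L1) prior; the step size, the ordering of the gradient and proximal maps, and the toy data are assumptions for illustration rather than the exact proposal investigated in the talk. Note that the Metropolis-Hastings correction keeps the chain exact however the proposal is built.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, sigma2 = 50, 20, 1.0, 1.0
A = rng.normal(size=(n, p))
x_true = np.zeros(p); x_true[:3] = [3.0, -2.0, 1.5]
y = A @ x_true + rng.normal(scale=np.sqrt(sigma2), size=n)

def grad_smooth(x):                    # gradient of the differentiable part of -log target
    return A.T @ (A @ x - y) / sigma2

def soft_threshold(v, t):              # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def log_target(x):
    resid = y - A @ x
    return -0.5 * resid @ resid / sigma2 - lam * np.abs(x).sum()

delta = 1e-3                           # Langevin step size (assumed, untuned)
x, samples = np.zeros(p), []
for it in range(5000):
    # proposal: Langevin move on the smooth part, then a proximal shrinkage step
    mean_fwd = soft_threshold(x - 0.5 * delta * grad_smooth(x), delta * lam)
    prop = mean_fwd + np.sqrt(delta) * rng.normal(size=p)
    # Metropolis-Hastings correction: the proposal is Gaussian around a
    # deterministic gradient+prox map, so the q-ratio must be included
    mean_bwd = soft_threshold(prop - 0.5 * delta * grad_smooth(prop), delta * lam)
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * delta)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2 * delta)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x) + log_q_bwd - log_q_fwd:
        x = prop
    samples.append(x)

print("posterior mean of the first 5 coordinates:", np.mean(samples[1000:], axis=0)[:5])
```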

Thu 15 May, '14
-
CRiSM Seminar - Mark Fiecas (Warwick)
A1.01

Mark Fiecas (Warwick)
Modeling the Evolution of Neurophysiological Signals

In recent years, research into analyzing brain signals has dramatically increased, and these rich data sets require more advanced statistical tools in order to perform proper statistical analyses. Consider an experiment where a stimulus is presented many times, and after each stimulus presentation (trial), time series data is collected. The time series data per trial exhibit nonstationary characteristics. Moreover, across trials the time series are non-identical because their spectral properties change over the course of the experiment. In this talk, we will look at a novel approach for analyzing nonidentical nonstationary time series data. We consider two sources of nonstationarity: 1) within each trial of the experiment and 2) across the trials, so that the spectral properties of the time series data are evolving over time within a trial, and are also evolving over the course of the experiment. We extend the locally stationary time series model to account for nonidentical data. We analyze a local field potential data set to study how the spectral properties of the local field potentials obtained from the nucleus accumbens and the hippocampus of a monkey evolve over the course of a learning association experiment.

Thu 15 May, '14
-
CRiSM Seminar - David Leslie (Bristol)
A1.01

David Leslie (Bristol)
Applied abstract stochastic approximation

Stochastic approximation was introduced as a tool to find the zeroes of a function under only noisy observations of the function value. A classical statistical example is to find the zeroes of the score function when observations can only be processed sequentially. The method has since been developed and used mainly in the control theory, machine learning and economics literature to analyse iterative learning algorithms, but I contend that it is time for statistics to re-discover the power of stochastic approximation. I will introduce the main ideas of the method, and describe an extension in which the parameter of interest is an element of a function space and we wish to analyse its stochastic evolution through time. This extension allows the analysis of online nonparametric algorithms - we present an analysis of Newton's algorithm to estimate nonparametric mixing distributions. It also allows the investigation of learning in games with a continuous strategy set, where a mixed strategy is an arbitrary distribution on an interval.

(Joint work with Steven Perkins)
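For readers unfamiliar with the basic scheme, the classical Robbins-Monro recursion referred to above seeks a root of g from noisy evaluations only (standard background, not part of the talk's new results):

\[
\theta_{n+1} \;=\; \theta_n - a_n Y_n, \qquad \mathbb{E}[Y_n \mid \theta_n] = g(\theta_n), \qquad \sum_n a_n = \infty, \quad \sum_n a_n^2 < \infty,
\]

which converges to a zero of g under standard conditions on g and the step sizes. The extension described in the talk lets \theta_n live in a function space, for example a space of mixing distributions or mixed strategies, rather than in R^d.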

Thu 29 May, '14
-
CRiSM Seminar - Rajen Shah (Cambridge)
A1.01

Rajen Shah (Cambridge)

Random Intersection Trees for finding interactions in large, sparse datasets

Many large-scale datasets are characterised by a large number (possibly tens of thousands or millions) of sparse variables. Examples range from medical insurance data to text analysis. While estimating main effects in regression problems involving such data is now a reasonably well-studied problem, finding interactions between variables remains a serious computational challenge. As brute force searches through all possible interactions are infeasible, most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for classification problems with binary predictor variables, called Random Intersection Trees. It works by starting with a maximal interaction that includes all variables, and then gradually removing variables if they fail to appear in randomly chosen observations of a class of interest. We show that with this method, under some weak assumptions, interactions can be found with high probability, and that the computational complexity of our procedure is much smaller than for a brute force search.
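A deliberately stripped-down, single-branch version of the intersection idea is sketched below; the actual Random Intersection Trees algorithm grows trees of candidate sets and adds refinements such as min-wise hashing for speed, and the data, depth and set-size thresholds here are illustrative assumptions only.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def random_intersections(X_class1, n_runs=200, depth=5):
    """Single-branch caricature of Random Intersection Trees.

    X_class1: binary matrix (rows = observations from the class of interest,
    columns = sparse binary predictors). Each run starts from the active set of
    a random observation and repeatedly intersects it with further random
    observations; surviving small sets are candidate interactions."""
    n, _ = X_class1.shape
    found = Counter()
    for _ in range(n_runs):
        S = set(np.flatnonzero(X_class1[rng.integers(n)]))
        for _ in range(depth):
            S &= set(np.flatnonzero(X_class1[rng.integers(n)]))
            if len(S) <= 1:
                break
        if 2 <= len(S) <= 4:              # keep small surviving candidate interactions
            found[frozenset(S)] += 1
    return found.most_common(5)

# toy data: variables 2 and 7 co-occur frequently in the class of interest,
# against a background of sparse noise
X = (rng.random((500, 50)) < 0.05).astype(int)
active = rng.random(500) < 0.6
X[active, 2] = 1
X[active, 7] = 1
print(random_intersections(X))
```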

Thu 29 May, '14
-
CRiSM Seminar - Randal Douc
A1.01

Randal Douc (TELECOM SudParis)

Identifiability conditions for partially-observed Markov chains (by R. Douc, F. Roueff and T. Sim)

This paper deals with a parametrized family of partially-observed bivariate Markov chains. We establish that the limit of the normalized log-likelihood is maximized when the parameter belongs to the equivalence class of the true parameter, which is a key step in obtaining consistency of the Maximum Likelihood Estimator (MLE) in well-specified models. This result is obtained in a general framework including both fully dominated and partially dominated models, and thus applies to both Hidden Markov models and observation-driven time series. In contrast with previous approaches, the identifiability is addressed by relying on the uniqueness of the invariant distribution of the Markov chain associated with the complete data, regardless of its rate of convergence to equilibrium. We use this approach to obtain a set of easy-to-check conditions which imply the consistency of the MLE for a general observation-driven time series.

Thu 12 Jun, '14
-
CRiSM Seminar - Ben Graham (Warwick)
A1.01

Ben Graham (University of Warwick)

Handwriting, signatures, and convolutions

The 'signature', from the theory of differential equations driven by rough paths, provides a very efficient way of characterizing curves. From a machine learning perspective, the elements of the signature can be used as a set of features for consumption by a classification algorithm.
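For concreteness (standard definition from rough path theory, not specific to this talk), the signature of a path X: [0, T] -> R^d is the collection of iterated integrals

\[
S(X)^{(i_1,\dots,i_k)} \;=\; \int_{0 < t_1 < \dots < t_k < T} \mathrm{d}X^{i_1}_{t_1} \cdots \mathrm{d}X^{i_k}_{t_k}, \qquad k \ge 1,\; i_j \in \{1,\dots,d\},
\]

and truncating the collection at some level k yields a finite feature vector for each pen-stroke curve, which is what gets fed to the classifier.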

Using datasets of letters, digits, Indian characters and Chinese characters, we see that this improves the accuracy of online character recognition---that is, the task of reading characters represented as a collection of pen strokes.

Thu 12 Jun, '14
-
CRiSM Seminar - Emmanuele Giorgi (Lancaster)

Emmanuele Giorgi (Lancaster)

Combining data from multiple spatially referenced prevalence surveys using generalized linear geostatistical models

Geostatistical methods are becoming more widely used in epidemiology to analyze spatial variation in disease prevalence. These methods are especially useful in resource-poor settings where disease registries are either non-existent or geographically incomplete, and data on prevalence must be obtained by survey sampling of the population of interest. In order to obtain good geographical coverage of the population, it is often necessary also to combine information from multiple prevalence surveys in order to estimate model parameters and for prevalence mapping.

However, simply fitting a single model to the combined data from multiple surveys is inadvisable without testing the implicit assumption that both the underlying process and its realization are common to all of the surveys. We have developed a multivariate generalized linear geostatistical model to combine data from multiple spatially referenced prevalence surveys so as to address each of two common sources of variation across surveys: variation in prevalence over time; variation in data-quality.

In the case of surveys that differ in quality, we assume that at least one of the surveys delivers unbiased gold-standard estimates of prevalence, whilst the others are potentially biased. For example, some surveys might use a random sampling design while others rely on opportunistic convenience samples. For parameter estimation and spatial predictions, we used Monte Carlo Maximum Likelihood methods.

We describe an application to malaria prevalence data from Chikhwawa District, Malawi. The data consist of two Malaria Indicator Surveys (MISs) and an Easy Access Group (EAG) study, conducted over the period 2010-2012. In the two MISs, the data were collected by random selection of households in an area of 50 villages within 400 square kilometers, whilst the EAG study enrolled a random selection of children attending the vaccination clinic in Chikhwawa District Hospital. The second sampling strategy is more economical, but the sampling bias inherent to such "convenience" samples needs to be taken into account.

Wed 16 Jul, '14
-
CRiSM Seminar - Adelchi Azzalini
A1.01

Adelchi Azzalini (University of Padova)

Clustering based on non-parametric density estimation: A proposal

Cluster analysis based on non-parametric density estimation represents an approach to the clustering problem whose roots date back several decades, but it is only in recent times that this approach could actually be developed. The talk presents one proposal within this approach, one of the few that have been brought up to an operational stage.

Wed 8 Oct, '14
-
CRiSM Seminar
MS.03

Christophe Ley - Université Libre de Bruxelles

Stein's method, Information theory, and Bayesian statistics

In this talk, I will first describe a new general approach to the celebrated Stein method for asymptotic approximations and apply it to diverse approximation problems. Then I will show how Stein’s method can be successfully used in two a priori unrelated domains, namely information theory and Bayesian statistics. In the latter case, I will evaluate the influence of the choice of the prior on the posterior distribution at given sample size n. Based on joint work with Gesine Reinert (Oxford) and Yvik Swan (Liege).
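As a reminder of the starting point of Stein's method (a textbook fact, not part of the talk's new results): a random variable Z has the standard normal distribution if and only if

\[
\mathbb{E}\!\left[\, f'(Z) - Z f(Z) \,\right] = 0
\]

for all sufficiently smooth f; bounding this quantity for a general random variable W then yields bounds on the distance between the law of W and the normal, and analogous characterising operators exist for other target distributions.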

Thu 16 Oct, '14
-
CRiSM Seminar
A1.01

Karthik Bharath

Bayes robustness with Fisher-Rao metric.

A Riemannian-geometric framework is proposed to assess sensitivity of Bayesian procedures to modeling assumptions based on the nonparametric Fisher–Rao metric; assessments include local and global robustness to perturbations of the likelihood and prior, and identification of influential observations. An important feature of the approach is the unification of the perturbation and the inference via intrinsic analysis on the space of probability densities under the same Riemannian metric, which leads to geometrically calibrated influence measures. The utility of the framework in interesting applications involving generalized mixed-effects and directional data models will be demonstrated.
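For orientation (standard background, stated here as I understand it rather than quoted from the talk): under the square-root representation \psi = \sqrt{f}, densities sit on the unit sphere of L^2 and the nonparametric Fisher–Rao geodesic distance between two densities becomes

\[
d_{FR}(f_1, f_2) \;=\; \cos^{-1}\!\left( \int \sqrt{f_1(x)\, f_2(x)}\; \mathrm{d}x \right),
\]

which is the kind of intrinsic metric under which both the perturbations and the influence measures referred to above can be calibrated.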


Tony O’Hagan

"What can we do with a model that's wrong?"

People use models in all fields of science, technology, management, etc. These can range from highly complex mathematical models based on systems of differential equations to relatively simple empirical, statistical, models. This talk is about the uncertainty in the predictions made by models. One aspect of this has come to be called Uncertainty Quantification (UQ), and is concerned with deriving the uncertainty in model outputs induced by uncertainty in the inputs. But there is another component of uncertainty that is much more important: all models are wrong. This talk is about just how badly misled we can be if we forget this fact.

Thu 30 Oct, '14
-
CRiSM Seminar - Pierre Jacob & Leonardo Bottolo
A1.01

Pierre Jacob
Estimation of the score vector and observed information matrix in intractable models
Ionides, King et al. (see Inference for nonlinear dynamical systems, PNAS 103) have introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according to its prior distribution. Their methodology bypasses the calculation of any derivative by expressing the score in the original model as an expectation under a modified model. Building upon this insightful work, we provide here a similar "derivative-free" estimator for the observed information matrix, expressed as a covariance matrix under a modified model. In principle the method is applicable to any latent variable model. We also discuss connections with Stein's method and proximal mappings.
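Schematically (my paraphrase with an isotropic Gaussian perturbation, not the authors' exact statement): if the parameter is artificially perturbed as \tilde\theta \sim N(\theta, \tau^2 I) in a "modified model", then under regularity conditions

\[
\nabla \ell(\theta) \;=\; \lim_{\tau \to 0} \tau^{-2}\,\big( \mathbb{E}[\tilde\theta \mid y_{1:T}] - \theta \big),
\qquad
-\nabla^2 \ell(\theta) \;=\; \lim_{\tau \to 0} \tau^{-4}\,\big( \tau^2 I - \operatorname{Cov}[\tilde\theta \mid y_{1:T}] \big),
\]

so the score is read off a posterior mean and the observed information off a posterior covariance, both of which can be approximated by particle filtering without computing any derivatives.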

Leonardo Bottolo
Mixture priors for subgroups classification
Technologies for the collection of genetic data, such as those based on RNA-Seq, are developing at a very fast rate. A crucial objective in these studies is gene clustering based on the similarity of their level of expression. A more important clustering task concerns samples themselves: this goes under the name of molecular profiling, and aims at identifying similarities across samples based on a few genes identified from the previous stage of the analysis.

In this talk we present a fully Bayesian hierarchical model for molecular profiling. A key ingredient of our approach is the use of mixture distributions. First of all, expression measurements are assumed to originate from a three-component mixture distribution, representing the underlying population classes (baseline, underexpression and overexpression) of samples relative to gene expression. For each gene we assume a specific (random) probability of belonging to any of the three classes. Current Bayesian modelling assumes that the collection of gene-specific probabilities is exchangeable. The exchangeability assumption exhibits some drawbacks, the main one being its inability to capture heterogeneity in gene behaviour. This led us to model the gene-specific probabilities of under- and overexpression using a mixture of prior distributions with an unknown number of components: this represents the novelty of our approach, which is meant to capture variation in gene behaviour and to identify clusters of genes in relation to their probability of under- and overexpression.

The model is applied to gene expression levels from RNA-Seq analysis of left ventricular tissue derived from a cohort comprising 33 dilated cardiomyopathy patients with end-stage heart failure who underwent left ventricular assist device implant. Of these patients, 24 subsequently recovered their function whereas 9 did not, and there are no obvious distinguishing features between them at the time the sample was taken. Molecular profiling is derived to predict recovery vs non-recovery patients. This is joint work with Petros Dellaportas and Athanassios Petralias, Dept of Statistics, Athens University.

Thu 13 Nov, '14
-
CRiSM Seminar - Michael Eichler (Maastricht) & Richard Huggins (Melbourne)
A1.01

Michael Eichler (Maastricht)
Causal Inference from Multivariate Time Series: Principles and Problems

In time series analysis, inference about cause-effect relationships among multiple time series is commonly based on the concept of Granger causality, which exploits temporal structure to achieve causal ordering of dependent variables. One major and well known problem in the application of Granger causality for the identification of causal relationships is the possible presence of latent variables that affect the measured components and thus lead to so-called spurious causalities. We present a new graphical approach for describing and analysing Granger-causal relationships in multivariate time series that are possibly affected by latent variables. It is based on mixed graphs in which directed edges represent direct influences among the variables while dashed edges---directed or undirected---indicate associations that are induced by latent variables. We show how such representations can be used for inductive causal learning from time series and discuss the underlying assumptions and their implications for causal learning. Finally we will discuss tetrad constraints in the time series context and how they can be exploited for causal inference.

Richard Huggins (Melbourne)
Semivarying Coefficient Models for Capture--recapture Data: Colony Size Estimation for the Little Penguin
To accommodate seasonal effects that change from year to year into models for the size of an open population we consider a time-varying coefficient model. We fit this model to a capture-recapture data set collected on the little penguin in south-eastern Australia over a 25 year period, using Jolly--Seber type estimators and nonparametric P-spline techniques. The time-varying coefficient model identified strong changes in the seasonal pattern across the years which we further examine using functional data analysis techniques.
(Joint work with Jakub Stoklosa of The University of New South Wales and Peter Dann from the Phillip Island Nature Parks.)

Thu 27 Nov, '14
-
CRiSM Seminar - Daniel Williamson (Exeter) & David van Dyk (Imperial)
A1.01

David van Dyk (Imperial)
Statistical Learning Challenges in Astronomy and Solar Physics
In recent years, technological advances have dramatically increased the quality and quantity of data available to astronomers. Newly launched or soon-to-be launched space-based telescopes are tailored to data-collection challenges associated with specific scientific goals. These instruments provide massive new surveys resulting in new catalogs containing terabytes of data, high resolution spectrography and imaging across the electromagnetic spectrum, and incredibly detailed movies of dynamic and explosive processes in the solar atmosphere. The spectrum of new instruments is helping scientists make impressive strides in our understanding of the physical universe, but at the same time generating massive data-analytic and data-mining challenges for scientists who study the resulting data. In this talk I will introduce and discuss the statistical learning challenges inherent in data streams that are both massive and complex.

Daniel Williamson (Exeter)
Earth system models and probabilistic Bayesian calibration: a screw meets a hammer?
The design and analysis of computer experiments, now called “Uncertainty Quantification” or “UQ”, has been an active area of statistical research for 25 years. One of the most high profile methodologies, that of calibrating a complex computer code using the Bayesian solution to the inverse problem as described by Kennedy and O’Hagan’s seminal paper in 2001, has become something of a default approach to tackling applications in UQ and has over 1200 citations. However, is this always wise? Though the method is well tested and arguably appropriate for many types of model, particularly those for which large amounts of data are readily available and in which the limitations of the underlying mathematical expressions and solvers are well understood, many models, such as those found in climate simulation, go far beyond those successfully studied in terms of non-linearity, run time, output size and complexity of the underlying mathematics. Have we really solved the calibration problem? To what extent is our “off the shelf approach” appropriate for the problems faced in fields such as Earth system modelling? In this talk we will discuss some of the known limitations of the Bayesian calibration framework (and some perhaps unknown) and we explore the extent to which the conditions in which calibration is known to fail are met in climate model problems. We will then present and argue for an alternative approach to the problem and apply it to an ocean GCM known as NEMO.

Tue 2 Dec, '14
-
CRiSM Seminar - David Draper (UC-Santa Cruz), Luis Nieto Barajas (ITAM - Instituto Tecnologico Autonomo de Mexico)
A1.01

Luis Nieto Barajas (ITAM - Instituto Tecnologico Autonomo de Mexico)
A Bayesian nonparametric approach for time series clustering
In this work we propose a model-based clustering method for time series. The model uses an almost surely discrete Bayesian nonparametric prior to induce clustering of the series. Specifically, we propose a general Poisson-Dirichlet process mixture model, which includes the Dirichlet process mixture model as a particular case. The model accounts for typical features present in a time series, such as trend, seasonal and temporal components. All or only part of these features can be used for clustering, according to the user. Posterior inference is obtained via an easy to implement Markov chain Monte Carlo (MCMC) scheme. The best clustering is chosen according to a heterogeneity measure as well as the model selection criterion LPML (logarithm of the pseudo marginal likelihood). We illustrate our approach with a dataset of time series of share prices on the Mexican stock exchange.

David Draper (University of California, Santa Cruz, USA, and Ebay Research Labs, San Jose, California, USA)
Why the bootstrap works; and when and why log scores are a good way to compare Bayesian models
In this talk I'll describe recent work on two unrelated topics:
(a) How the frequentist bootstrap may be understood as an approximate Bayesian non-parametric method, which explains why the bootstrap works and when it doesn't, and
(b) Why log scores are a good way to compare Bayesian models, and when they're better than Bayes factors at doing so.
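A small illustration of the bootstrap-as-approximate-Bayes connection in point (a), offered as standard background (the Rubin, 1981, Bayesian bootstrap) rather than as Draper's own derivation: the frequentist bootstrap puts scaled multinomial weights on the observed points, while the Bayesian bootstrap puts Dirichlet(1, ..., 1) weights on them, and the two resampling distributions of a statistic are typically very close.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_exponential(200)        # some observed sample
B = 5000

# frequentist bootstrap: resample with replacement (multinomial weights)
freq = np.array([x[rng.integers(0, len(x), len(x))].mean() for _ in range(B)])

# Bayesian bootstrap: Dirichlet(1,...,1) weights on the observed points,
# i.e. draws from a nonparametric posterior concentrated on the data
dir_w = rng.dirichlet(np.ones(len(x)), size=B)
bayes = dir_w @ x

print("sd of the mean, frequentist bootstrap:", freq.std())
print("sd of the mean, Bayesian bootstrap:   ", bayes.std())
```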

Fri 23 Jan, '15
-
CRiSM Seminar - Rebecca Killick (Lancaster), Peter Green (Bristol)
B1.01 (Maths)

Rebecca Killick (Lancaster)
Forecasting locally stationary time series
Within many fields forecasting is an important statistical tool. Traditional statistical techniques often assume stationarity of the past in order to produce accurate forecasts. For data arising from the energy sector and others, this stationarity assumption is often violated but forecasts still need to be produced. This talk will highlight the potential issues when moving from forecasting stationary to nonstationary data and propose a new estimator, the local partial autocorrelation function, which will aid us in forecasting locally stationary data. We introduce the lpacf alongside associated theory and examples demonstrating its use as a modelling tool. Following this we illustrate the new estimator embedded within a forecasting method and show improved forecasting performance using this new technique.

Peter Green (Bristol)
Inference on decomposable graphs: priors and sampling
The structure in a multivariate distribution is largely captured by the conditional independence relationships that hold among the variables, often represented graphically, and inferring these from data is an important step in understanding a complex stochastic system. We would like to make simultaneous inference about the conditional independence graph and parameters of the model; this is known as joint structural and quantitative learning in the machine learning literature. The Bayesian paradigm allows a principled approach to this simultaneous inference task. There are tremendous computational and interpretational advantages in assuming the conditional independence graph is decomposable, and not too many disadvantages. I will present a new structural Markov property for decomposable graphs, show its consequences for prior modelling, and discuss a new MCMC algorithm for sampling graphs that enables Bayesian structural and quantitative learning on a much bigger scale than previously possible. This is joint work with Alun Thomas (Utah).

Fri 6 Feb, '15
-
CRiSM Seminar - Gareth Peters (UCL), Leonhard Held (University of Zurich)
B1.01 (Maths)

Gareth Peters (UCL)
Sequential Monte Carlo Samplers for capital allocation under copula-dependent risk models
In this talk we assume a multivariate risk model has been developed for a portfolio and its capital derived as a homogeneous risk measure. The Euler (or gradient) principle then states that the capital to be allocated to each component of the portfolio has to be calculated as an expectation conditional on a rare event, which can be challenging to evaluate in practice. We exploit the copula dependence within the portfolio risks to design a Sequential Monte Carlo Sampler-based estimate of the marginal conditional expectations involved in the problem, showing its efficiency through a series of computational examples.
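To make the "expectation conditional on a rare event" concrete (standard Euler allocation facts, not specific to this talk): for a risk measure \rho that is positively homogeneous of degree one, Euler's theorem gives \rho(L) = \sum_i AC_i with AC_i = \frac{d}{dh}\,\rho(L + h L_i)\big|_{h=0}, and for expected shortfall at level \alpha this allocation reduces to

\[
AC_i \;=\; \mathbb{E}\!\left[ L_i \,\middle|\, L \ge \mathrm{VaR}_\alpha(L) \right], \qquad L = \sum_i L_i,
\]

i.e. an expectation conditional on the portfolio loss exceeding a high quantile, which is the kind of rare-event expectation the SMC sampler is designed to estimate under copula-dependent risks.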

Leonhard Held (University of Zurich)
Adaptive prior weighting in generalized linear models
The prior distribution is a key ingredient in Bayesian inference. Prior information in generalized linear models may come from different sources and may or may not be in conflict with the observed data. Various methods have been proposed to quantify a potential prior-data conflict, such as Box's $p$-value. However, the literature is sparse on what to do if the prior is not compatible with the observed data. To this end, we review and extend methods to adaptively weight the prior distribution. We relate empirical Bayes estimates of the prior weight to Box's p-value and propose alternative fully Bayesian approaches. Prior weighting can be done for the joint prior distribution of the regression coefficients or - under prior independence - separately for each regression coefficient or for pre-specified blocks of regression coefficients. We outline how the proposed methodology can be implemented using integrated nested Laplace approximations (INLA) and illustrate the applicability with a logistic and a log-linear Poisson multiple regression model. This is joint work with Rafael Sauter.

Fri 20 Feb, '15
-
CRiSM Seminar - Marina Knight (York)
B1.01 (Maths)

Marina Knight (York)

Hurst exponent estimation for long-memory processes using wavelet lifting
Reliable estimation of long-range dependence parameters, such as the Hurst exponent, is a well-studied problem in the statistical literature. However, many time series observed in practice present missingness or are naturally irregularly sampled. In these settings, the current literature is sparse, with most approaches requiring heavy modifications in order to deal with the time irregularity. In this talk we present a technique for estimating the Hurst exponent of time series with long memory. The method is based on a flexible wavelet transform built by means of the lifting scheme, and is naturally suitable for series exhibiting time domain irregularity. We shall demonstrate the performance of this new method and illustrate the technique through time series applications in climatology.

Fri 1 May, '15
-
CRiSM Seminar - Marcelo Pereyra (Bristol), Magnus Rattray (Manchester)
D1.07 (Complexity)
Marcelo Pereyra (Bristol)
Proximal Markov chain Monte Carlo: stochastic simulation meets convex optimisation
Convex optimisation and stochastic simulation are two powerful computational methodologies for performing statistical inference in high-dimensional inverse problems. It is widely acknowledged that these methodologies can complement each other very well, yet they are generally studied and used separately. This talk presents a new Langevin Markov chain Monte Carlo method that uses elements of convex analysis and proximal optimisation to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis. The method is based on a new first-order approximation for Langevin diffusions that uses Moreau-Yosida approximations and proximity mappings to capture the log-concavity of the target density and construct Markov chains with favourable convergence properties. This approximation is closely related to Moreau-Yosida regularisations for convex functions and uses proximity mappings instead of gradient mappings to approximate the continuous-time process. The proposed method complements existing Langevin algorithms in two ways. First, the method is shown to have very robust stability properties and to converge geometrically for many target densities for which other algorithms are not geometric, or only if the time step is sufficiently small. Second, the method can be applied to high-dimensional target densities that are not continuously differentiable, a class of distributions that is increasingly used in image processing and machine learning and that is beyond the scope of existing Langevin and Hamiltonian Monte Carlo algorithms. The proposed methodology is demonstrated on two challenging models related to image resolution enhancement and low-rank matrix estimation, which are not well addressed by existing MCMC methodology.


Magnus Rattray (Manchester)
Gaussian process modelling for omic time course data
We are developing methods based on Gaussian process inference for analysing data from high-throughput biological time course data. Applications range from classical statistical problems such as clustering and differential expression through to systems biology models of cellular processes such as transcription and its regulation. Our focus is on developing tractable Bayesian methods which scale to genome-wide applications. I will describe our approach to a number of problems: (1) non-parametric clustering of replicated time course data; (2) inferring the full posterior of the perturbation time point from two-sample time course data; (3) inferring the pre-mRNA elongation rate from RNA polymerase ChIP-Seq time course data; (4) uncovering transcriptional delays by integrating pol-II and RNA time course data through a simple differential equation model.

Fri 15 May, '15
-
CRiSM Seminar - Carlos Carvalho (UT Austin), Andrea Riebler (Norwegian University of Science & Technology)
D1.07 (Complexity)

Carlos Carvalho, (The University of Texas)

Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective
Selecting a subset of variables for linear models remains an active area of research. This article reviews many of the recent contributions to the Bayesian model selection and shrinkage prior literature. A posterior variable selection summary is proposed, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.

Andrea Riebler, (Norwegian University of Science and Technology)
Projecting cancer incidence and mortality: Bayesian age-period-cohort models ready for routine use
Projections of age-specific cancer data are of strong interest due to demographic changes, but also advances in medical diagnosis and treatment. Although Bayesian age-period-cohort (APC) models have been shown to be beneficial compared to simpler statistical models in this context, they are not yet used in routine practice. The reasons might be two-fold. First, Bayesian APC models have been criticised for producing too wide credible intervals. Second, there might be a lack of sound and at the same time easy-to-use software. Here, we address both concerns by introducing efficient MCMC-free software and showing that probabilistic forecasts obtained by the Bayesian APC model are well calibrated. We use annual lung cancer data for females in five different countries and omit the observations from the last 10 years. Consequently, we compare the yearly predictions with the actual observed data based on the absolute error and the continuous ranked probability score. Further, we assess calibration of one-step-ahead predictive distributions.

Fri 29 May, '15
-
CRiSM Seminar - Clifford Lam (LSE), Zoltan Szabo (UCL)
D1.07 (Complexity)

Zoltán Szabó, (UCL)

Regression on Probability Measures: A Simple and Consistent Algorithm

We address the distribution regression problem: we regress from probability measures to Hilbert-space valued outputs, where only samples are available from the input distributions. Many important statistical and machine learning problems can be phrased within this framework including point estimation tasks without analytical solution, or multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge the only existing method with performance guarantees requires density estimation (which often performs poorly in practice) and the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative to solve the distribution regression problem: we embed the distributions to a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. We prove that this scheme is consistent under mild conditions (for distributions on separable topological domains endowed with kernels), and construct explicit finite sample bounds on the excess risk as a function of the sample numbers and the problem difficulty, which hold with high probability. Specifically, we establish the consistency of set kernels in regression, which was a 15-year-old open question, and also present new kernels on embedded distributions. The practical efficiency of the studied technique is illustrated in supervised entropy learning and aerosol prediction using multispectral satellite images. [Joint work with Bharath Sriperumbudur, Barnabas Poczos and Arthur Gretton.]
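A minimal numpy sketch of the two-stage scheme described above (embed each bag of samples via an empirical kernel mean embedding, then perform kernel ridge regression between bags); the toy task, the RBF kernel, its bandwidth and the regularisation constant are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between two sample arrays of shape (n, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def bag_kernel(bags, gamma=1.0):
    """Set kernel = inner product of empirical mean embeddings:
    K(B_i, B_j) = mean over x in B_i, y in B_j of k(x, y)."""
    m = len(bags)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(i, m):
            K[i, j] = K[j, i] = rbf(bags[i], bags[j], gamma).mean()
    return K

# toy task: each input is a sample from N(mu_i, 1); the label is mu_i
mus = rng.uniform(-3, 3, size=60)
bags = [rng.normal(mu, 1.0, size=(30, 1)) for mu in mus]
K = bag_kernel(bags)

lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(bags)), mus)   # kernel ridge regression

# predict the label of a new, unseen sample distribution
new_bag = rng.normal(1.7, 1.0, size=(30, 1))
k_new = np.array([rbf(new_bag, b).mean() for b in bags])
print("predicted:", k_new @ alpha, " (true value: 1.7)")
```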

 

Clifford Lam, (LSE)

Nonparametric Eigenvalue-Regularized Precision or COvariance Matrix Estimator for Low and High Frequency Data Analysis

We introduce nonparametric regularization of the eigenvalues of a sample covariance matrix through splitting of the data (NERCOME), and prove that NERCOME enjoys asymptotic optimal nonlinear shrinkage of eigenvalues with respect to the Frobenius norm. One advantage of NERCOME is its computational speed when the dimension is not too large. We prove that NERCOME is positive definite almost surely, as long as the true covariance matrix is so, even when the dimension is larger than the sample size. With respect to the inverse Stein’s loss function, the inverse of our estimator is asymptotically the optimal precision matrix estimator. Asymptotic efficiency loss is defined through comparison with an ideal estimator, which assumed the knowledge of the true covariance matrix. We show that the asymptotic efficiency loss of NERCOME is almost surely 0 with a suitable split location of the data. We also show that all the aforementioned optimality holds for data with a factor structure. Our method avoids the need to first estimate any unknowns from a factor model, and directly gives the covariance or precision matrix estimator. Extension to estimating the integrated volatility matrix for high frequency data is presented as well. Real data analysis and simulation experiments on portfolio allocation are presented for both low and high frequency data.
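The splitting idea can be sketched as follows: a single-split caricature under the assumption of zero-mean data, whereas the actual NERCOME estimator averages over several splits and chooses the split location as discussed in the abstract. Eigenvectors are taken from the sample covariance of one part of the data and the corresponding eigenvalues are re-estimated from the other part, which is what produces the nonlinear shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)

def nercome_sketch(X, split=None):
    """Single-split sketch of a NERCOME-type covariance estimator (zero-mean data)."""
    n, _ = X.shape
    m = split if split is not None else int(round(0.7 * n))
    X1, X2 = X[:m], X[m:]
    S1 = X1.T @ X1 / X1.shape[0]
    S2 = X2.T @ X2 / X2.shape[0]
    _, U = np.linalg.eigh(S1)          # eigenvectors from the first part of the data
    d = np.diag(U.T @ S2 @ U)          # eigenvalues re-estimated from the second part
    return (U * d) @ U.T               # = U diag(d) U^T

p, n = 100, 80                          # dimension larger than the sample size
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # AR(1)-type covariance
L = np.linalg.cholesky(Sigma)
X = rng.normal(size=(n, p)) @ L.T

S = X.T @ X / n
est = nercome_sketch(X)
print("smallest eigenvalue, sample covariance:", np.linalg.eigvalsh(S).min())
print("smallest eigenvalue, split estimator:  ", np.linalg.eigvalsh(est).min())
print("Frobenius error, sample covariance:", np.linalg.norm(S - Sigma))
print("Frobenius error, split estimator:  ", np.linalg.norm(est - Sigma))
```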

Fri 12 Jun, '15
-
CRiSM Seminar - Sara van der Geer (Zurich), Daniel Simpson (Warwick)
D1.07 (Complexity)

Daniel Simpson (University of Warwick)

Penalising model component complexity: A principled practical approach to constructing priors

Setting prior distributions on model parameters is the act of characterising the nature of our uncertainty and has proven a critical issue in applied Bayesian statistics. Although the prior distribution should ideally encode the users’ uncertainty about the parameters, this level of knowledge transfer seems to be unattainable in practice and applied statisticians are forced to search for a “default” prior.

Despite the development of objective priors, which are only available explicitly for a small number of highly restricted model classes, the applied statistician has few practical guidelines to follow when choosing the priors. An easy way out of this dilemma is to re-use prior choices of others, with an appropriate reference.

In this talk, I will introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys’ priors, are designed to support Occam’s razor and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions.
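As I understand the construction (this summary follows the published penalised-complexity prior framework rather than quoting the abstract): the flexibility parameter \xi is mapped to a distance from the base model, d(\xi) = \sqrt{2\,\mathrm{KLD}\big( f(\cdot \mid \xi) \,\|\, f(\cdot \mid \xi = 0) \big)}, an exponential (constant-rate penalisation) prior is placed on that distance, and the user-defined scaling enters through a probability statement fixing the rate:

\[
\pi(\xi) \;=\; \lambda\, e^{-\lambda\, d(\xi)} \left| \frac{\partial d(\xi)}{\partial \xi} \right|,
\qquad \lambda \text{ chosen so that } \Pr\big( Q(\xi) > U \big) = \alpha,
\]

where Q(\xi) is an interpretable transformation of \xi (for example a standard deviation) and the pair (U, \alpha) is supplied by the user.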

Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations, like random effect models, spline smoothing, disease mapping, Cox proportional hazard models with time-varying frailty, spatial Gaussian fields and multivariate probit models. Further, we show how to control the overall variance arising from many model components in hierarchical models.

This is joint work with Håvard Rue, Thiago G. Martins, Andrea Riebler, Geir-Arne Fuglstad (NTNU) and Sigrunn H. Sørbye (Univ. of Tromsø).

Sara van de Geer (ETH Zurich)

Norm-regularized Empirical Risk Minimization


Fri 26 Jun, '15
-
CRiSM Seminar - Thomas Hamelryck (University of Copenhagen), Anjali Mazumder (Warwick)
D1.07 (Complexity)

Thomas Hamelryck (Bioinformatics Center, University of Copenhagen)

Inference of protein structure and ensembles using Bayesian statistics and probability kinematics

The so-called protein folding problem is the loose designation for an amalgam of closely related, unsolved problems that include protein structure prediction, protein design and the simulation of the protein folding process. We adopt a unique Bayesian approach to modelling bio-molecular structure, based on graphical models, directional statistics and probability kinematics. Notably, we developed a generative probabilistic model of protein structure in full atomic detail. I will give an overview of how rigorous probabilistic models of something as complicated as a protein's atomic structure can be formulated, focusing on the use of graphical models and directional statistics to model angular degrees of freedom. I will also discuss the reference ratio method, which is needed to "glue" several probabilistic models of protein structure together in a consistent way. The reference ratio method is based on "probability kinematics", a little known method to perform Bayesian inference proposed by the philosopher Richard C. Jeffrey at the end of the fifties. Probability kinematics might find widespread application in statistics and machine learning as a way to formulate complex, high dimensional probabilistic models for multi-scale problems by combining several simpler models.
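For readers unfamiliar with the term, probability kinematics (Jeffrey conditionalisation) updates beliefs when the evidence only shifts the probabilities of a partition \{B_i\} to new values q_i, rather than making one cell certain:

\[
P_{\text{new}}(A) \;=\; \sum_i P_{\text{old}}(A \mid B_i)\, q_i ,
\]

which reduces to ordinary Bayesian conditioning when some q_k = 1. How this underpins the reference ratio method for gluing structural models together consistently is the subject of the talk.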


Anjali Mazumder (University of Warwick)

Probabilistic Graphical Models for planning and reasoning of scientific evidence in the courts

The use of probabilistic graphical models (PGMs) has gained prominence in the forensic science and legal literature when evaluating evidence under uncertainty. The graph-theoretic and modular nature of the PGMs provide a flexible and graphical representation of the inference problem, and propagation algorithms facilitate the calculation of laborious marginal and conditional probabilities of interest. In giving expert testimony regarding, for example, the source of a DNA sample, forensic scientists, under much scrutiny, are often asked to justify their decision-making process. Using information-theoretic concepts and a decision-theoretic framework, we define a value of evidence criterion as a general measure of informativeness for a forensic query and collection of evidence to determine which and how much evidence contributes to the reduction of uncertainty. In this talk, we demonstrate how this approach can be used for a variety of planning problems and the utility of PGMs for scientific and legal reasoning.
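One natural way to formalise such a value-of-evidence criterion (offered as an illustrative reading of the information-theoretic idea, not necessarily the exact definition used by the speaker) is the expected reduction in uncertainty about the forensic query Q from observing evidence E:

\[
V(E) \;=\; H(Q) - H(Q \mid E) \;=\; I(Q; E),
\]

the mutual information between query and evidence, which can be computed with the same propagation machinery used for marginal and conditional probabilities in a PGM.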

 

Mon 12 Oct, '15
-
CRiSM Seminar - Dan Roy (University of Toronto)
A1.01

Dan Roy (University of Toronto)
Nonstandard complete class theorems

For finite parameter spaces under finite loss, there is a close link between optimal frequentist decision procedures and Bayesian procedures: every Bayesian procedure derived from a prior with full support is admissible, and every admissible procedure is Bayes. This relationship breaks down as we move beyond finite parameter spaces. There is a long line of work relating admissible procedures to Bayesian ones in more general settings. Under some regularity conditions, admissible procedures can be shown to be the limit of Bayesian procedures. Under additional regularity, they are generalized Bayesian, i.e., they minimize the average loss with respect to an improper prior. In both these cases, one must venture beyond the strict confines of Bayesian analysis.

Using methods from mathematical logic and nonstandard analysis, we introduce the notion of a hyperfinite statistical decision problem defined on a hyperfinite probability space and study the class of nonstandard Bayesian decision procedures---namely, those whose average risk with respect to some prior is within an infinitesimal of the optimal Bayes risk. We show that if there is a suitable hyperfinite approximation to a standard statistical decision problem, then every admissible decision procedure is nonstandard Bayes, and so the nonstandard Bayesian procedures form a complete class. We give sufficient regularity conditions on standard statistical decision problems admitting hyperfinite approximations. Joint work with Haosui (Kevin) Duanmu.

Mon 26 Oct, '15
-
CRiSM Seminar - Hernando Ombao (UC Irvine, Dept of Statistics))
A1.01

Hernando Ombao (UC Irvine, Dept of Statistics)
Problems in Non-Stationary Multivariate Time Series With Applications in Brain Signals

We present new tools for analyzing complex multichannel signals using spectral methods. The key challenges are the high dimensionality of brain signals, their massive size and the complex nature of the underlying physiological process – in particular, non-stationarity. In this talk, I will highlight some of the current projects. The first is a tool that identifies changes in the structure of a multivariate time series. This is motivated by problems in characterizing changes in brain signals during an epileptic seizure, where a localized population of neurons exhibits abnormal firing behavior which then spreads to other subpopulations of neurons. This abnormal firing behavior is captured by increases in signal amplitudes (which can be easily spotted by visual inspection) and changes in the decomposition of the waveforms and in the strength of dependence between different regions (which are more subtle). The proposed frequency-specific change-point detection method (FreSpeD) uses a cumulative sum test statistic within a binary segmentation algorithm. Theoretical optimal properties of the FreSpeD method will be developed. We demonstrate that, when applied to epileptic seizure EEG data, FreSpeD identifies the correct brain region as the focal point of the seizure, the time of seizure onset and the very subtle changes in cross-coherence immediately preceding seizure onset.
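The change-point machinery can be sketched generically. FreSpeD applies CUSUM-type statistics to frequency-specific auto- and cross-spectral quantities; the sketch below only shows the standard mean-change CUSUM statistic inside a binary segmentation recursion, with an arbitrary threshold and a toy series standing in for the band power of a single frequency band.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_cusum(z):
    """CUSUM statistic for a single change in mean; returns (max value, location)."""
    n, total = len(z), z.sum()
    k = np.arange(1, n)
    csum = np.cumsum(z)[:-1]
    stat = np.sqrt(n / (k * (n - k))) * np.abs(csum - k * total / n)
    return stat.max(), int(k[stat.argmax()])

def binary_segmentation(z, threshold, offset=0, found=None):
    """Recursively split the series wherever the CUSUM statistic exceeds the threshold."""
    if found is None:
        found = []
    if len(z) < 20:                       # minimum segment length (assumed)
        return found
    stat, k = max_cusum(z)
    if stat > threshold:
        found.append(offset + k)
        binary_segmentation(z[:k], threshold, offset, found)
        binary_segmentation(z[k:], threshold, offset + k, found)
    return sorted(found)

# toy series whose level shifts twice (changes around indices 300 and 500)
z = np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 200), rng.normal(0.5, 1, 300)])
print("estimated change points:", binary_segmentation(z, threshold=4.0))
```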

The goal of the second project is to track changes in spatial boundaries (or more generally spatial sets or clusters) as the seizure process unfolds. A pair of channels (or a pair of sets of channels) is merged into one cluster if they exhibit synchronicity as measured by, for example, similarities in their spectra or by the strength of their coherence. We will highlight some open problems, including developing a model for the evolutionary clustering of non-stationary time series.

The first project is in collaboration with Anna Louise Schröder (London School of Economics); the second is with Carolina Euan (CIMAT, Mexico), Joaquin Ortega (CIMAT, Mexico) and Ying Sun (KAUST, Saudi Arabia).
