Events

Fri 20 Jan, '17
-
CRiSM Seminar
MA_B1.01

Yi Yu (University of Bristol)

Title: Estimating whole brain dynamics using spectral clustering

Abstract: The estimation of time-varying networks for functional Magnetic Resonance Imaging (fMRI) data sets is of increasing importance and interest. In this work, we formulate the problem in a high-dimensional time series framework and introduce a data-driven method, namely Network Change Points Detection (NCPD), which detects change points in the network structure of a multivariate time series, with each component of the time series represented by a node in the network. NCPD is applied to various simulated data and a resting-state fMRI data set. This new methodology also allows us to identify common functional states within and across subjects. Finally, NCPD promises to offer a deep insight into the large-scale characterisations and dynamics of the brain. This is joint work with Ivor Cribben (Alberta School of Business).
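
The sketch below is only a loose illustration of the spectral-clustering ingredient mentioned above (build a correlation network from a window of multivariate time series, then cluster the nodes); it is not the NCPD method itself, and the toy data and parameter choices are assumptions.

```python
# Illustrative only: correlation network from one time window plus standard
# normalised spectral clustering of the nodes (not the NCPD algorithm).
import numpy as np
from scipy.cluster.vq import kmeans2

def correlation_network(X):
    """X: (T, p) window of a p-variate time series; absolute-correlation adjacency."""
    A = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(A, 0.0)
    return A

def spectral_clusters(A, k):
    """Normalised spectral clustering (Ng-Jordan-Weiss style) of adjacency A."""
    d = np.maximum(A.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_sym)          # eigenvectors of the k smallest eigenvalues
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(U, k, minit="++")
    return labels

# Toy example: six "brain regions", two blocks of correlated signals.
rng = np.random.default_rng(0)
base1, base2 = rng.normal(size=(2, 200))
X = np.column_stack([base1 + 0.3 * rng.normal(size=200) for _ in range(3)]
                    + [base2 + 0.3 * rng.normal(size=200) for _ in range(3)])
print(spectral_clusters(correlation_network(X), k=2))
```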

Fri 3 Feb, '17
-
CRiSM Seminar
MA_B1.01

Liz Ryan (KCL)

Title: Simulation-based Fully Bayesian Experimental Design

Abstract:

Bayesian experimental design is a fast growing area of research with many real-world applications. As computational power has increased over the years, so has the development of simulation-based design methods, which involve a number of Bayesian algorithms, such as Markov chain Monte Carlo (MCMC) algorithms. However, many of the proposed algorithms have been found to be computationally intensive for complex or nonstandard design problems, such as those which require a large number of design points to be found and/or those for which the observed data likelihood has no analytic expression. In this work, we develop novel extensions of existing algorithms which have been used for Bayesian experimental design, and also incorporate methodologies which have been used for Bayesian inference into the design framework, so that solutions to more complex design problems can be found.

Fri 17 Feb, '17
-
CRiSM Seminar
MA_B1.01

Ioannis Kosmidis

Title: Reduced-bias inference for regression models with tractable and intractable likelihoods

Abstract:

This talk focuses on a unified theoretical and algorithmic framework for reducing bias in the estimation of statistical models from a practitioner's point of view. We will briefly discuss how shortcomings of classical estimators, and of inferential procedures depending on those, can be overcome via reduction of bias, and provide a few demonstrations stemming from current and past research on well-used statistical models with tractable likelihoods, including beta regression for bounded-domain responses, and the typically small-sample setting of meta-analysis and meta-regression in the presence of heterogeneity. The large impact that bias in the estimation of the variance components can have on inference motivates delivering higher-order corrective methods for generalised linear mixed models. The challenges in doing that will be presented along with resolutions stemming from current research.

Fri 3 Mar, '17
-
CRiSM Seminar
MA_B1.01

Marcelo Pereyra

Bayesian inference by convex optimisation: theory, methods, and algorithms.

Abstract:

Convex optimisation has become the main Bayesian computation methodology in many areas of data science such as mathematical imaging and machine learning, where high dimensionality is often addressed by using models that are log-concave and where maximum-a-posteriori (MAP) estimation can be performed efficiently by optimisation. The first part of this talk presents a new decision-theoretic derivation of MAP estimation and shows that, contrary to common belief, under log-concavity MAP estimators are proper Bayesian estimators. A main novelty is that the derivation is based on differential geometry. Following on from this, we establish universal theoretical guarantees for the estimation error involved and show estimation stability in high dimensions. Moreover, the second part of the talk describes a new general methodology for approximating Bayesian high-posterior-density regions in log-concave models. The approximations are derived by using recent concentration of measure results related to information theory, and can be computed very efficiently, even in large-scale problems, by using convex optimisation techniques. The approximations also have favourable theoretical properties, namely they outer-bound the true high-posterior-density credibility regions, and they are stable with respect to model dimension. The proposed methodology is finally illustrated on two high-dimensional imaging inverse problems related to tomographic reconstruction and sparse deconvolution, where they are used to explore the uncertainty about the solutions, and where convex-optimisation-empowered proximal Markov chain Monte Carlo algorithms are used as benchmark to compute exact credible regions and measure the approximation error.
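
As a concrete, entirely generic illustration of MAP estimation by convex optimisation in a log-concave model, the sketch below runs a proximal-gradient (ISTA) iteration on a toy sparse-deconvolution problem; it is not the speaker's methodology, and the operator, penalty and data are assumptions.

```python
# Minimal sketch: MAP estimation in a log-concave model, y = H x + noise with a
# Laplace (l1) prior on x, via proximal gradient descent (ISTA).
import numpy as np

def ista_map(H, y, lam, n_iter=500):
    """Minimise 0.5*||y - Hx||^2 + lam*||x||_1 by proximal gradient descent."""
    L = np.linalg.norm(H, 2) ** 2            # Lipschitz constant of the smooth part's gradient
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold = prox of l1
    return x

# Toy Gaussian blur operator and a sparse ground truth.
rng = np.random.default_rng(1)
n = 100
kernel = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
H = np.array([np.convolve(np.eye(n)[i], kernel, mode="same") for i in range(n)]).T
x_true = np.zeros(n); x_true[[20, 50, 80]] = [3.0, -2.0, 4.0]
y = H @ x_true + 0.05 * rng.normal(size=n)
x_map = ista_map(H, y, lam=0.1)
print("estimated support:", np.where(np.abs(x_map) > 0.5)[0])
```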

Fri 17 Mar, '17
-
CRiSM Seminar
MA_B1.01

Paul Birrell (MRC Biostatistics Unit, Cambridge)

Towards Computationally Efficient Epidemic Inference

In a pandemic where infection is widespread, there is no direct observation of the infection processes. Instead information comes from a variety of surveillance data schemes that are prone to noise, contamination, bias and sparse sampling. To form an accurate impression of the epidemic and to be able to make forecasts of its evolution, therefore, as many of these data streams as possible need to be assimilated into a single integrated analysis. The result of this is that the transmission model describing the infection process and the linked observation models can become computationally demanding, limiting the capacity for statistical inference in real-time.

I will discuss some of our attempts at making the inferential process more efficient, with particular focus on dynamic emulation, where the computationally expensive epidemic model is replaced by a more readily evaluated proxy, a time-evolving Gaussian process trained on a (relatively) small number of model runs at key input values, training that can be done a priori.
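
A minimal sketch of the emulation idea, under strong simplifying assumptions: a static Gaussian-process surrogate (squared-exponential kernel) is trained on a handful of runs of a toy deterministic SIR model and then queried cheaply elsewhere. The dynamic, time-evolving emulator of the talk is considerably more involved.

```python
# Illustrative only: Gaussian-process emulator of an "expensive" epidemic model,
# here a toy deterministic SIR final-size calculation.
import numpy as np

def sir_final_size(R0, gamma=0.2, dt=0.1, T=300.0):
    """Fraction ever infected in a deterministic SIR model with reproduction number R0."""
    beta = R0 * gamma
    S, I = 0.999, 0.001
    for _ in range(int(T / dt)):
        new_inf = beta * S * I * dt
        S, I = S - new_inf, I + new_inf - gamma * I * dt
    return 1.0 - S

def sq_exp(a, b, ell=0.5, sf=0.3):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

# "Expensive" training runs at a few design points, done a priori.
X_train = np.linspace(0.8, 4.0, 8)
y_train = np.array([sir_final_size(r) for r in X_train])

# Standard GP regression equations (zero prior mean, small jitter).
X_test = np.linspace(0.8, 4.0, 50)
K = sq_exp(X_train, X_train) + 1e-6 * np.eye(len(X_train))
Ks = sq_exp(X_test, X_train)
mean = Ks @ np.linalg.solve(K, y_train)
cov = sq_exp(X_test, X_test) - Ks @ np.linalg.solve(K, Ks.T)
print(mean[:5], np.sqrt(np.maximum(np.diag(cov), 0.0))[:5])
```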

Fri 5 May, '17
-
CRiSM Seminar
D1.07

"Adaptive MCMC For Everyone"

Jeffrey Rosenthal, University of Toronto

Markov chain Monte Carlo (MCMC) algorithms, such as the Metropolis Algorithm and the Gibbs Sampler, are an extremely useful and popular method of approximately sampling from complicated probability distributions. Adaptive MCMC attempts to automatically modify the algorithm while it runs, to improve its performance on the fly. However, such adaptation often destroys the ergodicity properties necessary for the algorithm to be valid. In this talk, we first illustrate MCMC algorithms using simple graphical Java applets. We then discuss adaptive MCMC, and present examples and theorems concerning its ergodicity and efficiency. We close with some recent ideas which make adaptive MCMC more widely applicable in broader contexts.
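
The sketch below shows one simple flavour of adaptive MCMC: a random-walk Metropolis sampler whose proposal scale is tuned on the fly with diminishing adaptation, so the amount of adaptation vanishes asymptotically. It is purely illustrative (a toy standard-normal target), not the examples from the talk.

```python
# Minimal sketch: adaptive random-walk Metropolis with diminishing adaptation of
# the log proposal scale towards a target acceptance rate.
import numpy as np

def adaptive_rwm(log_target, x0=0.0, n=20000, target_accept=0.44):
    rng = np.random.default_rng(2)
    x, log_scale = x0, 0.0
    samples = np.empty(n)
    for i in range(n):
        prop = x + np.exp(log_scale) * rng.normal()
        accept = np.log(rng.uniform()) < log_target(prop) - log_target(x)
        if accept:
            x = prop
        # Diminishing adaptation: step sizes shrink, preserving ergodicity.
        log_scale += min(0.1, 1.0 / np.sqrt(i + 1)) * ((1.0 if accept else 0.0) - target_accept)
        samples[i] = x
    return samples, np.exp(log_scale)

samples, final_scale = adaptive_rwm(lambda z: -0.5 * z**2)
print("posterior mean ~", samples[5000:].mean(), "; adapted proposal scale ~", final_scale)
```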

Fri 19 May, '17
-
CRiSM Seminar
D1.07

Korbinian Strimmer (Imperial)

An entropy approach for integrative genomics and network modeling

Multivariate regression approaches such as Seemingly Unrelated Regression (SUR) or Partial Least Squares (PLS) are commonly used in vertical data integration to jointly analyse different types of omics data measured on the same samples, such as SNP and gene expression data (eQTL) or proteomic and transcriptomic data.  However, these approaches may be difficult to apply and to interpret for computational and conceptual reasons.

Here we present a simple alternative approach to integrative genomics based on using relative entropy to characterise the overall association between two (or more) sets of omic data, and to infer the underlying corresponding association network among the individual covariates.  This approach is computationally inexpensive and can be applied to large-dimensional data sets.  A key and novel feature of our method is decomposition of the total strength between two or more groups of variables based on optimal whitening of the individual data sets.  Correspondingly, it may also be viewed as a special form of a latent-variable multivariate regression model.
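
For concreteness, the sketch below shows plain ZCA whitening with W = Sigma^{-1/2}, the kind of decorrelation step referred to above (see Kessy, Lewin and Strimmer, 2017, in the references below); the entropy-based decomposition itself is not reproduced here, and the toy data are assumptions.

```python
# Minimal sketch of whitening (decorrelation): ZCA whitening with the symmetric
# inverse square root of the sample covariance.
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Return whitened data Z = (X - mean) @ W.T with W = Sigma^{-1/2}."""
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W.T, W

rng = np.random.default_rng(3)
mix = np.array([[1, .8, 0, 0], [0, 1, .5, 0], [0, 0, 1, .3], [0, 0, 0, 1.0]])
X = rng.normal(size=(500, 4)) @ mix          # correlated toy data
Z, W = zca_whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))  # approximately the identity matrix
```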

We illustrate this approach by analysing metabolomic and transcriptomic data from the DILGOM study.


References:

A. Kessy, A. Lewin, and K. Strimmer. 2017. Optimal whitening and decorrelation. The American Statistician, to appear. http://dx.doi.org/10.1080/00031305.2016.1277159
T. Jendoubi and K. Strimmer. 2017. Data integration and network modeling: an entropy approach. In prep.

Fri 30 Jun, '17
-
CRiSM Seminar - Paul Kirk (BSU, Cambridge) (C1.06)
C1.06, Zeeman Building

Title: Semi-supervised multiview clustering for high-dimensional data

Abstract: Although the challenges presented by high dimensional data in the context of regression are well-known and the subject of much current research, comparatively little work has been done on this in the context of clustering. In this setting, the key challenge is that often only a small subset of the covariates provides a relevant stratification of the population. Identifying relevant strata can be particularly challenging when dealing with high-dimensional datasets, in which there may be many covariates that provide no information whatsoever about population structure, or – perhaps worse – in which there may be (potentially large) covariate subsets that define irrelevant stratifications. For example, when dealing with genetic data, there may be some genetic variants that allow us to group patients in terms of disease risk, but others that would provide completely irrelevant stratifications (e.g. which would group patients together on the basis of eye or hair colour). Bayesian profile regression is a semi-supervised model-based clustering approach that makes use of a response in order to guide the clustering toward relevant stratifications. Here we consider how this approach can be extended to the "multiview" setting, in which different groups of covariates ("views") define different stratifications. We also present a heuristic alternative, some preliminary results in the context of breast cancer subtyping, and consider how the approach could also be used to integrate different 'omics datasets (assuming that each dataset provides measurements on a common set of individuals).

Fri 27 Oct, '17
-
CRiSM Seminar
A1.01

Speaker: Davide Pigoli (King's College London)

Title: Functional data analysis of biological growth processes

Abstract: Functional data are examples of high-dimensional data where the observed variables have a natural ordering and are generated by an underlying smooth process. These additional properties allow us to develop methods that go beyond what would be possible with classical multivariate techniques. In this talk, I will demonstrate the potential of functional data analysis for biological growth processes in two different applications. The first one is in forensic entomology, where there is a need to estimate time-dependent growth curves from experiments in which larvae have been exposed to a relatively small number of constant temperature profiles. The second one is in quantitative genetics, where the growth curve is a function-valued phenotypic trait from which the continuous genetic variation needs to be estimated.

Thu 9 Nov, '17
-
CRiSM Seminar
C0.08

Speaker: Jonathan Keith (Monash University)

Title: Markov chain Monte Carlo in discrete spaces, with applications in bioinformatics and ecology

Abstract: Efficient sampling of probability distributions over large discrete spaces is a challenging problem that arises in many contexts in bioinformatics and ecology. For example, segmentation of genomes to identify putative functional elements can be cast as a multiple change-point problem involving thousands or even millions of change-points. Another example involves reconstructing the invasion history of an introduced species by embedding a phylogenetic tree in a landscape. A third example involves inferring networks of molecular interactions in cellular systems.

In this talk I describe a generalisation of the Gibbs sampler that allows this well-known strategy for sampling probability distributions in R^n to be adapted for sampling discrete spaces. The technique has been successfully applied to each of the problems mentioned above. However, these problems remain highly computationally intensive. I will discuss a number of alternatives for efficient sampling of such spaces, and will be seeking collaborations to develop these and other new approaches.
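
As a reference point for the generalisation described above, here is a minimal standard Gibbs sampler on a discrete space (a one-dimensional Ising chain); the speaker's generalised sampler is not reproduced here, and the model is a toy assumption.

```python
# Minimal sketch: systematic-scan Gibbs sampling of a 1D Ising chain,
# p(x) proportional to exp(beta * sum_i x_i * x_{i+1}), x_i in {-1, +1}.
import numpy as np

def gibbs_ising_chain(n_sites=50, beta=0.7, n_sweeps=2000, seed=4):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n_sites)
    trace = []
    for _ in range(n_sweeps):
        for i in range(n_sites):
            left = x[i - 1] if i > 0 else 0
            right = x[i + 1] if i < n_sites - 1 else 0
            # Full conditional: p(x_i = +1 | rest) = sigmoid(2 * beta * (left + right))
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * (left + right)))
            x[i] = 1 if rng.uniform() < p_plus else -1
        trace.append(x.mean())
    return np.array(trace)

print("average magnetisation:", gibbs_ising_chain().mean())
```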

Fri 24 Nov, '17
-
CRiSM Seminar
A1.01
3-4pm A1.01, Nov 24, 2017 - Song Liu

Title: Trimmed Density Ratio Estimation

Abstract: Density ratio estimation has recently become a versatile tool in the machine learning community. However, due to its unbounded nature, density ratio estimation is vulnerable to corrupted data points, which often push the estimated ratio toward infinity. In this paper, we present a robust estimator which automatically identifies and trims outliers. The proposed estimator has a convex formulation, and the global optimum can be obtained via subgradient descent. We analyze the parameter estimation error of this estimator under high-dimensional settings. Experiments are conducted to verify the effectiveness of the estimator.
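
To make the estimator being robustified more concrete, the sketch below fits a generic (untrimmed) least-squares density-ratio model with Gaussian kernels; this is the kind of unbounded estimator whose sensitivity to outliers motivates trimming, not the trimmed estimator of the talk, and all settings below are assumptions.

```python
# Minimal sketch: least-squares density-ratio estimation (uLSIF-style) with
# Gaussian kernel basis functions and a ridge penalty.
import numpy as np

def fit_density_ratio(x_num, x_den, centres, sigma=1.0, lam=0.1):
    """Model r(x) = p_num(x) / p_den(x) as sum_l theta_l * K(x, c_l)."""
    def K(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / sigma ** 2)
    Phi_den = K(x_den, centres)
    Phi_num = K(x_num, centres)
    H = Phi_den.T @ Phi_den / len(x_den)      # E_den[phi phi^T]
    h = Phi_num.mean(axis=0)                  # E_num[phi]
    theta = np.linalg.solve(H + lam * np.eye(len(centres)), h)
    return lambda x: K(np.atleast_1d(x), centres) @ theta

rng = np.random.default_rng(5)
x_num = rng.normal(0.5, 1.0, size=300)        # numerator sample
x_den = rng.normal(0.0, 1.0, size=300)        # denominator sample
r_hat = fit_density_ratio(x_num, x_den, centres=np.linspace(-3, 3, 20))
print(r_hat(np.array([0.0, 2.0])))            # estimated ratio at two points
```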

Fri 8 Dec, '17
-
CRiSM Seminar
A1.01
3-4pm A1.01, Dec 8, 2017 - Richard Samworth

Title: High-dimensional changepoint estimation via sparse projection

Abstract: Changepoints are a very common feature of big data that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate changepoint estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
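
A minimal sketch of the projection idea (CUSUM transformation, leading left singular vector, then a univariate CUSUM on the projected series) is given below; it omits the convex, soft-thresholded step of inspect, for which see the R package InspectChangepoint, and the toy data are assumptions.

```python
# Illustrative only: CUSUM transformation of a p-variate series, projection onto
# the leading left singular vector, and a univariate changepoint location.
import numpy as np

def cusum_transform(X):
    """X: (p, n). Returns the (p, n-1) CUSUM matrix."""
    p, n = X.shape
    S = np.cumsum(X, axis=1)
    t = np.arange(1, n)
    total = S[:, -1][:, None]
    return np.sqrt(t * (n - t) / n) * (S[:, :-1] / t - (total - S[:, :-1]) / (n - t))

def project_and_locate(X):
    T = cusum_transform(X)
    u, _, _ = np.linalg.svd(T, full_matrices=False)
    proj = u[:, 0] @ X                         # project the series onto the leading direction
    cus = np.abs(cusum_transform(proj[None, :]))[0]
    return int(np.argmax(cus)) + 1             # estimated changepoint location

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 200))
X[:5, 120:] += 1.0                             # sparse mean change at time 120
print(project_and_locate(X))
```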

4-5pm A1.01, Dec 8, 2017 - Simon R. White, MRC Biostatistics Unit, University of Cambridge

Title: Spatio-temporal modelling and heterogeneity in neuroimaging

Abstract: Neuroimaging allows us to gain insight into the structure and activity of the brain. Clearly, there is significant spatial structure that leads to dependencies across measurements which must be accounted for. Further, the brain as an organ is never idle, so the local temporal behaviour is important when characterising long-term functional connectivity.

In this talk we will discuss several approaches to modelling neuroimaging data that account for these key features, namely spatio-temporal heterogeneity: a novel approach to spatial modelling as an extension of the commonly used dimension reduction technique independent component analysis (ICA) for task-based functional magnetic resonance imaging (fMRI); propagating subject-level heterogeneity through multi-stage analyses of dynamic functional connectivity (dFC) using resting-state fMRI (rs-fMRI); and structural development using structural MRI.

Fri 19 Jan, '18
-
CRiSM Seminar
MA_B1.01

Jonas Peters, Department of Mathematical Sciences, University of Copenhagen

Invariant Causal Prediction

Abstract: Why are we interested in the causal structure of a process? In classical prediction tasks such as regression, for example, it seems that no causal knowledge is required. In many situations, however, we want to understand how a system reacts under interventions, e.g., in gene knock-out experiments. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction uses only direct causes of the target variable as predictors; it remains valid even if we intervene on predictor variables or change the whole experimental setting. In this talk, we show how we can exploit this invariance principle to estimate causal structure from data. We apply the methodology to data sets from biology, epidemiology, and finance. The talk does not require any knowledge about causal concepts.
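
The sketch below illustrates the invariance principle in a very crude form: for each candidate set of predictors, regress the target on that set and test whether the residual distribution looks the same in every environment (only residual means are compared here). It is an assumption-laden toy, not the authors' Invariant Causal Prediction implementation.

```python
# Illustrative only: accept a predictor set if regression residuals look
# invariant (equal means) across environments.
import numpy as np
from itertools import combinations
from scipy import stats

def invariant_sets(X, y, env, alpha=0.05):
    """X: (n, p); y: (n,); env: integer environment labels. Returns accepted sets."""
    n, p = X.shape
    accepted = []
    for size in range(p + 1):
        for S in combinations(range(p), size):
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ beta
            groups = [resid[env == e] for e in np.unique(env)]
            # Crude invariance check: equal residual means across environments.
            if stats.f_oneway(*groups).pvalue > alpha:
                accepted.append(S)
    return accepted

rng = np.random.default_rng(7)
n = 600
env = np.repeat([0, 1], n // 2)
x1 = rng.normal(size=n) + 2.0 * env            # intervened-on cause of y
y = 1.5 * x1 + rng.normal(size=n)
x2 = y + rng.normal(size=n)                    # effect of y, not a cause
print(invariant_sets(np.column_stack([x1, x2]), y, env))
```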

David Ginsbourger, Idiap Research Institute and University of Bern, http://www.ginsbourger.ch
Quantifying and reducing uncertainties on sets under Gaussian Process priors

Abstract: Gaussian Process models have been used in a number of problems where an objective function f needs to be studied based on a drastically limited number of evaluations.

Global optimization algorithms based on Gaussian Process models have been investigated for several decades, and have become quite popular notably in design of computer experiments. Also, further classes of problems involving the estimation of sets implicitly defined by f, e.g. sets of excursion above a given threshold, have inspired multiple research developments.

In this talk, we will give an overview of recent results and challenges pertaining to the estimation of sets under Gaussian Process priors, with a particular interest in the quantification and the sequential reduction of associated uncertainties.

Based on a series of joint works primarily with Dario Azzimonti, François Bachoc, Julien Bect, Mickaël Binois, Clément Chevalier, Ilya Molchanov, Victor Picheny, Yann Richet and Emmanuel Vazquez.

Fri 2 Feb, '18
-
CRiSM Seminar
MA_B1.01
2-3pm MA B1.01, 2 Feb, 2018 - Robin Evans (Oxford University)

Title: Geometry and statistical model selection

Abstract: TBA

Fri 2 Feb, '18
-
CRiSM Seminar
A1.01

2nd Feb, 3-4pm, A1.01 - Azadeh Khaleghi (Lancaster University)

Title: Approximations of the Restless Bandit Problem

Abstract: In this talk I will discuss our recent paper on the multi-armed restless bandit problem. My focus will be on an instance of the bandit problem where the pay-off distributions are stationary $\phi$-mixing. This version of the problem provides a more realistic model for most real-world applications, but cannot be optimally solved in practice since it is known to be PSPACE-hard. The objective is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. I show that under some conditions on the $\phi$-mixing coefficients, a modified version of the UCB algorithm proves effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms.

In particular, the $\phi$-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of a sampling policy on the pay-off distributions. Some of the proof techniques developed can be used more generally in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis. I will make sure the talk is accessible to non-experts.
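
For orientation, here is the standard UCB1 index policy for i.i.d. bandit arms, the baseline that the modified algorithm discussed in the talk adapts to the $\phi$-mixing, restless setting; the arms and horizon below are toy assumptions.

```python
# Minimal sketch: the UCB1 index policy on independent Bernoulli arms.
import numpy as np

def ucb1(pull, n_arms, horizon, seed=8):
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    rewards = []
    for t in range(horizon):
        if t < n_arms:
            arm = t                                            # play each arm once
        else:
            index = means + np.sqrt(2.0 * np.log(t) / counts)  # UCB index
            arm = int(np.argmax(index))
        r = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]           # running mean update
        rewards.append(r)
    return np.array(rewards)

probs = [0.3, 0.5, 0.7]                                        # toy success probabilities
rewards = ucb1(lambda a, rng: float(rng.uniform() < probs[a]), n_arms=3, horizon=5000)
print("average reward:", rewards.mean())
```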

Fri 4 May, '18
-
CRiSM Seminar
B3.02
Wenyang Zhang - (University of York)

Homogeneity Pursuit in Single Index Models based Panel Data Analysis

Panel data analysis is an important topic in statistics and econometrics. Traditionally, in panel data analysis, all individuals are assumed to share the same unknown parameters, e.g. the same coefficients of covariates when linear models are used, and the differences between the individuals are accounted for by cluster effects. This kind of modelling only makes sense if our main interest is in the global trend, because it cannot tell us anything about the individual attributes, which are sometimes very important. In this talk, I will present a new modelling approach for panel data analysis, based on single index models embedded with homogeneity, which builds the individual attributes into the model and is parsimonious at the same time. I will show a data-driven approach to identify the structure of homogeneity, and to estimate the unknown parameters and functions based on the identified structure. I will show the asymptotic properties of the resulting estimators. I will also use intensive simulation studies to show how well the resulting estimators work when the sample size is finite. Finally, I will apply the proposed modelling idea to a public financial dataset and a UK climate dataset, and show some interesting findings.

Fri 18 May, '18
-
CRiSM Seminar
B3.02
Sergio Bacallado (University of Cambridge)

Three stories on clinical trial design

The design of randomised clinical trials is one of the most classical applications of modern Statistics. The first part of this talk has to do with adaptive trial designs, which aim to minimise the harm to study participants by biasing randomisation toward arms that are performing well, or by closing experimental arms when there is early evidence of futility. We first propose a class of Bayesian uncertainty-directed trial designs, which aim to maximise information gain at the trial's conclusion, and we show in applications to various types of trial that it has superior operating characteristics when compared to simpler adaptive policies. In a second section, I will discuss the use of reinforcement learning algorithms to approximate Bayes-optimal policies given a prior for the treatment effects and a utility function combining outcomes for participants and the uncertainty of treatment effects. The last part of the talk will consider the possibility of sharing preliminary data from trials with patients and physicians who are making enrollment decisions. This practice may be in line with a trend toward patient-centred clinical research, but it presents many challenges and potential pitfalls. Through a simulation study, modelled on the landscape of Glioblastoma trials in the last 15 years, we explore how such 'permeable' designs could affect operating characteristics and the statistical validity of trial conclusions.
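
A minimal sketch of response-adaptive randomisation, in the simplest Thompson-sampling form (Beta posteriors on binary outcomes, each patient allocated to the arm that wins a single posterior draw); the Bayesian uncertainty-directed designs of the talk are more elaborate, and the response rates below are assumptions.

```python
# Illustrative only: Thompson-sampling allocation drifts towards the arm that
# appears to be performing well as outcomes accumulate.
import numpy as np

def thompson_trial(true_rates, n_patients=300, seed=9):
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    successes, failures = np.ones(k), np.ones(k)     # Beta(1, 1) priors
    allocation = np.zeros(k, dtype=int)
    for _ in range(n_patients):
        draws = rng.beta(successes, failures)        # one posterior draw per arm
        arm = int(np.argmax(draws))                  # randomise to the apparent best arm
        outcome = rng.uniform() < true_rates[arm]
        successes[arm] += outcome
        failures[arm] += 1 - outcome
        allocation[arm] += 1
    return allocation, successes / (successes + failures)

alloc, post_means = thompson_trial([0.25, 0.40])
print("patients per arm:", alloc, "; posterior mean response rates:", np.round(post_means, 2))
```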

Joint work with Lorenzo Trippa, Steffen Ventz, and Brian Alexander

Fri 18 May, '18
-
CRiSM Seminar
A1.01

Caitlin Buck (University of Sheffield)


A dilemma in Bayesian chronology construction


Chronology construction was one of the first applications used to showcase the value of MCMC methods for Bayesian inference (Naylor and Smith, 1988; Buck et al, 1992). As a result, Bayesian chronology construction is now ubiquitous in archaeology and is becoming increasingly popular in palaeoenvironmental research. Currently available software requires users to construct the statistical models and input prior knowledge by hand, requiring considerable expertise and patience. As a result, the published chronologies for most sites are based on a single model which is assumed to be correct. Recent research has, however, led to a proposal to automate production of Bayesian chronological models from field records. The approach uses directed acyclic graphs (DAGs) to represent the site stratigraphy and, from these, construct priors for the Bayesian hierarchical models (Dye and Buck, 2015). The related software is in the developmental stage but, before it can be released, we need to decide what advice to offer users about working with the large number of potential models that the new software will construct. In this seminar I will outline how and why Bayesian methods are so widely used in chronology construction, showcase the new DAG-based approach, explain the nature of the dilemma we face and hope to start a discussion about potential practical solutions.

C.E. Buck, C.D. Litton, & A.F.M. Smith (1992) Calibration of radiocarbon results pertaining to related archaeological events, Journal of Archaeological Science, Vol. 19, Iss. 5, pp 497-512.

T. S. Dye & C.E. Buck (2015) Archaeological sequence diagrams and Bayesian chronological models, Journal of Archaeological Science, Vol. 63, pp 84-93.

J. C. Naylor & A. F. M. Smith (1988) An Archaeological Inference Problem, Journal of the American Statistical Association, Vol. 83, Iss. 403, pp 588-595.

Fri 1 Jun, '18
-
CRiSM Seminar
B3.02

Victor Panaretos (EPFL)

What is the dimension of a stochastic process?

How can we determine whether a mean-square continuous stochastic process is, in fact, finite-dimensional, and if so, what its actual dimension is? And how can we do so at a given level of confidence? This question is central to a great many methods for functional data analysis, which require low-dimensional representations whether by functional PCA or other methods. The difficulty is that the determination is to be made on the basis of iid replications of the process observed discretely and with measurement error contamination. This adds a ridge to the empirical covariance, obfuscating the underlying dimension. We build a matrix-completion-inspired test procedure that circumvents this issue by measuring the best possible least-squares fit of the empirical covariance's off-diagonal elements, optimised over covariances of given finite rank. For a fixed grid of sufficient size, we determine the statistic's asymptotic null distribution as the number of replications grows. We then use it to construct a bootstrap implementation of a stepwise testing procedure controlling the family-wise error rate corresponding to the collection of hypotheses formalising the question at hand. The procedure involves no tuning parameters or pre-smoothing, is indifferent to the homoskedasticity or lack of it in the measurement errors, and does not assume a low-noise regime. Based on joint work with Anirvan Chakraborty (EPFL).

Fri 15 Jun, '18
-
CRiSM Seminar
B3.02
2-3pm B3.02, June 15, 2018 - Sarah Heaps (Newcastle University)

Identifying the effect of public holidays on daily demand for gas

Gas distribution networks need to ensure the supply and demand for gas are balanced at all times. In practice, this is supported by a number of forecasting exercises which, if performed accurately, can substantially lower operational costs, for example through more informed preparation for severe winters. Amongst domestic and commercial customers, the demand for gas is strongly related to the weather and patterns of life and work. In regard to the latter, public holidays have a pronounced effect, which often extends into neighbouring days. In the literature, the days over which this protracted effect is felt are typically pre-specified as fixed windows around each public holiday. This approach fails to allow for any uncertainty surrounding the existence, duration and location of the protracted holiday effects. We introduce a novel model for daily gas demand which does not fix the days on which the proximity effect is felt. Our approach is based on a four-state, non-homogeneous hidden Markov model with cyclic dynamics. In this model the classification of days as public holidays is observed, but the assignment of days as “pre-holiday”, “post-holiday” or “normal” is unknown. Explanatory variables recording the number of days to the preceding and succeeding public holidays guide the evolution of the hidden states and allow smooth transitions between normal and holiday periods. To allow for temporal autocorrelation, we model the logarithm of gas demand at multiple locations, conditional on the states, using a first-order vector autoregression (VAR(1)). We take a Bayesian approach to inference and consider briefly the problem of specifying a prior distribution for the autoregressive coefficient matrix of a VAR(1) process which is constrained to lie in the stationary region. We summarise the results of an application to data from Northern Gas Networks (NGN), the regional network serving the North of England, a preliminary version of which is already being used by NGN in its annual medium-term forecasting exercise.
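
On the stationarity constraint mentioned above: a VAR(1) process x_t = A x_{t-1} + e_t is stationary exactly when every eigenvalue of A lies strictly inside the unit circle, which is the region a constrained prior on the coefficient matrix must respect. A minimal check, with toy matrices:

```python
# Illustrative only: check whether a VAR(1) coefficient matrix lies in the
# stationary region (spectral radius strictly less than one).
import numpy as np

def is_stationary_var1(A):
    return np.max(np.abs(np.linalg.eigvals(A))) < 1.0

A_ok = np.array([[0.5, 0.2], [0.1, 0.6]])
A_bad = np.array([[1.1, 0.0], [0.0, 0.3]])
print(is_stationary_var1(A_ok), is_stationary_var1(A_bad))   # True False
```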

Thu 25 Oct, '18
-
CRiSM Seminar
A1.01

Speaker: Professor Martyn Plummer, Department of Statistics, Warwick University
Title: A Bayesian Information Criterion for Singular Models

Abstract: We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing sub-models. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz’s Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
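
For context, the sketch below applies the classical Schwarz BIC to choosing the number of Gaussian mixture components, exactly the kind of singular setting where its penalty can misbehave; it uses scikit-learn's mixture fitting (an assumption about available tooling) and not the corrected criterion proposed in the talk.

```python
# Illustrative only: classical BIC over the number of mixture components.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
X = np.concatenate([rng.normal(-2.0, 1.0, size=(200, 1)),
                    rng.normal( 2.0, 1.0, size=(200, 1))])

for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, round(gm.bic(X), 1))   # smaller BIC preferred under the classical criterion
```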

Thu 8 Nov, '18
-
CRiSM Seminar
A1.01

 Dr. Martin Tegner, University of Oxford

A probabilistic approach to non-parametric local volatility

The local volatility model is a celebrated model widely used for pricing and hedging financial derivatives. While the model's main appeal is its capability of reproducing any given surface of observed option prices (it provides a perfect fit), the essential component of the model is a latent function which can only be unambiguously determined in the limit of infinite data. To (re)construct this function, numerous calibration methods have been suggested involving steps of interpolation and extrapolation, most often of parametric form and with point estimates as the result. We look at the calibration problem in a probabilistic framework with a fully nonparametric approach based on Gaussian process priors. This immediately gives a way of encoding prior beliefs about the local volatility function and a hypothesis model which is highly flexible whilst not being prone to overfitting. Besides providing a method for calibrating a (range of) point estimate(s), we seek to draw posterior inference on the distribution over local volatility, to better understand the uncertainty attached to the calibration in particular, and to the model in general. Further, we seek to understand dynamical properties of local volatility by augmenting the hypothesis space with a time dimension. Ideally, this gives us means of inferring predictive distributions not only locally, but also for entire surfaces forward in time.


Tue 20 Nov, '18
-
CRiSM Seminar
A1.01

Dr. Kayvan Sadeghi, University College London

Probabilistic Independence, Graphs, and Random Networks
The main purpose of this talk is to explore the relationship between the set of conditional independence statements induced by a probability distribution and the set of separations induced by graphs, as studied in graphical models. I introduce the concepts of Markov property and faithfulness, and provide conditions under which a given probability distribution is Markov or faithful to a graph in a general setting. I discuss the implications of these conditions in devising structural learning algorithms, in understanding exchangeable vectors, and in random network analysis.

Thu 6 Dec, '18
-
CRiSM Seminar
A1.01

Dr. Carlo Albert, EAWAG, Switzerland

Bayesian Inference for Stochastic Differential Equation Models through Hamiltonian Scale Separation

Bayesian parameter inference is a fundamental problem in model-based data science. Given observed data, which is believed to be a realization of some parameterized model, the aim is to find a distribution of likely parameter values that are able to explain the observed data. This so-called posterior distribution expresses the probability of a given parameter to be the "true" one, and can be used for making probabilistic predictions. For truly stochastic models this posterior distribution is typically extremely expensive to evaluate. We propose a novel approach for generating posterior parameter distributions, for stochastic differential equation models calibrated to measured time-series. The algorithm is inspired by re-interpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, whose dynamics is confined by both the model and the measurements. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for 1D problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
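
To fix ideas, here is a bare-bones single-scale Hamiltonian Monte Carlo sampler with leapfrog integration on a toy Gaussian target; it illustrates only the HMC ingredient, not the polymer reinterpretation or the multiple time-scale integration of the talk.

```python
# Illustrative only: Hamiltonian Monte Carlo with a single leapfrog step size.
import numpy as np

def hmc(log_p, grad_log_p, x0, n_samples=2000, eps=0.1, n_leap=20, seed=11):
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    out = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)                     # resample momenta
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * eps * grad_log_p(x_new)           # leapfrog half step
        for _ in range(n_leap - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_p(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_p(x_new)
        log_accept = (log_p(x_new) - 0.5 * p_new @ p_new) - (log_p(x) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        out.append(x.copy())
    return np.array(out)

samples = hmc(lambda z: -0.5 * z @ z, lambda z: -z, x0=[3.0, -3.0])  # toy 2D Gaussian target
print("sample mean:", samples[500:].mean(axis=0))
```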

Thu 17 Jan, '19
-
CRiSM Seminar
MSB2.23

Prof. Galin Jones, School of Statistics, University of Minnesota (14:00-15:00)

Bayesian Spatiotemporal Modeling Using Hierarchical Spatial Priors, with Applications to Functional Magnetic Resonance Imaging

We propose a spatiotemporal Bayesian variable selection model for detecting activation in functional magnetic resonance imaging (fMRI) settings. Following recent research in this area, we use binary indicator variables for classifying active voxels. We assume that the spatial dependence in the images can be accommodated by applying an areal model to parcels of voxels. The use of parcellation and a spatial hierarchical prior (instead of the popular Ising prior) results in a posterior distribution amenable to exploration with an efficient Markov chain Monte Carlo (MCMC) algorithm. We study the properties of our approach by applying it to simulated data and an fMRI data set.

Dr. Flavio Goncalves, Universidade Federal de Minas Gerais, Brazil (15:00-16:00).

Exact Bayesian inference in spatiotemporal Cox processes driven by multivariate Gaussian processes

In this talk we present a novel inference methodology to perform Bayesian inference for spatiotemporal Cox processes where the intensity function depends on a multivariate Gaussian process. Dynamic Gaussian processes are introduced to allow for evolution of the intensity function over discrete time. The novelty of the method lies in the fact that no discretisation error is involved, despite the intractability of the likelihood function and the infinite dimensionality of the problem. The method is based on a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model. The models are defined in a general and flexible way but they are amenable to direct sampling from the relevant distributions, due to careful characterisation of their components. The models also allow for the inclusion of regression covariates and/or temporal components to explain the variability of the intensity function. These components may be subject to relevant interaction with space and/or time. Real and simulated examples illustrate the methodology, followed by concluding remarks.

Thu 31 Jan, '19
-
CRiSM Seminar
MSB2.23

Professor Paul Fearnhead, Lancaster University - 14:00-15:00

Efficient Approaches to Changepoint Problems with Dependence Across Segments

Changepoint detection is an increasingly important problem across a range of applications. It is most commonly encountered when analysing time-series data, where changepoints correspond to points in time where some feature of the data, for example its mean, changes abruptly. Often there are important computational constraints when analysing such data, with the number of data sequences and their lengths meaning that only very efficient methods for detecting changepoints are practically feasible.

A natural way of estimating the number and location of changepoints is to minimise a cost that trades off a measure of fit to the data against the number of changepoints fitted. There are now some efficient algorithms that can exactly solve the resulting optimisation problem, but they are only applicable in situations where there is no dependence of the mean of the data across segments. Using such methods can lead to a loss of statistical efficiency in situations where, for example, it is known that the change in mean must be positive.
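
The penalised-cost formulation can be made concrete with the classical optimal-partitioning dynamic programme for changes in mean (a Gaussian segment cost plus a penalty per changepoint), sketched below; this is the unconstrained recursion, without the cross-segment constraints that the talk addresses, and the penalty and toy data are assumptions.

```python
# Illustrative only: optimal partitioning for changes in mean via dynamic programming.
import numpy as np

def optimal_partitioning(y, beta):
    """Return estimated changepoints (end indices of segments, exclusive) for data y."""
    n = len(y)
    S = np.concatenate([[0.0], np.cumsum(y)])
    S2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def seg_cost(s, t):            # Gaussian cost of fitting y[s:t] with its own mean
        m = t - s
        return S2[t] - S2[s] - (S[t] - S[s]) ** 2 / m

    F = np.full(n + 1, np.inf)
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        costs = [F[s] + seg_cost(s, t) + beta for s in range(t)]
        last[t] = int(np.argmin(costs))
        F[t] = costs[last[t]]
    cps, t = [], n                 # backtrack the changepoint locations
    while t > 0:
        cps.append(t)
        t = last[t]
    return sorted(cps)[:-1]        # drop the final endpoint n

rng = np.random.default_rng(12)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100), rng.normal(1, 1, 100)])
print(optimal_partitioning(y, beta=3 * np.log(len(y))))
```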

This talk will present a new class of efficient algorithms that can exactly minimise our cost whilst imposing certain constraints on the relationship of the mean before and after a change. These algorithms have links to recursions that are seen for discrete-state hidden Markov Models, and within sequential Monte Carlo. We demonstrate the usefulness of these algorithms on problems such as detecting spikes in calcium imaging data. Our algorithm can analyse data of length 100,000 in less than a second, and has been used by the Allen Brain Institute to analyse the spike patterns of over 60,000 neurons.

(This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and Daniela Witten.)

Dr. Sandipan Roy, Department of Mathematical Science, University of Bath (15:00-16:00)

Network Heterogeneity and Strength of Connections

Abstract: Detecting the strength of connection in a network is a fundamental problem in understanding the relationships among individuals. Often it is more important to understand how strongly two individuals are connected than the mere presence/absence of the edge. This paper introduces a new concept of strength of connection in a network through a nonparametric object called the “Grafield”. The “Grafield” is a piece-wise constant bi-variate kernel function that compactly represents the affinity or strength of ties (or interactions) between every pair of vertices in the graph. We estimate the “Grafield” function through a spectral analysis of the Laplacian matrix followed by a hard thresholding (Gavish & Donoho, 2014) of the singular values. Our estimation methodology is also valid for asymmetric directed networks. As a by-product we get an efficient procedure for edge probability matrix estimation as well. We validate our proposed approach with several synthetic experiments and compare with existing algorithms for edge probability matrix estimation. We also apply our proposed approach to three real datasets: understanding the strength of connection in (a) a social messaging network, (b) a network of political parties in the US Senate and (c) a neural network of neurons and synapses in C. elegans, a type of worm.
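
The hard-thresholding step can be illustrated generically as low-rank denoising of a noisy matrix: keep only singular values above a cutoff. The 2.858 x median-singular-value cutoff used below is the Gavish & Donoho (2014) rule of thumb for square matrices with unknown noise level, quoted here as an assumption rather than as this paper's exact recipe.

```python
# Illustrative only: low-rank reconstruction by hard thresholding of singular values.
import numpy as np

def hard_threshold_svd(A):
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    tau = 2.858 * np.median(s)               # Gavish-Donoho cutoff (square matrix, unknown noise)
    k = int(np.sum(s > tau))
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :], k

rng = np.random.default_rng(13)
n, rank = 200, 3
low_rank = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n))
A_noisy = low_rank + 0.5 * rng.normal(size=(n, n))
A_hat, k = hard_threshold_svd(A_noisy)
print("estimated rank:", k)
```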

Thu 14 Feb, '19
-
CRiSM Seminar
MSB2.23

Philipp Hermann, Institute of Applied Statistics, Johannes Kepler University Linz, Austria

Time: 14:00-15:00

LDJump: Estimating Variable Recombination Rates from Population Genetic Data

Recombination is a process during meiosis which starts with the formation of DNA double-strand breaks and results in an exchange of genetic material between homologous chromosomes. In many species, recombination is concentrated in narrow regions known as hotspots, flanked by large zones with low recombination. As recombination plays an important role in evolution, its estimation and the identification of hotspot positions is of considerable interest. In this talk we introduce LDJump, our method to estimate local population recombination rates with relevant summary statistics as explanatory variables in a regression model. More precisely, we divide the DNA sequence into small segments and estimate the recombination rate per segment via the regression model. In order to obtain change-points in recombination we apply a frequentist segmentation method. This approach controls a type I error and provides confidence bands for the estimator. Overall LDJump identifies hotspots at high accuracy under different levels of genetic diversity as well as demography and is computationally fast even for genomic regions spanning many megabases. We will present a practical application of LDJump on a region of the human chromosome 21 and compare our estimated population recombination rates with experimentally measured recombination events.

(joint work with Andreas Futschik, Irene Tiemann-Boege, and Angelika Heissl)

Professor Dr. Ingo Scholtes, Data Analytics Group, University of Zürich

Time: 15:00-16:00

Optimal Higher-Order Network Analytics for Time Series Data

Network-based data analysis techniques such as graph mining, social network analysis, link prediction and clustering are an important foundation for data science applications in computer science, computational social science, economics and bioinformatics. They help us to detect patterns in large corpora of data that capture relations between genes, brain regions, species, humans, documents, or financial institutions. While this potential of the network perspective is undisputed, advances in data sensing and collection increasingly provide us with high-dimensional, temporal, and noisy data on real systems. The complex characteristics of such data sources pose fundamental challenges for network analytics. They question the validity of network abstractions of complex systems and pose a threat for interdisciplinary applications of data analytics and machine learning.

To address these challenges, I introduce a graphical modelling framework that accounts for the complex characteristics of real-world data on complex systems. I demonstrate this approach in time series data on technical, biological, and social systems. Current methods to analyze the topology of such systems discard information on the timing and ordering of interactions, which however determines which elements of a system can influence each other via paths. To solve this issue, I introduce a modelling framework that (i) generalises standard network representations towards multi-order graphical models for causal paths, and (ii) uses statistical learning to achieve an optimal balance between explanatory power and model complexity. The framework advances the theoretical foundation of data science and sheds light on the important question when network representations of time series data are justified. It is the basis for a new generation of data analytics and machine learning techniques that account both for temporal and topological characteristics in real-world data.

Thu 28 Feb, '19
-
CRiSM Seminar
MSB2.23

Prof. Valerie Isham, Statistical Science, University College London, UK (15:00-16:00)

Stochastic Epidemic Models: Approximations, structured populations and networks

Abstract: Epidemic models are developed as a means of gaining understanding about the dynamics of the spread of infection (human and animal pathogens, computer viruses etc.) and of rumours and other information. This understanding can then inform control measures to limit, or in some cases enhance, spread. Towards this goal, I will start from some simple stochastic transmission models, and describe some Gaussian approximations and their use for inference, illustrating this with data from a norovirus outbreak as well as from simulations. I will then discuss ways of incorporating population structure via metapopulations and networks, and the effects of network structure on epidemic spread. Finally I will briefly consider the extension to explicitly spatial mobile networks, as for example when computer viruses spread via short-range wireless or bluetooth connections.

Thu 14 Mar, '19
-
CRiSM Seminar
A1.01

Speaker: Spencer Wheatley, ETH Zurich, Switzerland

Title: The "endo-exo" problem in financial market price fluctuations, & the ARMA point process

The "endo-exo" problem -- i.e., decomposing system activity into exogenous and endogenous parts -- lies at the heart of statistical identification in many fields of science. E.g., consider the problem of determining if an earthquake is a mainshock or aftershock, or if a surge in the popularity of a youtube video is because it is "going viral", or simply due to high activity across the platform. Solution of this problem is often plagued by spurious inference (namely false strong interaction) due to neglect of trends, shocks and shifts in the data. The predominant point process model for endo-exo analysis in the field of quantitative finance is the Hawkes process. A comparison of this field with the relatively mature fields of econometrics and time series identifies the need to more rigorously control for trends and shocks. Doing so allows us to test the hypothesis that the market is "critical" -- analogous to a unit root test commonly done in economic time series -- and challenge earlier results. Continuing "lessons learned" from the time series field, it is argued that the Hawkes point process is analogous to integer-valued AR time series. Following this analogy, we introduce the ARMA point process, which flexibly combines exogenous background activity (Poisson), shot-noise bursty dynamics, and self-exciting (Hawkes) endogenous activity. We illustrate a connection to ARMA time series models, as well as derive an MCEM (Monte Carlo Expectation Maximization) algorithm to enable MLE of this process, and assess consistency by a simulation study. Remaining challenges in estimation and model selection, as well as possible solutions, are discussed.
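
To make the self-exciting ingredient concrete, the sketch below simulates a Hawkes process with an exponential kernel by Ogata's thinning algorithm; it is not the ARMA point process of [2], and the parameter values are assumptions.

```python
# Illustrative only: simulate a self-exciting (Hawkes) point process by thinning.
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=14):
    """Intensity: lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i))."""
    rng = np.random.default_rng(seed)
    events = []
    t = 0.0
    while True:
        past = np.array(events)
        lam_bar = mu + alpha * np.exp(-beta * (t - past)).sum()   # upper bound for s > t
        t += rng.exponential(1.0 / lam_bar)                       # candidate next event
        if t >= T:
            break
        lam_t = mu + alpha * np.exp(-beta * (t - past)).sum()     # true intensity at candidate
        if rng.uniform() < lam_t / lam_bar:                       # thinning (accept/reject)
            events.append(t)
    return np.array(events)

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, T=200.0)
print(len(ev), "events; branching ratio alpha/beta =", 0.8 / 1.5)  # < 1: subcritical regime
```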

[1] Wheatley, S., Wehrli, A., and Sornette, D. "The endo-exo problem in high frequency financial price fluctuations and rejecting criticality". To appear in Quantitative Finance (2018). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3239443

[2] Wheatley, S., Schatz, M., and Sornette, D. "The ARMA Point Process and its Estimation." arXiv preprint arXiv:1806.09948 (2018).
