# Abstracts

## Talks

- Andreas Artemiou (Cardiff) - "Sufficient Dimension Reduction in Regression"
**Abstract:** In this talk I will give an overview of Sufficient Dimension Reduction (SDR) and how it can be used in regression and classification problems. I will start with a historical overview, to emphasize the fact that Principal Component Analysis (PCA), an unsupervised dimension reduction method, may fail in regression. SDR, as a class of supervised dimension reduction methods, has been developed over the last 25 years, and I will present some of the key developments. Lastly, I will discuss the development of a new class of SDR methodology which incorporates the use of Support Vector Machines (SVM) and their variants for more accurate and robust estimation of the reduced subspace in this methodology. I will try to present key ideas with main references and stay away from complicated formulas.

- David Colquhoun (UCL) - "The misinterpretation of P values and the reproducibility of science: why haven't statisticians told us?"
**Abstract:** There's nothing wrong with P values. They do what it says on the tin. The problem lies in the fact that what they do is not what experimenters want. What they want to know is the probability that, if they claim an effect is real, they'll be wrong. Many experimenters believe that this is what the P value tells you, but of course it isn't. It is easy to show that, if you observe P = 0.047 in a single test of significance, and claim on that basis that the effect is real, you'll be wrong at least 26% of the time (and a great deal more often if the hypothesis is implausible); e.g. see http://rsos.royalsocietypublishing.org/content/1/3/140216. This alone is sufficient to account for much of the reproducibility crisis that has engulfed some areas of science, e.g. experimental psychology. Although the argument behind this conclusion is Bayesian (which is probably why I took so long to notice it), I believe that it is free of any subjective elements. By failing to emphasize this in elementary courses, and by being complicit in allowing the tyranny of P = 0.05 in papers, I fear that statisticians may have contributed to the irreproducibility crisis that it is their job to prevent.
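
The kind of arithmetic behind such a claim can be sketched with a simple screening-style calculation. This is only an illustration, not the calculation from the linked paper: it treats any result with P below the threshold as "significant", so it does not reproduce the 26% figure above, which concerns observing P = 0.047 specifically. The function name `false_positive_risk` and the values chosen for power and prior are assumptions made for this example.

```python
def false_positive_risk(alpha, power, prior):
    """Fraction of 'significant' results that are false positives.

    alpha: significance threshold (chance of a positive when there is no effect)
    power: chance of a positive when the effect is real (assumed value)
    prior: assumed proportion of tested hypotheses where the effect is real
    """
    false_pos = alpha * (1.0 - prior)  # no real effect, but test is significant
    true_pos = power * prior           # real effect, and test is significant
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Compare a plausible hypothesis (prior 0.5) with an implausible one (prior 0.1).
    for prior in (0.5, 0.1):
        risk = false_positive_risk(alpha=0.05, power=0.8, prior=prior)
        print(f"prior = {prior}: false positive risk = {risk:.3f}")
```

Under these assumed values the risk is about 0.06 when half of the tested hypotheses are true and 0.36 when only one in ten is, which illustrates why an implausible hypothesis makes a wrong claim so much more likely.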

- Andy Golightly (Newcastle) - "Building bridges: Improved bridge constructs for stochastic differential equations"
**Abstract:** We consider the task of generating discrete-time realisations of a nonlinear multivariate diffusion process satisfying an Ito stochastic differential equation conditional on an observation taken at a fixed future time-point. Such realisations are typically termed diffusion bridges. Since, in general, no closed form expression exists for the transition densities of the process of interest, a widely adopted solution works with the Euler-Maruyama approximation, by replacing the intractable transition densities with Gaussian approximations.

However, the density of the conditioned discrete-time process remains intractable, necessitating the use of computationally intensive methods such as Markov chain Monte Carlo. Designing an efficient proposal mechanism which can be applied to a noisy and partially observed system that exhibits nonlinear dynamics is a particularly challenging problem, and is the focus of this talk. By partitioning the process into two parts, one that accounts for nonlinear dynamics in a deterministic way, and another as a residual stochastic process, we develop a class of novel constructs that bridge the residual process via a linear approximation. As well as comparing the performance of each new construct with a number of existing approaches, we illustrate the methodology in a real data application.
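
For readers unfamiliar with the Euler-Maruyama approximation mentioned in this abstract, a minimal sketch follows. It simulates a scalar diffusion forwards in time only and does not implement any of the bridge constructs from the talk. The function name `euler_maruyama`, its parameters, and the Ornstein-Uhlenbeck example are assumptions made for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng=None):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on [0, t_end].

    Scalar, unconditioned forward simulation on an equally spaced time grid.
    Returns a list of n_steps + 1 values, starting with x0.
    """
    rng = rng or random.Random()
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        # Gaussian approximation to the transition density over one step:
        # mean x + drift(x) * dt, standard deviation diffusion(x) * sqrt(dt).
        x = x + drift(x) * dt + diffusion(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical example: an Ornstein-Uhlenbeck process reverting to zero.
    path = euler_maruyama(
        drift=lambda x: -x,
        diffusion=lambda x: 0.5,
        x0=1.0,
        t_end=1.0,
        n_steps=100,
        rng=random.Random(1),
    )
    print(len(path), path[-1])
```

Each step draws from a Gaussian whose mean and variance come from the drift and diffusion evaluated at the current state, which is the sense in which the intractable transition densities are replaced by Gaussian approximations.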

- Deirdre Hollingsworth (Warwick) - "Providing useful insights when working on a neglected tropical disease"
**Abstract:** Neglected tropical diseases (NTDs) are a group of infections which predominantly affect the “bottom billion”, or the poorest people in the world. They are responsible for chronic suffering as well as mortality in these hard-to-reach populations. In recent years there has been a drive to reduce the burden of these diseases through an international effort to roll out interventions at a global scale. There is a need for epidemiological modelling to assess the required duration and coverage of these interventions, as well as to evaluate whether additional interventions will be required to reach the 2020 goals, and perhaps even permanently interrupt transmission. NTDs are neglected not only in terms of their public health burden, but also in our limited understanding of their biology and life cycles, with remarkably few epidemiological studies from which to parameterise or, in some cases, even hypothesise structures for transmission models. However, this need not limit the public health policy aspirations for control: Guinea worm is close to global eradication despite remarkably limited knowledge about its biology within the host. Many other NTDs have had their prevalence and incidence reduced substantially through application of relatively straightforward interventions, but elimination may prove more challenging. Mathematical models are being used to inform policy and guide the development of new strategies to reduce transmission. Even simple mathematical models can capture much of the qualitative behaviour of these systems, but developing, validating and testing models which can be used to give detailed policy guidance is more challenging.

- Chris Jewell (Lancaster) - "Forecasting for outbreaks of vector-borne diseases: a data assimilation approach"
**Abstract:** In August 2012, the first case of a novel strain of *Theileria orientalis* (Ikeda) was discovered in a dairy herd near Auckland, New Zealand. The strain was unusually pathogenic, causing haemolytic anaemia in up to 35% of animals within an infected herd. In the ensuing months, more cases were discovered in a pattern that suggested wave-like spread down New Zealand’s North Island. *Theileria orientalis* is a blood-borne parasite of cattle, which is transmitted by the tick vector *Haemaphysalis longicornis*. This tick was known to exist in New Zealand, but although its behaviour and life cycle were known from laboratory experiments, surprisingly little was known about its country-wide distribution. Predicting the spread of *T. orientalis* (Ikeda) for management and economic purposes was therefore complicated by not knowing which areas of the country would be conducive to transmission, if an infected cow happened to be imported via transportation. The approach to prediction presented here uses a Bayesian probability model of dynamical disease spread, in combination with a separable discrete-space, continuous-time spatial model of tick abundance. This joint model allows inference on tick abundance by combining information from independent disease screening, expert opinion, and the occurrence of theileriosis cases. A fast GPU-based implementation was used to provide timely predictions for the outbreak, with the predictive distribution used to provide evidence for policy decisions.

- Ruth King (Edinburgh) - "Incorporating memory into capture-recapture models using a first-order hidden Markov model(!)"
**Abstract:** In this talk we focus on incorporating memory into ecological models. In particular we consider capture-recapture studies, where observers go into the field at a series of capture events. At the initial capture event all observed individuals are uniquely marked, recorded and released back into the population. At each subsequent capture event previously unmarked individuals are marked and all observed individuals are recorded before being released. This leads to data in the form of the capture history of each individual observed, recording whether or not they are observed at each capture event. We will initially describe how standard capture-recapture models can be expressed as a hidden Markov-type model. We describe the advantages of specifying the models in this framework, including efficient model-fitting techniques and incorporating additional processes. In particular, we focus on open multi-state capture-recapture data, where individuals are recorded in a given discrete time-varying “state” when they are observed. For example, state may refer to “breeding/not breeding” or “hungry/not hungry”. For mathematical convenience it is often assumed that transitions between states can be modelled as first-order Markovian (and hence “memoryless”). However, this is often biologically unrealistic. We will consider the incorporation of memory in a parsimonious manner via the specification of a semi-Markovian transition model and describe how the models can be efficiently fitted using a first-order Markov approximation. We apply the approach to house finch data where state corresponds to “infected” or “not infected” with conjunctivitis.

- Theo Kypraios (Nottingham) - "Recent Developments in Bayesian Non-Parametric Inference for Epidemic Models"
**Abstract:** Despite the enormous attention given to the development of methods for efficient parameter estimation, there has been relatively little activity in the area of non-parametric inference. That is, drawing inference for the quantities which govern transmission, i) the force of infection and ii) the period during which an individual remains infectious, without making certain modelling assumptions about its (parametric) functional form or that it belongs to a certain family of parametric distributions. In this talk we will describe three approaches which allow Bayesian non-parametric inference for the force of infection, namely via Gaussian Processes, Step Functions, and B-splines. We will also illustrate the proposed methodology via both simulated and real datasets.

- Tom Nichols (Warwick) - Introductory Talk - "Meta-Analysis: Review and new developments in neuroimaging"
**Abstract:** Meta-analysis is the combination of independent statistical results, with the goal of obtaining greater sensitivity and understanding whether some positive findings in the literature are evidence of a true effect or just idiosyncratic. I will review the standard tools of meta-analysis, and show how they have been adopted (or not) into my area, brain imaging. Brain imaging presents several special challenges, in particular because the full statistical results are usually not shared, but instead only a sparse summary is reported. I will describe work from my group using spatial Bayesian point process methods to conduct neuroimaging meta-analyses on this type of summary data.

- Tom Nichols (Warwick) - Research Talk - "High- and low-tech solutions for multiple testing in large scale inference"
**Abstract:** Modern scientific methods typically rely on massive data where thousands to millions of variables are measured (e.g. gene expression, single nucleotide polymorphisms, brain images, etc.), and often on just tens of subjects or units. This massive multiplicity must be accounted for in the inference procedures, and with such small samples asymptotic methods cannot necessarily be relied upon. I will review the multiple testing problem in general (for any type of data) and the approaches that are used to address it. I will then describe my own work on multiple testing for brain imaging, showing how both 'high-tech' methods using the geometry of random fields and 'low-tech' resampling-based methods are needed to make inferences on brain images while controlling for the multiple testing problem.

- Mary Oldham (Gregynog) - "The Gregynog Library"
**Abstract:** Mary Oldham is the librarian at Gregynog, and will give us some of the history of the printing press at Gregynog...

- Simon Spencer (Warwick) - "Bayesian inference and model selection for stochastic epidemics (with special attention to Escherichia coli O157:H7 in cattle)"
**Abstract:** Model fitting for epidemics is challenging because not all of the information needed to write down the likelihood function is observable; for example, the times of infection and recovery are not usually observed. Furthermore, the data that are available from diagnostic tests may not be perfectly accurate. These considerations are typically overcome by applying computationally intensive data augmentation techniques such as Markov chain Monte Carlo. To make things even more difficult, most of the interesting epidemiological questions are best expressed as model selection problems and so fitting just one model is not sufficient to answer them. Instead we must fit a range of different models, each representing an important epidemiological hypothesis, and then make meaningful comparisons between them. I will describe how to overcome (most of) these difficulties to learn about the epidemiology of Escherichia coli O157:H7 in cattle. Joint work with Panayiota Touloupou, Bärbel Finkenstädt Rand, Pete Neal and TJ McKinley.

## Posters

- Muteb Alharthi (Nottingham) - "Bayesian model choice for epidemic models with missing data"
- Francois-Xavier Briol (Warwick) - "Probabilistic Integration with Theoretical Guarantees"
**Abstract:** The field of probabilistic numerics focuses on the study of numerical problems from the point of view of statistical inference, often from a Bayesian perspective. In the specific case of integration, Bayesian Quadrature (BQ) provides estimators for the value of integrals together with a measure of our uncertainty over the result, which takes the form of a posterior variance. These estimators have been shown empirically to converge quickly to the value of the integral; however, no explicit rates of convergence were known until very recently. This poster will present a recent paper which provides the very first rates of convergence and posterior contraction of BQ. These are obtained by combining BQ with a convex optimization algorithm called the Frank-Wolfe algorithm. This allows for a very efficient quadrature method which can have up to exponential convergence in the number of samples, and hence compares favourably to most Monte Carlo methods. Our approach is applied to successfully quantify numerical error in the solution to a challenging Bayesian model choice problem in cellular biology.

- Jake Carson (Warwick) - "Unbiased Solutions of PDE Models via the Feynman-Kac Formulae"
- Cyril Chimisov (Warwick) - "Adaptive Gibbs Sampling"
- Jon Cockayne (Warwick) - "Probabilistic Numerical Methods for the Solution of Partial Differential Equations"
**Abstract:** Recent work establishes probabilistic foundations for models of the numerical error arising in the numerical solution by finite element approximation of ordinary and partial differential equations (PDEs). Such methods are of particular interest for PDEs since explicit solutions are rarely available, and obtaining numerical estimates at arbitrary precision is often computationally infeasible. Thus, a rigorous quantification of uncertainty in the approximate solution is important. We seek to develop methods for obtaining probabilistic measures of uncertainty for linear and nonlinear PDEs when solved numerically.

- Ruth Harbord (Warwick) - "Inferring Brain Connectivity with the Multiregression Dynamic Model"
- Jere Koskela (Warwick) - "Efficient sequential Monte Carlo sampling of rare trajectories in reverse time"
**Abstract:** Rare event simulation seeks to estimate probabilities of unlikely but significant events, such as extreme weather, market crashes, or failure rates in communications networks. In complex models the probabilities of such events are often intractable, and naive simulation fails because of the low probability of the event of interest. Sequential Monte Carlo provides a practical method for sampling rare events by biasing probability mass toward the event of interest, though as always the design of good proposal distributions is difficult but crucial. The typical approach for sampling rare trajectories of a stochastic process is an exponential twisting of the forward dynamics, motivated by approximating a large deviation principle. I present an alternative, based on the observation that a forwards-in-time trajectory conditioned to end in a rare state coincides with an unconditioned reverse-time trajectory started from the rare state. This observation has led to very efficient simulation methods in coalescent-based population genetics. I will introduce reverse-time SMC as a generic algorithm, discuss settings in which it is advantageous, and present some novel applications both for coalescents and other stochastic processes.

- Boryana Lopez Kolkovska (Warwick) - "Survival Analysis Models on MESS epilepsy data"
**Abstract:** In the Multicentre study of early Epilepsy and Single Seizures (MESS), patients diagnosed with epilepsy took part in a randomized controlled trial of policies of immediate versus deferred treatment. We are interested in providing a prognosis for the patients and clinicians from the study, and propose to apply Survival Analysis methods. A first approach proposed by J. Rogers is presented, where the times to a first seizure are modeled by a negative binomial mixture model, with a gamma distributed random effect. The model considers the existence of a proportion of patients who attain remission post-randomization, and terms this proportion of patients a cure fraction of the population. For this model each patient is considered to have an individual seizure rate, which is assumed to change when randomized to an anti-epileptic drug. The underlying seizure rate, the post-randomization rate change for each patient and the population's heterogeneity coefficient are estimated from Rogers' model. For these types of live recurrent events, Cox proportional hazards models are commonly used. We perform a residual analysis on the Cox and Rogers models, in order to compare and study their goodness of fit. From such results and the inherent nature of patients diagnosed with epilepsy, we present a truncated negative binomial mixture model.

- Audrey Kueh (Warwick) - "Modelling Penumbra in Computed Tomography"
**Abstract:** The spot geometry in Computed Tomography (CT) may fluctuate from scan to scan. A model is thus proposed to measure the spot geometry by analysing the image itself.

- Matt Moores (Warwick) - "Lorentzian mixture model for Raman spectroscopy"
- Murray Pollock (Warwick) - "Exact Simulation in a Nutshell"
- Ewart Shaw (Warwick) - "Statistical Inference, Orthogonal Polynomials and Electrostatics"
- Simone Tiberi (Warwick) - "Bayesian hierarchical stochastic analysis of multiple single cell Nrf2 protein levels"
**Abstract:** We will present a Bayesian hierarchical analysis of multiple single cell fluorescent Nrf2 reporter levels in nucleus and cytoplasm.

Nrf2 is a transcription factor regulating the expression of several defensive genes protecting against various cellular stresses.

We propose a reaction network based on five reactions, including a distributed delay and a Michaelis-Menten non-linear term, for the amount of Nrf2 protein moving between nucleus and cytoplasm. The diffusion approximation is used to approximate the original Markov jump process. To explain the between-cell variability for multiple single cell data, we embed the model in a Bayesian hierarchical framework. Furthermore, we introduce a measurement equation, which involves a proportionality constant and a bivariate error, for the nuclear and cytoplasmic measurements, in order to relate the unobservable stochastic population process to the observed data.

Bayesian inference is performed via a data augmentation procedure by alternately sampling from the conditional distributions of the model parameters and the latent process. We show inferential results obtained on simulation studies and on experimental data from single cells under the basal condition and under induction by a stimulant, sulforaphane.

- Panayiota Touloupou (Warwick) - "Scalable inference for Markovian and non-Markovian Epidemic Models"