# Abstracts

## Talks

- Maggie Chen (Swansea) - “Flash Crashes, Jumps and Running Jumps: A New Method of Jump Detection”
**Abstract:** Jumps are often observed and widely discussed in the context of the modern complex financial system, yet they still lack a universal definition. Further, the classic ARCH/GARCH or Poisson jump models are limited in interpreting important financial effects such as contagion and clustering. We propose a new jump detection method to identify both inter-day and intraday jumps as separate components of the underlying volatility process. We apply the method to a long history of intraday two-minute data for the S&P 500, and we also examine closely the mini flash crash that occurred on May 6, 2010, in order to detect jumps. We found multiple jumps on the day and argue that the most commonly used bi-power method by Anderson et al. (2010), which found no jumps on that day, has fundamental flaws in capturing jumps. To enhance robustness, we utilise a median approach to reduce masking effects and introduce the concept of running jumps in order to capture significant jumps and jump runs when significant market events trigger sharp volatility variations.

- Idris Eckley (Lancaster) and Tim Park (Lancaster / Shell UK) - “Locally stationary time series methods: making sense of sensor data”
**Abstract:** The use of sensors for data collection is now ubiquitous in modern industrial systems and consumer devices. Such sensors can unobtrusively record time series, potentially at very high rates. As such they can be a rich resource, but are not without their statistical challenges! For example, problems can arise due to the volume of data, the data structure or the environment in which the data is collected. In this series of talks we will introduce some of these challenges and describe how recent time series approaches are being developed to address such data realities. In the first talk we will motivate and introduce the locally stationary time series paradigm, focussing in particular on the seminal work of Nason, Von Sachs and Kroisandt (J. Royal Stat. Soc B, 2000), who introduced a wavelet-based modelling framework for such time series. We will then explain how such frameworks can be extended to a multivariate setting and introduce the concepts of local coherence and partial coherence. We also show how these methods can be used to derive insight in various applied settings. The third (tutorial) session will be a problem-solving activity with the aim of developing ideas and strategies to solve a current sensor-based problem encountered by Shell. In the final session we will discuss some recent research which focusses on key questions related to the sampling of (discrete) time series.

- Paul Jenkins (Warwick) - “Statistical and computational challenges from genomic data”
**Abstract:** Advances in DNA sequencing technologies are providing a wealth of data on genetic variation, but making sense of this information raises many statistical and computational challenges. In principle we could write down an evolutionary model and compute a likelihood for the data under this model, allowing us to perform statistical inference on numerous biological and demographic processes: mutation, natural selection, migration, population structure, and so on. In practice such likelihoods are intractable for all but the simplest models, and we must resort to computationally intensive Monte Carlo approaches, summary statistics, heuristic model simplifications, or a combination of these. In this talk I will describe a new analytic method for inference about the process of recombination. Recombination is a fundamental aspect of reproduction which causes the shuffling of genetic variants, or alleles, along a chromosome so that the genetic makeup of an offspring differs from that of its parent. It is therefore important to quantify recombination, for example when locating genes associated with complex diseases. I will show how an application of the martingale central limit theorem can be used to derive an accurate model of recombination with a key property: its likelihood is entirely tractable. The result is illustrated by embedding the likelihood in a reversible jump Markov chain Monte Carlo algorithm, and applying this to genomic data from the model fruit fly Drosophila melanogaster. We construct the first genome-wide maps of fine-scale recombination rate variation in this organism.

- Kim Kenobi (Aberystwyth) - “Characterising differences in root system architecture of low versus high nitrogen uptake efficiency wheat plants”
**Abstract:** The problem of registering and comparing the shapes of wheat roots and extracting geometric differences in root system architecture based on nitrogen uptake efficiency is considered. A novel distance measure between two-dimensional images of wheat roots is introduced. This geometric information is combined with quantitative traits obtained from a software package to identify important traits that distinguish between low and high nitrogen uptake efficiency wheat lines. By eye it is difficult to discern any differences in the root system architectures of the different types of plant, but with the aid of linear discriminant analysis it is possible to highlight substantive differences dependent on the nitrogen uptake efficiency of the wheat lines.

- Ioannis Kosmidis (UCL) - “Bias in parametric estimation: reduction and useful side-effects”
**Abstract:** In this talk we present some recent work on a unified computational and conceptual framework for reducing the bias in the estimation of statistical models from a practitioner's point of view. The talk will discuss several of the shortcomings of classical estimators (like the MLE) with demonstrations based on real and artificial data, for several well-used statistical models including Binomial and categorical response models (for both nominal and ordinal responses) and Beta regression. The main focus will be on how those shortcomings can be overcome by reducing bias. A generic, easily implemented algorithm for reducing the bias in any statistical model will also be presented, along with special-purpose algorithms that take advantage of specific model structures. **Paper:** http://doi.org/10.1002/wics.1296

- Gareth Peters (UCL) - “Asymptotic Approximations and Monte Carlo Approximations for Risk and Insurance”
**Abstract:** This presentation gives a tutorial-style overview of some basic results in risk and insurance related to risk measure estimation. In particular, first-order asymptotic approximations will be explained for the class of spectral risk measures (Value-at-Risk, Expected Shortfall etc.). These approximations will be compared to alternative approximations for the estimation of these risk measures based on the Panjer recursion and also Monte Carlo path-space Importance Sampling methods.

- Philip Protter (Columbia) - “Liquidity theory and high frequency trading”
**Abstract:** In 1998 the Securities and Exchange Commission (SEC) of the United States authorized the existence of electronic stock exchanges, and high speed trading began shortly thereafter. In the beginning, the trading was fast, in seconds, but today it is very fast, in microseconds. It has evolved to the point where speed is tantamount to profits, often referred to as liquidity profits. In this series of three talks we first review a little stochastic calculus for semimartingales, and then explain the liquidity model of Umut Çetin, Robert Jarrow, and the speaker (Finance and Stochastics, 2004). We will then verify the model's applicability via a data study, including a movie of the supply curve made by Marcel Blais. We will then explain what liquidity profits are, who used to get them and why, and who now gets them and how. This involves what is known as high frequency trading, a practice that is controversial. We will explain, via a mathematical model, a more sinister side of what is transpiring. This mathematical analysis provides, inter alia, a method for quantifying the amount of profits obtained, and at whose expense they are obtained.

## Posters

- Francois-Xavier Briol (Oxford / Warwick) - “Hawkes Processes to predict the success of movies”
**Abstract:** Hawkes processes are a generalisation of the Poisson point process which allows for temporal dependence of events. In practice, this means that events will be clustered in time, and such models are therefore used when there is some sort of “self-excitation” in the data. Some existing application areas include volcanic eruptions, crime prevention and financial time series. This poster will introduce Hawkes processes from a theoretical point of view and will present their use in modelling the success of movies. Joint work with Dr. Elke Thonnes (Warwick).

- Patrick Conrad (Warwick) - “Probability Measures on Numerical Solutions of ODEs and PDEs for Uncertainty Quantification and Inference”
**Abstract:** Deterministic ODE and PDE solvers are widely used, but characterizing the error in numerical solutions within a coherent statistical framework is challenging. We successfully address this problem by constructing a probability measure over functions consistent with the solution that provably contracts to a Dirac measure on the unique solution at rates determined by an underlying deterministic solver. The measure straightforwardly derives from important classes of numerical solvers and is illustrated on uncertainty quantification and inverse problems.

- Mathias Cronjager (Oxford / Warwick) - “Determining the expected site frequency spectrum associated with Xi-coalescents”
**Abstract:** We generalize results from Birkner et al. (Genetics, 2013) for computing the expected site frequency spectrum (SFS) of a coalescent from the case of Lambda-coalescents to the more general class of Xi-coalescents. Lambda-coalescents allow for events where one group of ancestral lineages of arbitrary size merges into one lineage; Xi-coalescents allow for events where an arbitrary number of groups of ancestral lineages of arbitrary size merge into single lineages. The derived formulas for the expected marginals of the SFS are potentially interesting from a theoretical perspective, but in the general case infeasible to evaluate for large sample sizes, as they involve evaluating a very large set of recursions. We give bounds on the complexity of solving these recursions both in the general case and in a specific case of relevance to modelling highly fecund diploid populations (where bounds not much worse than for Lambda-coalescents can be established). This work constitutes the author's master's thesis in mathematics at the Technical University of Berlin under the supervision of Prof. Dr. Jochen Blath.

- Christiane Görgen (Warwick) - “Interpolating Polynomials for Staged Trees and Chain Event Graph”
**Abstract:** I am interested in the formal way in which methods from algebra and algebraic geometry can be used to analyse different non-standard model structures in statistics. In particular, my research focuses on the Chain Event Graph (Smith & Anderson 2008), which has been shown to be a powerful statistical tool. While results from graph theory have been applied in order to characterise the implicit conditional independence structure and equivalence classes of these models (Thwaites & Smith 2015), an alternative algebraic approach provides us with even more promising findings. I will outline how a polynomial associated with the model graph enables us to represent known properties more elegantly and to reach a deeper understanding of the model, also with respect to a causal interpretation.

- Felipe Medina Aguayo (Warwick) - “Is Pseudo-Marginal Always the Best?”
**Abstract:** The Metropolis-Hastings (MH) algorithm is a useful and powerful simulation tool for solving problems in many areas. However, when we do not have an analytic expression for the target density, we may need to appeal to an algorithm on an extended space. The use of unbiased estimators for the intractable target within an MH algorithm provides an alternative. Even though this noisy chain introduces a bias, a simple modification gives rise to the pseudo-marginal (PM) algorithm, where such error disappears. Unfortunately, the PM algorithm may have an undesirable property that leads to poor mixing and slow convergence towards the target. Therefore, settings where a noisy algorithm may be preferred are plausible, as long as we are comfortable with the inexactness of the draws.

- Helen Ogden (Warwick) - “Approximating the normalizing constant in sparse graphical models”
**Abstract:** There are many situations in which we want to compute the normalizing constant associated with an unnormalized distribution. If we know that some of the variables are conditionally independent of one another, we may write this distribution as a graphical model, and exploit the structure of the model to reduce the cost of computing the normalizing constant. However, in some situations even these efficient exact methods remain too costly. We introduce a new method for approximating the normalizing constant, controlled by a "threshold" parameter which may be varied to balance the accuracy of the approximation with the cost of computing it. We demonstrate the method in the case of an Ising model, and see that the error in the approximation shrinks quickly as we increase the threshold.

- Murray Pollock (Warwick) - “Algorithmic Design for Big Data: The ScaLE Algorithm”
**Abstract:** This poster will introduce a new methodology for exploring posterior distributions by modifying methodology for exactly (without error) simulating diffusion sample paths. This new method has remarkably good scalability properties as the size of the data set increases (it has sub-linear cost, and potentially no cost), and therefore is a natural candidate for “Big Data” inference. Joint work with Paul Fearnhead, Adam Johansen and Gareth Roberts.

- Nick Tawn (Warwick) - “Improving the Efficiency of the Parallel Tempering Algorithm”
**Abstract:** Bayesian inference typically requires MCMC methods to draw samples from the posterior; however, it is important that the MCMC procedure employed samples ‘correctly’ from the distribution for the sample estimates to be valid. For instance, if the posterior distribution is multimodal, then by running an MCMC procedure for only a finite number of iterations it is possible that the chain becomes trapped and does not explore the entire state space. Well-known algorithms to aid mixing in multimodal settings are the parallel and simulated tempering algorithms. The poster will introduce these and demonstrate their strengths and weaknesses for sampling in such situations, and then describe the way in which these algorithms can be set up to achieve optimal efficiency when sampling. The key feature of these algorithms is the ability to share information from the mixing in the hotter states to aid the mixing of the chain in the colder states. This poster also presents a new approach based on reparameterisation that could potentially enhance the algorithms’ efficiency. Empirical evidence is presented showing that this new algorithm appears to vastly enhance the trade of mixing information between temperature levels when targeting certain posterior distributions.
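
As a rough illustration of the tempering idea described in the last abstract, the following is a minimal textbook-style sketch of parallel tempering, not the poster's algorithm and not its reparameterised variant. The bimodal target, the inverse-temperature ladder `betas`, the step size and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def log_target(x):
    # Assumed toy target: equal mixture of N(-4, 1) and N(4, 1),
    # up to an additive constant (computed with a stable log-sum-exp).
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(n_iter, betas, step=1.0, seed=0):
    """Return the samples of the coldest chain (betas[0], assumed to be 1.0)."""
    rng = random.Random(seed)
    states = [-4.0] * len(betas)  # every chain starts in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Within-temperature move: random-walk Metropolis on pi(x)^beta.
        for k, beta in enumerate(betas):
            prop = states[k] + rng.gauss(0.0, step)
            log_alpha = beta * (log_target(prop) - log_target(states[k]))
            if rng.random() < math.exp(min(0.0, log_alpha)):
                states[k] = prop
        # Swap move: propose exchanging the states of one adjacent pair.
        k = rng.randrange(len(betas) - 1)
        log_alpha = (betas[k] - betas[k + 1]) * (
            log_target(states[k + 1]) - log_target(states[k])
        )
        if rng.random() < math.exp(min(0.0, log_alpha)):
            states[k], states[k + 1] = states[k + 1], states[k]
        cold_samples.append(states[0])
    return cold_samples


samples = parallel_tempering(20000, betas=[1.0, 0.4, 0.15, 0.05])
frac_right = sum(s > 0 for s in samples) / len(samples)
print(f"fraction of cold-chain samples in the right-hand mode: {frac_right:.2f}")
```

In this sketch the hot chains (small `beta`) cross between the modes easily, and the swap moves pass those crossings down to the cold chain, which a plain random-walk sampler with the same step size would be very unlikely to achieve on this target.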