# OxWaSP Mini-Symposia 2018-19

#### Term 2

#### Friday January 25th

14:00 - 15:30 Petros Dellaportas (UCL)

**Identifying and predicting jumps in financial time series (in room MB0.07)**

Abstract: We deal with the problem of identifying jumps in multiple financial time series using the stochastic volatility model combined with a jump process. We develop efficient MCMC algorithms to perform Bayesian inference for the parameters and the latent states of the proposed models. In the univariate case we use a homogeneous compound Poisson process to model the jump component. In the multivariate case we adopt an inhomogeneous Poisson process whose intensity is itself a stochastic process varying across time, economic sectors and markets. A Gaussian process is used as the prior distribution for the intensity of the Poisson process. This model is known as a doubly stochastic Poisson process or Gaussian Cox process. The efficiency of the proposed algorithms is compared with that of existing MCMC algorithms. Our methodology is tested through simulation-based experiments and applied to the daily returns of 600 stocks in the Euro STOXX index over a period of 10 years.
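To make the univariate setting concrete, the following is a minimal simulation sketch, not the speaker's model or inference code. It assumes a standard AR(1) log-variance process for the stochastic volatility part and Gaussian jump sizes; the function names and all parameter values (`mu`, `phi`, `sigma_eta`, `jump_rate`, `jump_scale`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_sv_with_jumps(n_steps, mu=-9.0, phi=0.97, sigma_eta=0.15,
                           jump_rate=0.02, jump_scale=0.03, seed=0):
    """Simulate returns from a stochastic volatility model plus compound Poisson jumps.

    All parameter values are illustrative assumptions, not estimates.
    Returns (returns, jump_counts), one entry per time step.
    """
    rng = random.Random(seed)
    # Start the log-variance from its stationary distribution.
    log_var = mu + rng.gauss(0.0, sigma_eta / math.sqrt(1.0 - phi ** 2))
    returns, jump_counts = [], []
    for _ in range(n_steps):
        # Homogeneous compound Poisson component: a Poisson number of jumps
        # per step, each with an independent Gaussian size.
        n_jumps = sample_poisson(jump_rate, rng)
        jump = sum(rng.gauss(0.0, jump_scale) for _ in range(n_jumps))
        # Diffusive component with time-varying volatility exp(log_var / 2).
        diffusion = math.exp(log_var / 2.0) * rng.gauss(0.0, 1.0)
        returns.append(diffusion + jump)
        jump_counts.append(n_jumps)
        # AR(1) update of the latent log-variance.
        log_var = mu + phi * (log_var - mu) + rng.gauss(0.0, sigma_eta)
    return returns, jump_counts


if __name__ == "__main__":
    simulated_returns, counts = simulate_sv_with_jumps(2500)
    print("steps with at least one jump:", sum(c > 0 for c in counts))
```

In a jump identification exercise, `jump_counts` plays the role of the latent state that the inference procedure tries to recover from `returns` alone.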

#### Term 1

#### Friday November 30th

14:00 - 15:00 Mingli Chen (Warwick)

**Modelling Networks via Sparse Beta Model (in room MB0.07)**

We propose the Sparse Beta Model, a novel network model that interpolates between the celebrated Erdős-Rényi model and the Beta Model, and show that it is a tractable model for modelling the sparseness of a network. We apply the proposed model and estimation procedure to the well-known microfinance data in Banerjee et al. (Science, 2013) and find interesting results.

15:30 - 16:30 Mihai Cucuringu (Oxford University)

**Spectral methods for certain inverse problems on graphs (in room MB0.07)**

We study problems that share an important common feature: they can all be solved by exploiting the spectrum of their corresponding graph Laplacian. We consider the classic problem of establishing a statistical ranking of a set of items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years. Its least squares solution can be approximated by either a spectral or a semidefinite programming relaxation, followed by a rounding procedure. We also present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The proposed algorithm works in nearly-linear time, provides guarantees for the quality of the clusters for 2-way partitioning, and consistently outperforms existing spectral approaches both in speed and quality. Building on this work, we recently proposed an algorithm for clustering signed networks (where the edge weights between the nodes of the graph may take either positive or negative values) that compares favourably to state-of-the-art methods. Time permitting, we discuss possible future extensions of the group synchronization framework, applications to extracting leaders and laggers in multivariate time series data, and the phase unwrapping problem.
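The ranking-as-synchronization idea can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the speaker's implementation: the function name `sync_rank`, the angle scaling by `pi / (n - 1)`, the degree normalisation, and the rounding step (choosing the cyclic shift with the fewest upsets) are choices made here for the sketch. It assumes NumPy is available and that every item takes part in at least one comparison.

```python
import numpy as np


def sync_rank(C):
    """Rank items from a skew-symmetric matrix of pairwise rank offsets.

    C[i, j] ~ r_i - r_j is the observed offset between items i and j;
    an entry of 0 means "no comparison". Returns integer scores where
    0 is the lowest-ranked item.
    """
    C = np.asarray(C)
    n = C.shape[0]
    observed = C != 0
    degree = observed.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every item needs at least one comparison")

    # Map rank offsets to planar rotations: offsets become angles in [-pi, pi].
    theta = np.pi * C / (n - 1)
    H = np.where(observed, np.exp(1j * theta), 0)

    # Spectral relaxation: top eigenvector of the degree-normalised
    # Hermitian matrix D^{-1/2} H D^{-1/2}.
    H_norm = H / np.sqrt(np.outer(degree, degree))
    _, eigvecs = np.linalg.eigh(H_norm)
    angles = np.angle(eigvecs[:, -1]) % (2 * np.pi)

    # The angles are only determined up to a global rotation, so sorting
    # them gives a circular order. Rounding step: pick the cyclic shift
    # that disagrees with the fewest observed comparisons.
    order = np.argsort(angles)
    best_scores, best_upsets = None, np.inf
    for shift in range(n):
        shifted = np.roll(order, -shift)
        scores = np.empty(n, dtype=int)
        scores[shifted] = np.arange(n)
        implied = np.sign(scores[:, None] - scores[None, :])
        upsets = np.sum(observed & (implied != np.sign(C)))
        if upsets < best_upsets:
            best_scores, best_upsets = scores, upsets
    return best_scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_rank = rng.permutation(8)
    comparisons = true_rank[:, None] - true_rank[None, :]
    print(sync_rank(comparisons), true_rank)
```

With complete, noise-free comparisons the normalised matrix has a single dominant eigenvector whose phases encode the true ranks, so the sketch recovers the ranking exactly; with noisy or missing comparisons it returns the best cyclic shift of the spectral ordering.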

#### Friday November 16th

14:00 Chris Wymant (Big Data Institute, University of Oxford)

**Analysis of pathogen genetic sequence data to help prevent the spread of infectious diseases (in room MB0.07)**

Infectious diseases kill millions of people every year. Epidemiological studies of these diseases try to identify patterns that are associated with disease spread, so that we can more effectively intervene and improve public health. Molecular epidemiology uses molecular data for this aim; in particular, the genetic sequence of the pathogen from infected individuals. Sequences accumulate mutations over time, and so such data allow us to make inferences about the pathogen's evolutionary history and perhaps about factors affecting it, i.e. the story of the epidemic from the pathogen's point of view. After a general introduction to this field of work I will explain our molecular epidemiological method 'phyloscanner', some applications of it to large HIV datasets, what we learned, the statistical models involved, and ways in which we would like these models to be better.

15:30 Julia Brettschneider (Department of Statistics, University of Warwick)

**Spatial statistics in scientific research involving image data (in room MB0.07)**

Progress in imaging technologies has opened up new avenues for scientific research. Statistical methodology needs to be adapted and extended to optimally exploit the available data and address questions formulated by scientists. Interdisciplinary dialogue about new types of data can also lead to better models of the measurement process, to improved preprocessing and quality assessment of the data, and to novel methods of knowledge extraction. In X-ray CT, for example, planar point processes are a natural model for dead pixels. Concepts such as complete spatial randomness can be explored with functions capturing between-point interactions, which can be used to make statements and inferences about the state of the detectors. In fluorescent confocal microscopy, a central interest is the imaging of protein concentration. Distances between point clouds can be captured, for example, using the earth mover's distance. In cell biology, this can be used to model the relative abundance of two protein species or to describe and analyse the temporal evolution of a single protein.

#### Friday November 2nd

14:00 - 15:00 Sarah Penington (Bath University)

**The spreading speed of solutions of the non-local Fisher-KPP equation (in room MB0.07)**

The non-local Fisher-KPP equation is a partial differential equation (PDE) which is used to model non-local interaction and competition in a population, and can be seen as a generalisation of the classical Fisher-KPP equation. In the 80s, Bramson used a Feynman-Kac formula to prove fine asymptotics for the spreading speed of solutions of the Fisher-KPP equation using probabilistic techniques. Bramson's proofs also rely on a maximum principle which does not hold for the non-local form of the PDE. However, it turns out that we can adapt his techniques to prove results on the non-local Fisher-KPP equation using only probabilistic arguments - in particular, probability estimates for Brownian motion.
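For orientation, the two equations can be written as follows. The normalisation (diffusion constant 1/2, unit growth rate in the classical case) and the symbols $\mu$ and $\phi$ are assumptions chosen here for illustration; the abstract does not fix them.

$$
\frac{\partial u}{\partial t} = \frac{1}{2}\,\Delta u + u\,(1 - u) \qquad \text{(classical Fisher-KPP)}
$$

$$
\frac{\partial u}{\partial t} = \frac{1}{2}\,\Delta u + \mu\,u\,\bigl(1 - \phi * u\bigr), \qquad (\phi * u)(t, x) = \int \phi(y)\,u(t, x - y)\,\mathrm{d}y \qquad \text{(non-local Fisher-KPP)}
$$

Here $\phi \ge 0$ is an interaction kernel with $\int \phi = 1$, so competition at a point depends on the population in a neighbourhood rather than at the point itself. Under the normalisation above, in one dimension and for suitable initial conditions, Bramson's result for the classical equation places the front at

$$
m(t) = \sqrt{2}\,t - \frac{3}{2\sqrt{2}}\,\log t + O(1) \quad \text{as } t \to \infty .
$$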

15:30 - 16:30 Sigurd Assing (University of Warwick)

**Enlargement of Filtrations (in room MB0.07)**

In a recent paper I was able to apply Enlargement of Filtrations to turn a parabolic Stochastic Partial Differential Equation (SPDE) into an elliptic one, and this was a little bit surprising as this technique usually does not belong to the tool-box used to treat SPDEs. So, I started to think that Enlargement of Filtrations might be of interest to people working in all sorts of fields, and decided to talk about it to a Stats-Community. This talk won't be a lecture on Enlargement of Filtrations. I'll rather discuss examples and try to make connections to related fields. I'll also touch on questions like "What is the essence of Ito's formula?" and "Why is Malliavin calculus useful?".

#### Friday October 19th

14:00 Martyn Plummer (Warwick University)

**Bayesian Analysis of Generalized Linear Mixed Models with MCMC (in room MB0.07)**

BUGS is a language for describing hierarchical Bayesian models which syntactically resembles R. BUGS allows large complex models to be built from smaller components. JAGS is a BUGS interpreter which enables Bayesian inference using Markov Chain Monte Carlo (MCMC).

The efficiency of MCMC depends heavily on the sampling methods used.

JAGS is "black box" software that makes decisions about sampling methods without user input. Therefore a key function of the JAGS interpreter is to identify design motifs in a large complex Bayesian model that have well-characterized MCMC solutions and apply the appropriate sampling methods.

Generalized linear models (GLMs) form a recurring design motif in many hierarchical Bayesian models. Several data augmentation schemes have been proposed that reduce a GLM to a linear model and allow joint sampling of the coefficients. I will review the schemes that have been implemented in JAGS and highlight the important relationship between graphical models and sparse matrix algebra.
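As one concrete instance of such a data augmentation scheme (chosen here for illustration; the abstract does not say which schemes the talk covers), the Albert and Chib augmentation for a probit model introduces a latent Gaussian variable $z_i$ for each binary response $y_i$. The symbols $X$, $\beta$, $b_0$ and $B_0$ are notation assumed for this sketch.

$$
z_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1), \qquad y_i = \mathbf{1}\{z_i > 0\}.
$$

Conditional on $z$, the coefficients follow an ordinary linear model, so under a Gaussian prior $\beta \sim N(b_0, B_0)$ they can be sampled jointly:

$$
\beta \mid z \sim N\!\bigl(B\,(B_0^{-1} b_0 + X^\top z),\; B\bigr), \qquad B = \bigl(B_0^{-1} + X^\top X\bigr)^{-1},
$$

while each $z_i \mid \beta, y_i$ is $N(x_i^\top \beta, 1)$ truncated to $(0, \infty)$ if $y_i = 1$ and to $(-\infty, 0]$ if $y_i = 0$. The matrix $B_0^{-1} + X^\top X$ is typically sparse in hierarchical models, which is where sparse matrix algebra becomes relevant.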

15:30 Ricardo Silva (University College London)

**Neural Networks and Graphical Models for Constructing and Fitting Cumulative Distribution Functions (in room MB0.07)**

There are several ways of building a likelihood function. Mixture models, hierarchies, re-normalized energy functions and copulas are examples of popular approaches. In this talk, we will explore how multivariate constructions and parameter fitting can be accomplished by parameterizing cumulative distribution functions and handling the computational implications. In our first method, we describe how deep neural networks can encode a very general family of CDFs, and how the standard tools of automatic differentiation can be easily repurposed for maximum likelihood under this parameterization. In our second discussion, we will show factorized representations of CDFs and how the machinery of graphical models immediately transfers into this domain by reinterpreting message-passing.