# Illustrative research internship projects

##### Dr Richard Everitt

###### Approximate Bayesian computation for individual based models

###### Estimating bowler ability in cricket

###### Active subspaces using sequential Monte Carlo

##### Dr Jere Koskela

###### Simulation of jump diffusions in genetics

The evolution of allele frequencies in randomly reproducing populations subject to occasional large-scale events, such as population bottlenecks or selective sweeps, is typically modelled by so-called Lambda-Fleming-Viot jump diffusions. An important step in using a model for practical inference is being able to simulate from it. Trajectories of jump diffusions usually cannot be simulated exactly. A range of approximation schemes is available, but they are generally applicable only to jump diffusions taking values on the whole real line. The Lambda-Fleming-Viot process describes the frequency of an allele in a population, and hence is constrained to take values in the unit interval [0,1]. This project aims to identify a range of jump diffusion approximation schemes, implement them, and compare their performance on the Lambda-Fleming-Viot model empirically.
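To make the boundary issue concrete, here is a minimal sketch of one naive approximation: an Euler-type step for the diffusive part, clipped back into [0,1], combined with jumps of a single fixed size. It is not a scheme prescribed by the project. The function name `simulate_lfv_clipped_euler`, the parameters `jump_rate` and `jump_frac`, and the restriction to one jump size are all assumptions made for illustration; a general Lambda-Fleming-Viot model allows a whole distribution of jump sizes.

```python
import math
import random


def simulate_lfv_clipped_euler(x0, t_end, dt, jump_rate, jump_frac, rng):
    """Approximate an allele-frequency path on [0, 1] (illustrative only).

    Diffusive part: Euler step with variance x(1 - x) * dt, then clipping.
    Jump part (assumed form): at rate `jump_rate`, a fraction `jump_frac`
    of the population is replaced by offspring of one parent, who carries
    the allele with probability equal to the current frequency x.
    """
    x = x0
    path = [x]
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        # Euler step for the Wright-Fisher-type diffusion term.
        x += math.sqrt(max(x * (1.0 - x), 0.0) * dt) * rng.gauss(0.0, 1.0)
        # The step can overshoot the boundary, so clip back into [0, 1].
        x = min(1.0, max(0.0, x))
        # A jump occurs in this step with probability roughly jump_rate * dt.
        if rng.random() < jump_rate * dt:
            if rng.random() < x:
                x = x + jump_frac * (1.0 - x)  # parent carries the allele
            else:
                x = x * (1.0 - jump_frac)      # parent does not carry it
        path.append(x)
    return path


path = simulate_lfv_clipped_euler(0.5, 1.0, 0.001, 2.0, 0.3, random.Random(1))
```

Clipping is the crudest way to respect the unit interval and distorts the behaviour near 0 and 1, which is one reason a comparison of schemes of the kind the project describes is needed.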

###### Time series inference in genetics

DNA sequence data is now available from multiple generations, and there are standard models (most notably, the Wright-Fisher model) which describe the changes in allele frequencies in a population across time. However, the advent of multi-generation data is recent enough that many practical questions remain unanswered. This project is about finding an optimal balance between how many generations to sequence, and how many individuals to sequence in each generation, when the goal is to minimise the mean squared error of estimators of standard genetic quantities of interest, such as the rate at which mutations arise in the population.
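The trade-off can be explored by simulation. The sketch below generates multi-generation data from a two-allele Wright-Fisher model and lists the ways a fixed sequencing budget can be split between generations and individuals per generation. The symmetric mutation mechanism, the function names, and the parameter values are assumptions made for illustration, and no estimator is included.

```python
import random


def binomial(n, p, rng):
    """Draw a Binomial(n, p) variate as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def wright_fisher_samples(pop_size, mut_rate, x0, n_generations, n_sampled, rng):
    """Simulate allele frequencies and return the observed sample frequencies.

    Each generation, alleles mutate symmetrically at rate `mut_rate`, the
    next generation is drawn binomially from the population, and
    `n_sampled` individuals are sequenced from it.
    """
    freq = x0
    observed = []
    for _ in range(n_generations):
        # Expected frequency after symmetric two-way mutation.
        p = freq * (1.0 - mut_rate) + (1.0 - freq) * mut_rate
        # Random reproduction: binomial resampling of the whole population.
        freq = binomial(pop_size, p, rng) / pop_size
        # Sequencing: a binomial sample from the current generation.
        observed.append(binomial(n_sampled, freq, rng) / n_sampled)
    return observed


def designs(budget):
    """All (generations, individuals per generation) pairs using the full budget."""
    return [(g, budget // g) for g in range(1, budget + 1) if budget % g == 0]


rng = random.Random(0)
for n_gen, n_ind in designs(12):
    data = wright_fisher_samples(200, 0.01, 0.5, n_gen, n_ind, rng)
```

Repeating such simulations many times for each design, and applying a chosen estimator to each data set, would give a Monte Carlo estimate of the mean squared error for that design.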

##### Prof. Ioannis Kosmidis

###### High-dimensional logistic regression

*p* is fixed relative to the number of observations *n*.

*p/n → κ ∈ (0, 1)*. An increasing amount of research is now focusing on developing methodology that can recover the performance of estimation and inference from logistic regression. This project will initially compare, through simulation experiments, some recent proposals for improved estimation and inference in logistic regression under assumptions that match the expectations that modern practice sets. Then, we will attempt to derive new computationally-attractive estimators and test statistics that work well in cases like *p/n → κ ∈ (0, 1)*.

###### Item-response theory models and politics: How liberal are the members of the US House?

##### Dr Simon Spencer

###### Simulations for the whole family (of quasi-stationary distributions)

If a Markov process has an absorbing state (reached with probability one in finite time) then the stationary distribution is boring – all the mass falls on the absorbing state. However, if we condition on the process not having reached the absorbing state yet, then a so-called quasi-stationary distribution may exist. In fact, there can be infinitely many such quasi-stationary distributions for the same process. The birth-death process is a relatively simple model that has an infinite family of quasi-stationary distributions. One is straightforward: a so-called “low energy” distribution with finite mean. All the others are more exotic “high energy” distributions with infinite mean. In this project we will look to find ways of simulating from the quasi-stationary distributions of the birth-death process, and from the “high energy” distributions in particular. Then, we will look to apply these simulation techniques to more complex models in which the family of quasi-stationary distributions is currently unknown. This project will involve programming in the statistical programming language R.
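As a point of reference, the sketch below shows one standard way of approximating a quasi-stationary distribution: a Fleming-Viot-type particle system, in which any particle that hits the absorbing state is immediately restarted at the position of another particle. This is not necessarily the approach the project would take. It assumes a linear birth-death process with per-individual birth rate `birth` and death rate `death` (with `death` greater than `birth`) absorbed at 0, and it is expected to target, at best, the “low energy” distribution rather than the “high energy” ones. Python is used here purely for illustration, although the project itself names R.

```python
import random


def fleming_viot_birth_death(birth, death, n_particles, t_end, rng):
    """Run a Fleming-Viot-type particle system up to time `t_end`.

    Each particle is a linear birth-death process: in state n it moves to
    n + 1 at rate birth * n and to n - 1 at rate death * n. A particle
    that reaches 0 is restarted at the state of another particle chosen
    uniformly at random. Requires n_particles >= 2.
    """
    states = [1] * n_particles
    t = 0.0
    while True:
        # Total event rate across all particles.
        total_rate = (birth + death) * sum(states)
        t += rng.expovariate(total_rate)
        if t > t_end:
            return states
        # The particle that moves is chosen proportionally to its state.
        i = rng.choices(range(n_particles), weights=states)[0]
        if rng.random() < birth / (birth + death):
            states[i] += 1
        else:
            states[i] -= 1
            if states[i] == 0:
                # Absorbed: copy the state of a different particle.
                j = rng.randrange(n_particles - 1)
                if j >= i:
                    j += 1
                states[i] = states[j]


states = fleming_viot_birth_death(1.0, 2.0, 50, 3.0, random.Random(2))
```

The empirical distribution of `states` at a large time horizon, with many particles, serves as the approximation. The event times are tracked explicitly so that the snapshot is taken at a fixed time rather than after a fixed number of jumps.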

Key reference: Adam Griffin, Paul A. Jenkins, Gareth O. Roberts and Simon E.F. Spencer (2017). Simulation from quasi-stationary distributions on reducible state spaces. *Advances in Applied Probability*, **49**(3), 960-980.