# Conference programme

#### Wednesday 21 July

13:50 - 14:00 Welcome

14:00 - 15:00 Plenary talk: Michelle Kendall (Warwick): Inferring 'good' trees from genetic and epidemiological data

15:00 - 15:40 Ian Letter (Oxford): On hybrid zones and the effect of barriers

15:40 - 16:20 Coffee break

16:20 - 17:00 Jaromir Sant (Warwick): Inference of natural selection from allele frequency time series data using exact simulation techniques

**17:30 - 19:30 Poster reception (Scarman)**

#### Thursday 22 July

10:10 - 10:50 Suzie Brown (Warwick): Kingman limit for non-neutral populations, with applications to sequential Monte Carlo

10:50 - 11:30 Coffee break

11:30 - 12:30 Plenary talk: Leo Speidel (UCL): Studying the deep history of humans using inferred genealogies

12:30 - 14:00 Lunch

14:00 - 15:00 Plenary talk: Richard Durbin (Cambridge): Coalescent models for modern and ancient genome sequence data

15:00 - 15:40 Carey Metheringham (Queen Mary University of London): Natural selection in response to ash dieback

15:40 - 16:20 Coffee break

16:20 - 17:00 Nancy Bird (UCL): Haplotype-based recent ancestry sharing methods reveal fine-scale genetic structure in West and Central Africa

**19:00 Conference dinner (Scarman)**

#### Friday 23 July

09:30 - 10:10 Phil Hanson (Warwick): In order to move forward, we must look back

10:10 - 10:50 Terence Ho Lung Tsui (Oxford): Uncovering genealogy of superprocesses through lookdown constructions

10:50 - 11:30 Coffee break

11:30 - 12:30 Plenary talk: Jere Koskela (Warwick): Gradient-based MCMC for coalescent trees

12:30 - 14:00 Lunch

## Abstracts

###### Michelle Kendall (Warwick): Inferring 'good' trees from genetic and epidemiological data

In phylogenetics, trees are often inferred from genetic and/or epidemiological data to describe evolutionary relationships amongst organisms. In infectious disease epidemiology, trees can be used to represent the transmission of pathogens. In these settings and others, trees are more than just diagrams for convenient visualisation: they can be powerful tools for further analysis. However, robust analyses rely on having confidence in the accuracy of the inferred tree. Methodological choices within an inference process can lead to statistical support for a variety of trees, and MCMC inference methods result in vast collections of "likely" trees. We present metric-based methods for quickly sorting through large collections of trees, helping to identify a smaller and more manageable number of distinct, credible histories supported by the data. The methods, which are available in the R package *treespace*, also help to identify the sources of uncertainty, for example where there are conflicting signals in genetic data due to the evolution being not truly tree-like. Finally, I will describe ongoing and future projects in this area to which I am keen to return as soon as covid work demands less of my time - hopefully very soon!

###### Ian Letter (Oxford): On hybrid zones and the effect of barriers

[Talk] Recent results of Etheridge et al. (2016) show that hybrid zones of populations with selection against heterozygotes evolve, when correctly rescaled, as mean curvature flow. Gooding (2018) extends this result to include asymmetric selection of homozygotes, which adds a constant push to the dynamics of the hybrid zone. Both of these results rest on modelling the density of a particular allele as the solution of a partial differential equation in Euclidean space, proving the result in that deterministic setting, and then showing that the noise caused by genetic drift does not disrupt the conclusion. In this talk, I will sketch the main ingredients of the proofs of these two results and then show how the proof can be adapted to capture other effects on the population, such as the presence of barriers. Barriers in this context are environmental obstacles that prevent the population from invading certain zones. In reality, these could be mountains, oceans, forests or other changes in the surrounding environment. Mathematically, this translates into studying the population in a subset of Euclidean space with reflecting conditions on the boundary. Part of the difficulty lies in constructing a sensible noise that captures the boundary condition on the domain. As a remarkable consequence, we find that barriers can allow the less fit homozygote to survive, even if at the start the fittest homozygote dominates an unbounded region of the domain. This is in sharp contrast to the Euclidean space setting, where any initial condition in which the fittest homozygote dominates an unbounded region of the space leads to it dominating the whole space. This is work under the supervision of Alison Etheridge.

###### Jaromir Sant (Warwick): Inference of natural selection from allele frequency time series data using exact simulation techniques

[Talk] A standard problem in population genetics is to infer evolutionary and biological parameters such as the effective population size, mutation rates, and strength of natural selection from DNA samples extracted from a contemporary population. That all samples come only from the present day has long been known to limit statistical inference; there is potentially more information available if one also has access to ancient DNA, so that inference is based on a time series of historical changes in allele frequencies. In this talk I will introduce a Markov chain Monte Carlo method for Bayesian inference from allele frequency time series data based on an underlying Wright-Fisher diffusion model of evolution. The chief novelty is that we show this method to be exact, in the sense that it is possible to enable mixing by augmenting the state space with the unobserved diffusion trajectory, despite the fact that the transition function of the diffusion is intractable. We develop an efficient method in which trajectory updates and accept/reject probabilities can be calculated without error, and illustrate its performance on simulated data.

This is joint work with Paul Jenkins, Jere Koskela, and Dario Spano (University of Warwick).
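For readers unfamiliar with allele frequency time series, the following is a minimal illustrative sketch of how such data can arise under a discrete haploid Wright-Fisher model with selection. It is not the exact diffusion-based method described in the abstract; the function name, the parameterisation of selection by `s`, and all default values are assumptions made for illustration.

```python
import random


def wright_fisher_path(n_individuals, generations, x0, s, rng):
    """Simulate allele frequencies under a haploid Wright-Fisher model.

    Assumed setup (illustrative only): the focal allele has relative
    fitness 1 + s, the alternative allele has fitness 1, and there is
    no mutation. Requires s > -1.
    """
    freqs = [x0]
    x = x0
    for _ in range(generations):
        # Selection step: reweight the focal allele by its relative fitness.
        p = x * (1 + s) / (x * (1 + s) + (1 - x))
        # Drift step: binomial resampling of the next generation.
        count = sum(1 for _ in range(n_individuals) if rng.random() < p)
        x = count / n_individuals
        freqs.append(x)
    return freqs


if __name__ == "__main__":
    rng = random.Random(42)
    path = wright_fisher_path(n_individuals=200, generations=50, x0=0.1, s=0.05, rng=rng)
    # Frequencies sampled every 10 generations mimic a sparse time series.
    print(path[::10])
```

Sampling the returned path at a handful of time points gives data of the kind that the inference problem above starts from.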

###### Carey Metheringham (Queen Mary University of London): Natural selection in response to ash dieback

[Talk] We investigated the response to ash dieback infection in ancient woodland in southeast England using whole genome sequencing of a multigenerational population of ash exposed to over six years of ash dieback infection.

The shift in alleles and predicted disease susceptibility between adult and juvenile trees provides a rare snapshot of natural selection occurring in the wild, and hope for future natural regeneration of ash woodland.

###### Suzie Brown (Warwick): Kingman limit for non-neutral populations, with applications to sequential Monte Carlo

[Talk] Kingman’s coalescent emerges as the limiting genealogical process for a wide class of neutral population models as the population size tends to infinity. Kingman (1982) described a subclass of these models, and Möhle & Sagitov (2001, 2003) completed this work by proving necessary and sufficient conditions for weak convergence to the Kingman coalescent. All of these results apply only to neutral population models; however, it is also possible to recover Kingman limits from non-neutral models. We considered a class of interacting particle systems that are not generally neutral, and proved that, under conditions reminiscent of those of Möhle & Sagitov, genealogies of finite samples converge weakly to Kingman’s n-coalescent as the whole population size tends to infinity. The class of interacting particle systems studied includes those simulated by sequential Monte Carlo algorithms, so our results also give insight into the genealogies induced by these algorithms, with important implications for performance and tuning. Joint work with Adam Johansen, Paul Jenkins and Jere Koskela.
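As background on the limiting object, the sketch below simulates the inter-coalescence times of Kingman's n-coalescent, in which each pair of the k remaining lineages merges at rate 1. It illustrates only the standard neutral coalescent, not the interacting particle systems studied in the talk; the function names are hypothetical.

```python
import random


def kingman_coalescence_times(n, rng):
    """Return the waiting times spent with n, n-1, ..., 2 lineages.

    While k lineages remain, each of the k*(k-1)/2 pairs merges at rate 1,
    so the holding time is exponential with rate k*(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def mean_tmrca(n, replicates, seed=0):
    """Monte Carlo estimate of the expected time to the most recent common ancestor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(kingman_coalescence_times(n, rng))
    return total / replicates


def expected_tmrca(n):
    """Closed form: the sum of 2 / (k*(k-1)) for k = 2..n equals 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    print(mean_tmrca(10, 20000), expected_tmrca(10))
```

The Monte Carlo estimate should lie close to the closed-form value, which approaches 2 coalescent time units as the sample size grows.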

###### Leo Speidel (UCL): Studying the deep history of humans using inferred genealogies

Genealogies extrapolate relationships of individuals to the past where data is comparatively sparse.

We are now able to infer such genealogies from genetic variation data of thousands of individuals, including ancient genomes.

In this talk, I will discuss how inferred genealogies can inform us about deeper structure and ancient migrations, by decomposing coalescence rates and using summaries of genealogies which we can compare to expectations under the coalescent and in simulations.

###### Richard Durbin (Cambridge): Coalescent models for modern and ancient genome sequence data

###### Nancy Bird (UCL): Haplotype-based recent ancestry sharing methods reveal fine-scale genetic structure in West and Central Africa

[Talk] Haplotype-based methods, which infer the proportion of haplotypes for which individuals share a most recent common ancestor, can have increased power to detect fine-scale genetic structure compared with methods that rely on independent SNPs. Here we analyse unpublished genetic data comprising >500,000 polymorphic loci typed in ~1250 individuals from ~100 ethnolinguistic groups from Cameroon, Ghana, Nigeria, and the Republic of the Congo, detecting a previously unappreciated degree of west African population sub-structure. Recent work has indicated that similar levels of structure may be relevant when correcting for population stratification in genotype-phenotype association studies. We demonstrate how this genetic structure can be examined at various time scales, both by varying the set of individuals used to match haplotypes, and by analysing different lengths of identical-by-descent (IBD) segments shared among people. We also use this haplotype-sharing approach to infer the presence of, and date, events where genetically distinct populations intermixed. By cross-referencing our inference with linguistic and archaeological records, we provide evidence for previously unreported climate-induced migrations occurring more than 3000 years ago, and interactions induced by the Kanem-Bornu (700-1400AD) and Ghana (300-1100AD) empires. Furthermore, we examine our ability to test detailed hypotheses about a population’s history using genetic data. Examples include comparing a model of population size change based on archaeological data with that inferred from genetics, as well as relating a group’s oral history to their inferred admixture history.

###### Terence Ho Lung Tsui (Oxford): Uncovering genealogy of superprocesses through lookdown constructions

[Talk + poster] In this talk / poster presentation, I will demonstrate how one can uncover genealogical structures and ancestral lineages of a spatially structured population by means of lookdown constructions. By enriching a spatially structured population with levels imposed on individual particles that follow carefully selected dynamics, one can introduce a spatial-level Markovian population that enables us to trace the genealogy of the underlying spatial population. In particular, we will introduce an individual-based population model that converges to a Fisher-KPP equation as its scaling limit, and use the lookdown construction to derive the spatial distribution of ancestral lineages.

###### Jere Koskela (Warwick): Gradient-based MCMC for coalescent trees

Posterior distributions arising out of coalescent-based models of genetic diversity are nearly always intractable. Markov chain Monte Carlo methods, typically based on the Metropolis-Hastings algorithm, are gold-standard tools for sampling from these posteriors. However, their computational cost scales notoriously badly with problem complexity, so that their practical use is restricted to data sets which are small by modern standards. There is a growing library of sophisticated MCMC methods which use the gradient of a posterior density to guide the chain during its run. These methods have better theoretical scaling properties than Metropolis-Hastings, but cannot be readily implemented for coalescent models because a posterior defined on discrete tree topologies does not have a natural analogue of a gradient. I will demonstrate how embedding spaces of coalescent trees into a continuous space can enable the use of gradient information in algorithm specification, and also provides a framework for designing adaptive MCMC algorithms in a principled way. Tests on simple examples show that these methods speed up mixing over the posterior, sometimes dramatically.

###### Phil Hanson (Warwick): In order to move forward, we must look back

[Talk + poster] The models in population genetics fall roughly into two categories. First, we have backwards-in-time models like the coalescent, which attempt to describe the structure of our ancestry backwards in time, culminating in a most recent common ancestor (MRCA) whom we have to thank for our existence. Secondly, we have models that look forward in time, like the Wright-Fisher diffusion and the Fleming-Viot (FV) process. These processes try to capture changes in allele frequencies in a population moving forwards in time, due to effects like genetic drift, mutation and natural selection. There is a wonderful mathematical link between these two sets of processes via moment duality. This duality allows us to study one set of processes in order to say interesting things about the other. In this talk I will start by presenting some results on the Kingman coalescent and the ancestral selection graph (ASG) at small times, ultimately proving a diffusion limit for the CLT fluctuations of the ASG. This new knowledge of small-time coalescent behaviour helps analyse small-time behaviour in the FV process with parent-independent mutation, allowing us to prove the existence of an FV CLT. This results in a Gaussian random measure-valued process which, over time, converges to a coloured noise.

###### Trevor Cousins (Cambridge): Inference of ancestral population structure from a single diploid genome sequence

[Poster] Many existing methods for demographic inference, such as the pairwise sequentially Markovian coalescent (PSMC), assume that ancestral populations experience panmixia. If a population is structured, these methods will exhibit large bias in their estimation of historical population size. We investigate the underlying parameters of the PSMC - specifically, the transition matrix of its HMM - to see if it can reveal any information about the presence of ancestral population structure. By analysing the pairwise distribution of time till coalescence in models of population structure and seeing how this compares with panmixia, we seek to reparameterise the transition matrix to be a function of population splits and migrations, as well as a history of population size changes. For inferring the demographic parameters that characterise the HMM, we take a novel approach by fitting a free transition matrix in the EM algorithm, as opposed to traditional methods that constrain their transition probabilities per iteration by the current set of population size change parameters.