Conference programme

Programme

Wednesday 29th June

13:00 - 13:50: Arrivals

13:50 - 14:00: Welcome

14:00 - 15:00: Plenary talk: Anastasia Ignatieva (Oxford): Large-scale genealogy reconstruction and genealogy-based inference

15:00 - 15:40: Annie Forster (Oxford): Investigating malaria parasite genome regions co-evolving to counteract the sickle cell resistance trait

15:40 - 16:20: Coffee break

16:20 - 17:00: Xinzhu Yu (Manchester): A suggested shared aetiology of dementia - a colocalization study

17:30: Poster reception

Thursday 30th June

09:30 - 10:10: Elizabeth Hayman (Oxford): Recoverability of Ancestral Recombination Graph Topologies

10:10 - 10:50: Coffee break

10:50 - 11:30: Yaoling Yang (Bristol): Using ancient DNA to understand Multiple Sclerosis

11:30 - 12:30: Plenary talk: Dario Spanò (Warwick): Duality and fixation for populations undergoing selection in a time-varying random environment

12:30 - 14:00: Lunch

14:00 - 15:00: Plenary talk: Gil McVean (Oxford): The estimation and use of genetic ancestry in human genomics

15:00 - 15:40: E. Castedo Ellerman (Fresh Pond Research Institute): Microscale estimation of admixture timing with stochastic processes of gametic lineages

15:40 - 16:20: Coffee break

16:20 - 17:00: Lino Ferreira (Oxford): Polygenic scores enable discovery of widespread genetic interactions associated with quantitative traits in the UK Biobank

19:00: Conference dinner

Friday 1st July

09:00 - 09:40: Dominic Zhou (Warwick): Adaptive MCMC Methods for Discrete Structures in Genetics

09:40 - 10:20: Michael Komodromos (Imperial): Variational Bayes for High-Dimensional Survival Analysis

10:20 - 10:50: Coffee break

10:50 - 11:30: Jasmin Rees (UCL): Identifying the nature of adaptation to micronutrients in modern humans

11:30 - 12:30: Plenary talk: Cornelia Pokalyuk (Goethe University Frankfurt): Invasion of cooperative parasites in moderately structured host populations

12:30 - 14:00: Lunch

Abstracts

Anastasia Ignatieva (University of Oxford): Large-scale genealogy reconstruction and genealogy-based inference

The problem of accurate large-scale genealogy reconstruction in the presence of recombination is notoriously difficult, but has seen significant recent progress with the development of several tools for inferring ancestral recombination graphs (ARGs) compatible with a given input dataset on a genome-wide scale. I will give an overview of the challenges and methods of genealogy reconstruction, and talk about recent work on adding a new sequence into an inferred reference genealogy quickly and accurately, allowing for better scalability and the possibility of adding new data without repeating computation.

Annie Forster (University of Oxford): Investigating malaria parasite genome regions co-evolving to counteract the sickle cell resistance trait

[Talk] Malaria parasites impart a strong selective force on the human genome, but whether human resistance mechanisms in turn shape parasite variation has historically been less well understood. However, it was recently discovered that genetic variants in three regions of the malaria parasite genome, termed Plasmodium falciparum sickle-associated loci (Pfsa1-3), are strongly associated with the human sickle haemoglobin (HbS) polymorphism. The underlying mutations have unusual population-genetic features, including between-locus linkage disequilibrium, which suggest they may be undergoing a form of co-evolutionary adaptation within current parasite populations. We used genome sequence data from the MalariaGEN Pf6 resource of community-sampled infections from multiple countries, and from severe infections from The Gambia and Kenya, to investigate the sharing of haplotypes carrying Pfsa+ mutations. In particular, we developed a method to identify and call genomic structural variants at the Pfsa3 locus (on P. falciparum chromosome 11), revealing considerable structural diversity closely linked to regional single nucleotide polymorphisms. Our analysis reveals that the Pfsa+ haplotypes are extensively shared between African P. falciparum populations, but they have not reached fixation in any population, suggesting stable maintenance of the Pfsa polymorphisms over a relatively long evolutionary timescale. A natural hypothesis is that HbS leads to positive selection of these haplotypes, but the full set of evolutionary forces operating on them is not known. Future work will be needed to uncover their biological function and potential medical relevance.

Xinzhu Yu (University of Manchester): A suggested shared aetiology of dementia - a colocalization study

[Talk] Introduction: Many health outcomes are associated with dementia, but the underlying mechanisms remain unclear. Identifying causal genes shared between dementia and its related clinical outcomes can help us understand the shared aetiology and multimorbidity surrounding dementia.

Methods: We used HyPrColoc colocalization analysis to detect possible shared causal genes between dementia or Alzheimer’s disease (AD) and five selected traits (stroke, diabetes, atherosclerosis, cholesterol level, and alcohol consumption) within 601 dementia- or AD-associated genetic regions, using summary statistics from UK Biobank genome-wide association studies. Functional analysis was performed on the candidate causal genes to explore potential biological pathways.

Results: The variant rs150562240 in the LPIN3 gene was identified as a candidate shared causal variant across dementia, AD and atherosclerosis. Evidence for pairwise colocalization between dementia and stroke, dementia (or AD) and atherosclerosis, and dementia (or AD) and diabetes was found in two, six and two genetic regions, respectively. Colocalization signals between diabetes and the other three non-dementia/AD traits were detected in five regions.

Conclusions: The colocalization evidence in our study suggests a shared aetiology between dementia and related diseases such as stroke, atherosclerosis, and diabetes.

Elizabeth Hayman (University of Oxford): Recoverability of Ancestral Recombination Graph Topologies

[Talk] Ancestral recombination graphs (ARGs) extend phylogenetic trees to include recombination, a powerful evolutionary process that shapes the genetic diversity of many species. The topology of this graph gives us important information on the evolution of a species, but algorithms to reconstruct an ARG from species data often rely on sample sequences carrying informative patterns of mutations. In this talk I will present exact results concerning the probability of recovering the true topology of an ARG under the coalescent with recombination and gene conversion. These expressions give us an indication of the uncertainty in reconstructed ARGs, and we see that for parameter values realistic for biological species (in particular SARS-CoV-2), the probability of reconstructing genealogies that are close to the truth is low. This is joint work with Anastasia Ignatieva and Jotun Hein (https://arxiv.org/abs/2110.04848).

Yaoling Yang (University of Bristol): Using ancient DNA to understand Multiple Sclerosis

[Talk] What can ancient DNA tell us about disease origins? Multiple sclerosis (MS) is a lifelong autoimmune disease of the brain and spinal cord that is strongly associated with the major histocompatibility complex (MHC) region on chromosome 6. This association with immune response motivates examining genetic risk over critical historical periods in which population density, lifestyle, and pathogen exposure changed in hunter-gatherer, farming and nomadic pastoralist populations. But what methods are appropriate for analysing the association between risk and ancestry? We applied "chromosome painting" to describe modern (UK Biobank) participants in terms of 7 ancient ancestries with different lifestyles. We then performed a genome-wide association study (GWAS) of ancestry to examine its association with MS. To avoid the problems of constructing a genetic risk score for ancient individuals, we developed an approach that assigns risk to an ancestry. We also developed Linkage Disequilibrium of Ancestry (LDA), which quantifies the ancestry correlations of paired SNPs, and constructed an "LDA score" for each SNP to detect selection. Finally, we introduce HTRX, an extension of Haplotype Trend Regression (HTR) that includes single SNPs and non-contiguous haplotypes as features. HTRX can identify gene-gene interactions, explaining more than twice as much variance in MS as GWAS in some regions. We find that selection can be tied to specific ancient populations and that their lifestyles likely contributed to MS risk today, seemingly via side-effects of resistance to infectious disease.

Prof. Gil McVean (University of Oxford): The estimation and use of genetic ancestry in human genomics

The analysis of genetic ancestry has become an important theme within human genetics, with applications ranging from the study of human history and evolution to the use of polygenic risk scores in clinical decision making. The explosion of genomic data, alongside the development of new tools for understanding what genetic data can (and cannot) tell us about ancestry, gives us ever-finer insights into the structuring of genomic variation. I will give an overview of recent progress in this area, as well as some of the challenges and open problems that remain.

Dr Dario Spanò (University of Warwick): Duality and fixation for populations undergoing selection in a time-varying random environment

I will discuss the properties of a family of Wright-Fisher-type models of population genetics in which the selective fitness of an allele is assumed to be permeable to the influence of environmental factors that vary randomly in time. With a random environment, the relationship between the forward-in-time evolution of the allele frequency and the population's backward-in-time genealogy is described by a "quenched" notion of sampling duality. Under some conditions, ordinary moment duality can be recovered in the population's jump-diffusion scaling limit. Duality between the allele frequency process and the genealogy then allows us to compare various selection regimes. In particular, duality helps us understand whether rare but stronger selective events have a higher impact than mild, constant selective pressure.
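
The closing question can be explored informally with a toy simulation. The sketch below is a generic haploid Wright-Fisher model in which the selection coefficient is redrawn every generation; it does not implement the quenched duality or the jump-diffusion limit discussed in the talk, and the population size, the selection regimes and all function names are hypothetical choices for illustration. The two non-neutral regimes share the same mean selection coefficient (0.02), so comparing their fixation probabilities contrasts rare, strong selection with mild, constant pressure.

```python
import random

# Toy illustration only: names, parameters and regimes are hypothetical.


def wright_fisher_fixes(n, p0, selection, rng, max_generations=100_000):
    """Simulate one haploid Wright-Fisher population of n individuals.

    selection(rng) returns the selection coefficient s > -1 for a single
    generation; drawing it afresh every generation gives a randomly varying
    environment. Returns True if the focal allele reaches fixation.
    """
    count = round(n * p0)
    for _ in range(max_generations):
        if count in (0, n):  # loss or fixation: the process is absorbed
            break
        p = count / n
        s = selection(rng)
        # Selection: the focal allele has relative fitness 1 + s
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: binomial(n, p_sel) sampling of the next generation
        count = sum(rng.random() < p_sel for _ in range(n))
    return count == n


def fixation_probability(n, p0, selection, replicates, seed=0):
    """Monte Carlo estimate of the probability that the focal allele fixes."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixes(n, p0, selection, rng) for _ in range(replicates))
    return fixed / replicates


def neutral(rng):
    return 0.0


def constant(rng):
    # Mild selection acting in every generation
    return 0.02


def episodic(rng):
    # Rare but strong selection (about 10% of generations) with the same mean, 0.02
    return 0.2 if rng.random() < 0.1 else 0.0


if __name__ == "__main__":
    for name, regime in [("neutral", neutral), ("constant", constant), ("episodic", episodic)]:
        print(name, fixation_probability(n=50, p0=0.1, selection=regime, replicates=500))
```

Binomial sampling is written as a sum of Bernoulli draws to stay within the standard library; for large populations a dedicated binomial sampler would be much faster.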

E. Castedo Ellerman (Fresh Pond Research Institute): Microscale estimation of admixture timing with stochastic processes of gametic lineages

[Talk + poster] Admixture of populations has usually been modelled with simple pulse models assuming instantaneous gene flow. Estimating complex admixture timing presents challenges, but is likely to be more appropriate for recent admixture and structured populations. I present some mathematical progress towards improving the estimation of more realistic, complex admixture scenarios. A formal definition of "lineal admixture time" achieves a number of benefits. It admits all realistic complex admixture scenarios and provides a link between physical events of the past and the mathematical machinery for improved estimation. Another benefit of lineal admixture time is that it can be tested against non-genetic and non-mathematical lines of evidence, such as from archaeology, anthropology, geology and history. Another mathematical construct I will present is a stochastic process of gametic lineages with extra structure specific to the transmission of genetic information by a sexually reproducing population. In addition to evaluation via simulations, I also plan to evaluate these methods with empirical data, such as the samples from Barbados in the 1000 Genomes dataset. Such island populations are good test cases for evaluating statistical inference because of their historical records and straightforward admixture scenarios.

Lino Ferreira (University of Oxford): Polygenic scores enable discovery of widespread genetic interactions associated with quantitative traits in the UK Biobank

[Talk] Understanding the connection between genotype and phenotype is a central goal of genetics. In recent years, a huge number of human genetic associations have been discovered and these have been used to predict various traits and diseases, with variable success. Genetic prediction of phenotypes has been dominated by simple additive polygenic scores (PGS), which ignore any interactions between loci (epistasis). Such interactions are expected based on evidence from both model organisms and evolutionary theory, where epistasis has been proposed as a relevant mechanism underlying phenomena such as speciation, genetic robustness or the trajectory of fitness landscapes. However, human examples of interactions are limited, with the vast number of potentially interacting SNPs hampering discovery power. We developed an approach aiming to overcome this lack of power by testing for interactions between a single SNP and the combined effect of groups of other variants, for example those used in the PGS, and applied this to 97 quantitative traits in the UK Biobank. We are motivated by the concept of regulatory networks, where a SNP impacting the expression of a single gene can have downstream impacts on many others. Our approach provides robustness to false positives due to nonlinear additive effects or locally clustered associations, and is initialised by iteratively constructing a PGS accounting for all significant linear signals of association. We find widespread, independent interactions: 260 loci across the genome interacting with the PGS of 59 traits. For example, the variant rs3131894 shows no direct association with waist circumference but changes the predictive power of the PGS for this trait; it is an eQTL for ZFP57, HLA-G and HLA-A and is associated with coeliac disease risk.
As a second example, we have identified an interaction between rs635634, an ABO eQTL, and eQTLs for ALPL and FUT2 affecting alkaline phosphatase levels; this interaction recovers a known biological relationship between ABO blood type, ABO secretor status and levels of this protein in blood. Future work will determine which biological components each interacting locus is modulating. These results may allow us to improve the performance – and thus potentially also the clinical utility – of PGS (by modelling non-linear terms) as well as their cross-population transferability (by identifying genetic determinants of the variability of PGS performance).
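
The core idea of testing one SNP against an aggregate score can be illustrated with an ordinary least-squares model containing a SNP x PGS product term. The sketch below is a generic interaction regression on simulated data, not the authors' pipeline (which additionally conditions on all significant linear signals and guards against nonlinear and locally clustered effects); the function name, effect sizes and sample size are hypothetical.

```python
import numpy as np

# Generic SNP x PGS interaction test; names and simulation settings are hypothetical.


def snp_pgs_interaction(y, snp, pgs):
    """Fit y ~ 1 + snp + pgs + snp:pgs by ordinary least squares.

    Returns the SNP x PGS interaction coefficient and its t-statistic.
    """
    y = np.asarray(y, dtype=float)
    snp = np.asarray(snp, dtype=float)
    pgs = np.asarray(pgs, dtype=float)
    # Design matrix: intercept, main effects and the product (interaction) term
    X = np.column_stack([np.ones_like(y), snp, pgs, snp * pgs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Classical OLS covariance of the coefficient estimates
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3], beta[3] / np.sqrt(cov[3, 3])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    snp = rng.binomial(2, 0.3, size=n)  # genotype dosage 0/1/2
    pgs = rng.normal(size=n)            # standardised polygenic score
    # Simulated trait in which each copy of the allele strengthens the PGS effect
    y = 0.5 * pgs + 0.2 * snp * pgs + rng.normal(size=n)
    print(snp_pgs_interaction(y, snp, pgs))
```

Because the PGS is a single covariate, each SNP requires only one interaction test rather than one per partner SNP, which is the source of the power gain the abstract describes.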

Dominic Zhou (University of Warwick): Adaptive MCMC Methods for Discrete Structures in Genetics

[Talk] The Kingman coalescent model underpins the study of population genetics, but performing inference from its complex posterior distribution is computationally expensive. We apply adaptive MCMC algorithms to coalescent-based targets defined on a combinatorial state space consisting of both discrete tree topologies and continuous branch lengths. A geometric embedding is used to construct a state space suitable for sampling and for adapting to the structure among discrete variables. To tackle the low efficiency of adapting large covariance matrices, the algorithms are equipped with a class of Adapted Increasingly Rarely Markov chain Monte Carlo (AIR MCMC) methods. We then prove that a uniform prior on the mutation rate gives rise to an improper posterior, and that tempering schedules in parallel tempering algorithms can cause impropriety too. Another challenge we identify and prove rigorously is that random walk Metropolis-type algorithms on coalescent models are not geometrically ergodic under the standard choice of prior distributions.

Michael Komodromos (Imperial College London): Variational Bayes for High-Dimensional Survival Analysis

[Talk] In recent years, variational Bayes (VB) has emerged as a viable alternative to MCMC, particularly in situations where scalability is key. Following these developments, we present a VB approximation to sparse high-dimensional Bayesian proportional hazards models. Within our VB approximation we use a mean-field spike-and-slab variational family, thereby offering mechanisms for variable selection, coefficient estimation and uncertainty quantification. We demonstrate its performance in a variety of simulation settings, as well as its applicability to real-world datasets.

Jasmin Rees (University College London): Identifying the nature of adaptation to micronutrients in modern humans

[Talk] Environmental pressures are strong drivers of adaptation and, when they vary across a species' range, they can lead to local adaptation. Such adaptation has occurred numerous times in human evolution, owing to the highly varied environments humans inhabit across the globe. Levels of micronutrients (e.g., iron, zinc and selenium) in the soil, and therefore in the diet, vary widely around the globe and are a prime example of such environmental variation. Micronutrients play an essential role in human health: deficiencies compromise key stages of development and increase the risk of metabolic, cardiovascular and infectious diseases, making dietary micronutrient levels a strong local selective force in humans. Indeed, adaptation to some micronutrients has been identified in humans. However, many genes are associated with micronutrient uptake and metabolism, suggesting that adaptation to micronutrients is likely polygenic in nature. Using simulations under realistic demographic histories and integrating tree-recording methods (Relate) and Fst, we demonstrate the power of a gene set approach to identify polygenic local adaptation. We applied these methods to sets of genes associated with 13 micronutrients, in 913 modern humans from 40 worldwide populations. We identify monogenic and polygenic signatures of positive selection in genes associated with the metabolism or uptake of micronutrients at both the local and global scales. These results demonstrate that micronutrient levels have driven adaptation across human history. I will discuss the evidence of polygenic adaptation at the local scale (e.g., selenium) and the global scale (e.g., zinc) to highlight the diversity of this adaptation.
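
The abstract does not say which Fst estimator is used. As an assumption, the sketch below uses Hudson's estimator, combined across SNPs as a ratio of averages, together with a simple gene-set comparison that asks how often a random SNP set of the same size is at least as differentiated. Function names are hypothetical, and a real analysis would also match random sets for LD, allele frequency and gene length, which this toy ignores.

```python
import random

# Hudson's Fst plus a naive gene-set permutation; function names are hypothetical.


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst over a set of biallelic SNPs, combined as a ratio of averages.

    p1, p2: sample allele frequencies per SNP in populations 1 and 2.
    n1, n2: numbers of sampled chromosomes in each population.
    Assumes at least one SNP is polymorphic (otherwise the denominator is 0).
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite-sample bias
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def gene_set_p_value(set_idx, p1, p2, n1, n2, n_draws=1000, seed=0):
    """Empirical p-value: how often a random SNP set of equal size is as differentiated."""
    rng = random.Random(seed)

    def fst_of(idx):
        return hudson_fst([p1[i] for i in idx], [p2[i] for i in idx], n1, n2)

    observed = fst_of(set_idx)
    k = len(set_idx)
    hits = sum(fst_of(rng.sample(range(len(p1)), k)) >= observed for _ in range(n_draws))
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_draws + 1)
```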

Dr Cornelia Pokalyuk (Goethe University Frankfurt): Invasion of cooperative parasites in moderately structured host populations

Certain defence mechanisms of phages against the immune system of their bacterial host rely on cooperation between phages. Motivated by this example, we analyse invasion probabilities of cooperative parasites in host populations that are moderately structured. More precisely, we assume that hosts are arranged on the vertices of a configuration model and that offspring of parasites move to nearest-neighbour sites to infect new hosts. We consider parasites that generate many offspring at reproduction, but (usually) do so only when several parasites infect a host simultaneously. In this regime we identify and analyse the spatial scale of the population structure at which invasion of parasites turns from an unlikely to a highly probable event. Joint work with Vianney Brouard (Lyon). A preprint is available at arXiv:2201.02249.
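
For readers unfamiliar with the configuration model, the sketch below shows how such a host graph is commonly generated: every vertex receives as many half-edges ("stubs") as its degree, and a uniformly random pairing of the stubs gives the edges. The dispersal step, in which each parasite offspring moves to a uniformly chosen neighbouring host, is an illustrative assumption, and the cooperative infection dynamics of the preprint are not implemented. All function names are hypothetical.

```python
import random

# Configuration-model host graph with a toy dispersal step; names are hypothetical.


def configuration_model(degrees, seed=0):
    """Random multigraph with the given degree sequence.

    Shuffling the list of stubs and pairing consecutive entries gives a
    uniformly random perfect matching. Self-loops and multi-edges can occur,
    as in the standard configuration model.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng = random.Random(seed)
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def neighbours(edges, n):
    """Adjacency lists; a self-loop is listed twice so each list length equals the degree."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def disperse(adj, site, n_offspring, rng):
    """Send each offspring to a uniformly chosen neighbour of `site` (assumed rule)."""
    return [rng.choice(adj[site]) for _ in range(n_offspring)]
```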

Alicia Gill (University of Warwick): Inferring reproduction number from genomic and epidemic data using MCMC methods

[Poster] Genomic data is increasingly being used to understand infectious disease epidemiology. Unfortunately, the phylogenetic trees used to represent genomic variation are not easy to relate to epidemiological processes such as the reproduction number R(t). We use Markov chain Monte Carlo (MCMC) methods to infer the parameters of the epidemic using a dated phylogeny and prevalence data of varying quality. When accurate prevalence data over time are available, we show that the Metropolis-Hastings algorithm performs well in inferring the parameters. When no prevalence data are available, we use a pseudo-marginal MCMC scheme to infer the parameters from the phylogeny alone. The pseudo-marginal framework can also incorporate partial or noisy prevalence data, which results in improved inference compared with observing only a dated phylogeny.
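
As a minimal illustration of the random-walk Metropolis-Hastings algorithm mentioned above, the sketch below infers a single growth rate r from prevalence counts alone. It does not use the phylogeny, and the exponential-growth model, the Poisson observation model, the flat prior and all function names are hypothetical assumptions chosen to keep the example small.

```python
import math
import random

# Toy prevalence-only model; the model, prior and names are hypothetical.


def log_likelihood(r, counts, i0):
    """Poisson log-likelihood of prevalence counts at equally spaced times t = 0, 1, ...
    under the expected prevalence I(t) = i0 * exp(r * t)."""
    ll = 0.0
    for t, k in enumerate(counts):
        mu = i0 * math.exp(r * t)
        ll += k * (math.log(i0) + r * t) - mu - math.lgamma(k + 1)
    return ll


def metropolis_hastings(counts, i0, n_iter=5000, step=0.005, r_init=0.0, seed=0):
    """Random-walk Metropolis-Hastings for the growth rate r under a flat prior."""
    rng = random.Random(seed)
    r, ll = r_init, log_likelihood(r_init, counts, i0)
    samples = []
    for _ in range(n_iter):
        proposal = r + rng.gauss(0.0, step)
        ll_prop = log_likelihood(proposal, counts, i0)
        # The Gaussian proposal is symmetric, so the acceptance ratio is the
        # likelihood ratio; 1 - random() lies in (0, 1], so its log is defined
        if math.log(1.0 - rng.random()) < ll_prop - ll:
            r, ll = proposal, ll_prop
        samples.append(r)
    return samples


if __name__ == "__main__":
    counts = [round(10 * math.exp(0.1 * t)) for t in range(30)]
    draws = metropolis_hastings(counts, i0=10)[1000:]  # discard burn-in
    print(sum(draws) / len(draws))
```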

David Helekal (University of Warwick): Bayesian Inference of Clonal Expansions

[Poster] In microbial population genetics, clonal expansions are phenomena in which one individual gives rise to a subpopulation that persists in the medium to long term. This can occur due to selective phenomena, such as the emergence of a variant that evades existing immunity, due to ecological phenomena, or be driven by geography, such as when a pathogen variant spreads to a previously unaffected continent. Analysis and quantification of this phenomenon are of great interest in microbiology. In this work we address whether clonal expansions can be inferred and identified in real-world phylogenies. We construct a simple heuristic model for clonal expansions and set up an MCMC scheme to perform Bayesian inference with it. We then analyse the tractability and performance of this modelling approach using simulated data, followed by analysis of several real-world pathogen phylogenies.

Savita Karthikeyan (University of Oxford): Leveraging evolutionary genetics methods to understand the effects of rare variation in metabolism and improve polygenic risk score prediction

[Poster] To date, large-scale efforts to understand complex trait variation have primarily focused on common variants. However, it is crucial to address rare variants, as they can have larger effects on complex traits and are less confounded by linkage disequilibrium. From an evolutionary perspective, negative selection is an important force shaping rare variation. Several genome-wide scans for positive selection have been carried out, but negative selection remains understudied. The succinct tree sequence is a transformative data structure that encodes sequence data in terms of their evolutionary relationships and powers the analysis of millions of whole genomes. I’m interested in (i) developing tree sequence methodology to identify patterns of negative selection in the human genome and (ii) applying it to biobank-scale datasets to find causal rare variants for metabolic traits and improve risk prediction.

Aidan Pierce (University College London): Dynamics of Endosymbiosis

[Poster] The origin of the eukaryotic cell is an evolutionary oddity, having occurred only once in evolutionary history. An important step in the formation of the first eukaryote was the endosymbiosis between two prokaryotic cells, the host and the proto-mitochondrion, which formed an endosymbiotic relationship resulting in the formation of an organelle (organellogenesis). This event enabled the first eukaryotes to enlarge their genomes and become the only organisms to explore complex morphological space. However, the genetic and energetic factors responsible for enabling and destabilising endosymbiotic relationships at the origin of eukaryotes remain largely unexplored. This PhD project will use mathematical modelling and population genetics to understand which factors limit, drive, and maintain endosymbiotic events. In doing so, it will provide crucial insight into a major transition in evolution, which is vital to understanding the origin of complex life.

Ian Roberts (University of Warwick): Bayesian Inference under the Structured Coalescent

[Poster] The structured coalescent models the common ancestry of organisms sampled from a spatially structured population. A realisation of the process consists of a phylogenetic tree relating the samples alongside a migration history containing the geographic location of each ancestor. Current inference methods either attempt to simultaneously infer the phylogenetic tree and plausible migration histories, which is computationally expensive, or rely on approximations of the structured coalescent. I will present a Markov Chain Monte Carlo (MCMC) scheme which strikes a balance between these extremes by sampling migration histories (along with governing static parameters) for a fixed phylogenetic tree under the full structured coalescent model.

Silvia Shen (University of Edinburgh): Harnessing genome characterisation to uncover disease mechanisms

[Poster] Whilst genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases, interpreting these results to guide functional follow-up has been difficult. One way to uncover the biological mechanisms underlying complex traits is to integrate different omics data to identify regions of the genome contributing to disease, which genes are more active in certain diseases, and how such activity is controlled. Here, we harness genomic annotation data to inform variant prioritisation in the UK Biobank cohort. We develop non-parametric testing methods that take into account linkage disequilibrium between variants, and apply these methods to the hypertension phenotype as an example of a common disease. We identify genomic annotations that are enriched or depleted in hypertension phenotypes. We then propose incorporating single-cell eQTLs to identify variants contributing to transcription in tissues relevant to blood-pressure traits. Lastly, we examine heritability-based metrics for interpreting the enrichment results.

Kelsey Tetley-Campbell (University of Edinburgh): TL-GWAS: Let's remove unnecessary assumptions from population genetics.

[Poster] The prevailing aim of population genetics is estimating the effect of genetic variants on traits and identifying the biological mechanisms that play a role in this relationship. Current research into the effect of variants on traits relies heavily on parametric assumptions that can result in model misspecification and biased estimates, especially as cohorts become ever larger, which reduces variance but compounds the effect of bias. Here we introduce a workflow, TL-GWAS, that directly addresses these issues and can be used to estimate effect sizes of individual variants as well as epistatic interactions, based on the theory of Targeted Learning. Targeted Learning is a powerful tool that offers mathematical guarantees for estimating the true effect of a variant on a trait without unnecessary assumptions. TL-GWAS uses a library of both parametric and non-parametric algorithms to create a model-independent estimator that accounts for population stratification as well as other confounders. The next step, the Targeted Maximum Likelihood Estimator (TMLE), guarantees that residual bias will be removed, allowing for an accurate estimate of the true effect size. By leveraging the mathematical guarantees of Targeted Learning, our understanding of epistatic interactions and the UK Biobank cohort, we can estimate the effect sizes of variants on complex traits.