# Methodological Developments

Underpinning our medical and biological work is the development of new methodologies in statistics, mathematics and computer science. The examples below illustrate the type of high-impact methodological work we undertake.

### Data Analysis

One of the most fundamental issues is to extract meaning from complex biological data. Work in the Zeeman Institute has pioneered multiple areas including:

**Quantitative image analysis**, which has led to the development of three open-source software packages: QuimP, for correlating cortical cell fluorescence with membrane movements; LineageTracker, a multi-feature cell tracker; and CellTracker, which is dedicated to measuring periodic nuclear-cytoplasmic translocations of transcription factors.

**Bioinformatics**, in particular in the application of Bayesian statistical machine learning techniques to problems in systems biology, functional genomics and proteomics. This work uses genome sequences and other forms of high-throughput data to understand fundamental biological processes, but the wealth of data generated requires the development of sophisticated mathematical and computational tools.

**Biological and epidemiological data** are often confounded by noise and biases in detection; bespoke statistical techniques are frequently required to extract the underlying signals.

### Machine Learning

Machine learning is a family of statistical methods that aim to extract an informative signal from complex (high-dimensional) and often noisy data. We have used these techniques in three main settings.

**Early cancer detection** is essentially a prediction task, on the basis of the available data. We are involved in a project to develop a universal blood test for cancer, by integrating hundreds of different kinds of blood-based cancer biomarkers using machine learning algorithms.

**Electronic noses** are devices which can measure a profile of volatile organic compounds (VOCs), which are produced by the body and vary in response to disease, thus giving a distinctive 'smell' that characterises that disease. The data are highly complex, and we are working to develop better ways to extract the structure, leading to improved ability to correctly diagnose the presence or absence of a given disease.

**Data integration** is a key task in modern medical research, due to the increasing ease with which multiple modalities can be measured. For a number of years we have been developing methods that provide a principled statistical framework for integrating highly heterogeneous data sources.

### Model Development

**Mathematical models** allow us to translate biological knowledge at one scale into predictions of behaviour at another: for example, the behaviour of neurons (or networks of neurons) from the underlying biophysics, or the population-level spread of infection from the behaviour of infected individuals. As such, models form a key aspect of all work within the Zeeman Institute, and highlight our ethos of blending biological knowledge with cutting-edge mathematics.
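As a minimal illustration of moving from individual-level behaviour to a population-level prediction, the sketch below integrates the textbook SIR (susceptible, infected, recovered) equations. It is not a model taken from the Institute's work: the function name `simulate_sir`, the parameter values and the fixed-step Euler scheme are all assumptions chosen for brevity.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with a fixed-step Euler scheme.

    beta  : transmission rate per infected individual per day (assumed value)
    gamma : recovery rate per day (assumed value)
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the equations
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # S -> I during this step
        new_recoveries = gamma * i * dt         # I -> R during this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


# Hypothetical parameters: basic reproduction number beta / gamma = 2.5.
trajectory = simulate_sir(beta=0.5, gamma=0.2, s0=990, i0=10, r0=0, days=100)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
final_size = trajectory[-1][3]
print(f"peak of {peak_infected:.0f} infected at day {peak_time:.1f}")
print(f"final epidemic size: {final_size:.0f}")
```

The two individual-level rates (`beta` and `gamma`) are the only inputs; the epidemic peak and final size are population-level quantities that emerge from them.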

**Simulation models** provide a robust and accurate method of making realistic predictions about future events. The aim is to use all available current data to provide accurate forecasts, often to inform policy. Many of the simulation models developed in the Zeeman Institute are spatial, such as those that describe the spread of infection through human and animal populations.
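To give a flavour of what a stochastic simulation involves, the following sketch runs an ensemble of epidemic realisations using the Gillespie algorithm. It is deliberately simplified and non-spatial, unlike many of the models described above; the function name `gillespie_sir` and all parameter values are hypothetical.

```python
import random


def gillespie_sir(beta, gamma, s0, i0, t_max, seed=0):
    """Simulate one stochastic SIR epidemic with the Gillespie algorithm.

    Returns a list of (time, S, I, R) tuples, one per event.
    """
    rng = random.Random(seed)
    s, i, r = s0, i0, 0
    n = s0 + i0
    t = 0.0
    history = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose which event occurs, in proportion to its rate.
        if rng.random() * total_rate < infection_rate:
            s -= 1
            i += 1
        else:
            i -= 1
            r += 1
        history.append((t, s, i, r))
    return history


# One realisation, then an ensemble of 100 runs with different seeds.
history = gillespie_sir(beta=0.5, gamma=0.2, s0=990, i0=10, t_max=200.0)
final_sizes = [
    gillespie_sir(beta=0.5, gamma=0.2, s0=990, i0=10, t_max=200.0, seed=k)[-1][3]
    for k in range(100)
]
print(f"mean final size over {len(final_sizes)} runs: "
      f"{sum(final_sizes) / len(final_sizes):.1f}")
```

Running many realisations rather than one is what turns a simulation into a forecast with uncertainty: the spread of `final_sizes` indicates how variable the outcome could be under the assumed parameters.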

**Bespoke approximations** allow us to understand the dynamics of biological processes using simple models that capture the fundamental processes, but still allow some level of analytical understanding. As an example, researchers in the Zeeman Institute are known for their work on network approximations, which capture the effects of network structure through simple equations.

### Matching Models and Data

It is often said that "models are only as good as the data that support them", and this view is echoed in all of the work at the Zeeman Institute. Much of our research involves the matching of complex models to complex data. In particular, we often use sophisticated Bayesian (MCMC) techniques to infer parameter values for our models. We are also interested in how parameter sensitivity translates into qualitative changes in model behaviour, especially in relation to policy predictions.
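The idea behind MCMC parameter inference can be sketched with a random-walk Metropolis sampler. The example below is a toy under stated assumptions, far simpler than the techniques used in practice: the data are synthetic infectious-period durations, the model assumes they are exponentially distributed with unknown recovery rate `gamma`, and the prior is flat on positive values. The names `log_posterior` and `metropolis` are hypothetical.

```python
import math
import random


def log_posterior(gamma, durations):
    """Log posterior for an exponential rate with a flat prior on gamma > 0."""
    if gamma <= 0:
        return -math.inf
    return len(durations) * math.log(gamma) - gamma * sum(durations)


def metropolis(durations, n_samples=20000, step=0.05, start=0.5, seed=1):
    """Random-walk Metropolis sampler for the recovery rate gamma."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, durations)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, durations)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Synthetic data (an assumption for illustration): 50 infectious periods
# drawn with a true recovery rate of 0.2 per day.
data_rng = random.Random(0)
durations = [data_rng.expovariate(0.2) for _ in range(50)]

samples = metropolis(durations)
kept = samples[2000:]  # discard burn-in
estimate = sum(kept) / len(kept)

# For this toy model the posterior is a Gamma distribution, so the
# sampler can be checked against the exact posterior mean.
analytic_mean = (len(durations) + 1) / sum(durations)
print(f"MCMC posterior mean: {estimate:.3f}, analytic: {analytic_mean:.3f}")
```

In realistic applications the likelihood has no closed form and the analytic check is unavailable, which is precisely why sampling-based methods are needed; the spread of `kept` also gives a direct view of parameter uncertainty that can be carried through to model predictions.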