# Biological Understanding

## David Rand's work in this area

##### Dynamical signalling systems, especially NF-kappaB

I am particularly interested in signalling systems that display dynamic behaviour. There are many such systems of great importance and an open question is whether the dynamics is functional and enables the systems to do things that would be difficult or impossible in equilibrium systems.

An exemplar is the NF-κB system, which is one of the most important stress response pathways in the mammalian cell. My work in this area is almost all in collaboration with Mike White (Manchester). Most of our work is concerned with understanding the function of the oscillations in the NF-κB system, whereby the transcription factor NF-κB translocates into and out of the nucleus in a periodic fashion when the system is activated. These oscillations were discovered in Mike's lab, initially in cell cultures but now in primary cells as well.

My basic hypothesis in this area is that NF-κB acts as an information hub, with the oscillations allowing it to carry much more information than would be possible otherwise. I am particularly interested in developing a conceptual framework for understanding the transmission of information in signalling systems. In a recent lecture at the Royal Society on the Great Ideas of Biology, Sir Paul Nurse speculated that if we want to understand how biological organisms work then we need to explain the higher-order phenomena of living organisms by relating the chemical and physical processes to the processing of information and the way that information is used to determine outputs. He quoted Francis Crick in his argument that it is better in biology to follow the flow of information than that of matter or energy. Moreover, he specifically highlighted the importance of dynamics because dynamical networks can transfer much more information. As he said, thinking of biology as an organised system processing information is an embryonic endeavour whose pursuit will crucially need the help of physicists and mathematicians. The problem is that we are missing an adequate conceptual framework for discussing genomic information once it has been passed into the stochastic dynamic interactions that make up cellular processes.

I am preparing a paper with George Minas and Dan Woodcock that provides the mathematical background to this hypothesis. For this theory we needed a new stochastic approximation of biological oscillators that addresses the need for both analytical tools and algorithms for accurate and fast simulation and estimation of stochastic oscillator systems. A recent paper presents our method, called phase-corrected LNA (pcLNA), which overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times while maintaining the speed and analytical tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence, and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate as, but much faster than, leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results.

###### NF-kappaB papers here

My research in this area is very much concerned with understanding the design principles of circadian clocks. What are their roles and how is this reflected in their structure? I would add that since the clock is a dynamical system with a very complex molecular structure and since the role of the clock is clearly multifaceted, this cannot be answered without the use of mathematics. My collaborators are listed here.

The circadian clock adapts organisms to the environmental day-night cycle both behaviourally and physiologically. In animals, not only are complex behaviours such as sleep and mood governed by this oscillator, but also different body functions such as digestion, circulation, and respiration. The basic mechanism of this clock is cell-autonomous in all studied species possessing a circadian clock, i.e. each cell contains a network of genes and proteins that can generate an oscillation with a period of approximately 24 hours. These oscillators can, in turn, be entrained to the relevant 24-hour environmental day-night cycle.

A recent paper, "Phase-locking and multiple oscillating attractors for the coupled mammalian clock and cell cycle", which appeared in PNAS, is about the interaction between the mammalian circadian clock and the cell cycle. In the absence of other signals, the cell cycle and circadian clock robustly phase-lock each other in a 1:1 fashion so that in an expanding cell population the two oscillators oscillate in a synchronised way with a common frequency. However, there are additional clock states: as well as the low-period phase-locked state there are distinct coexisting states with a significantly higher period clock and a different frequency ratio.
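To make the idea of 1:1 phase-locking concrete, here is a minimal sketch using two generic, mutually coupled phase oscillators. This is a textbook toy model and not the model from the PNAS paper: the function name `mean_frequencies`, the sinusoidal coupling, and the chosen periods (24 h and 22 h) are all illustrative assumptions. It only shows the basic effect that sufficiently strong coupling pulls two oscillators with different natural frequencies onto a common frequency; it does not reproduce the coexisting states with other frequency ratios.

```python
import math

def mean_frequencies(omega_clock, omega_cycle, coupling, t_end=5000.0, dt=0.05):
    """Euler-integrate two coupled phase oscillators (hypothetical toy model).

    Returns the mean angular frequency of each oscillator, measured over the
    second half of the run so that the initial transient is discarded.
    """
    n_steps = int(round(t_end / dt))
    half = n_steps // 2
    theta_clock, theta_cycle = 0.0, 1.0  # unwrapped phases in radians
    start_clock, start_cycle = theta_clock, theta_cycle
    for step in range(n_steps):
        if step == half:
            start_clock, start_cycle = theta_clock, theta_cycle
        # Each oscillator is pulled towards the phase of the other.
        pull = coupling * math.sin(theta_cycle - theta_clock)
        theta_clock, theta_cycle = (
            theta_clock + dt * (omega_clock + pull),
            theta_cycle + dt * (omega_cycle - pull),
        )
    elapsed = (n_steps - half) * dt
    return ((theta_clock - start_clock) / elapsed,
            (theta_cycle - start_cycle) / elapsed)

# Assumed natural periods: 24 h for the clock, 22 h for the cell cycle.
omega_clock = 2 * math.pi / 24.0
omega_cycle = 2 * math.pi / 22.0

for coupling in (0.05, 0.005):
    f_clock, f_cycle = mean_frequencies(omega_clock, omega_cycle, coupling)
    print(f"coupling={coupling}: frequency ratio = {f_cycle / f_clock:.4f}")
```

In this toy model the phase difference obeys a one-dimensional equation, and locking occurs when twice the coupling strength exceeds the difference in natural frequencies. With the stronger coupling the two mean frequencies coincide; with the weaker one they drift apart.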

Video showing 1:1 locking between the mammalian circadian clock and cell cycle. This video shows the temporal progression of clock and cell-cycle phases for unstimulated cells in 15% Fetal bovine serum. We show the clock phase on the horizontal axis, illustrated by a bar on the top that shows the relative clock marker level. The vertical axis shows the progression of the cell cycle. The colored bar on the right-hand side illustrates the relative levels of the cell cycle markers (black to red in G1, and grey to yellow in S–G2–M). We also mark the G1–S transition and cell division as horizontal yellow lines. Cells are drawn as blue dots (turning gray once they become confluent) that move from the bottom to the top as they progress through the cell cycle, and from the left to the right according to their clock phase (in this diagram measured as the normalized time between two clock peaks). When they are connected it means that they were the two offspring of a cell that divided during the last circuit. In the background, we show an estimated vector field that indicates the mean direction cells are taking at each point in this phase space. On the sides, we show density estimates for the fraction of cells in each phase. We see that most cells follow a main stream through the middle of the image, crossing the G1–S transition and cell-division lines at a distinct mean clock phase each. Moreover, we observe that some cells skip: They leave the main stream of cells because they progress through the cell cycle phase at a slower speed and rejoin the other cells once they arrive at the main trajectory again. Note that we connect sibling cells by a dashed line when possible. The video was made by Peter Krusche.

This underlies one of my current main interests, which is the role of the circadian clock in the onset and progression of cancer. My work in this area is with Francis Lévi, Robert Dallmann, Bärbel Finkenstädt and Annabelle Ballesta.

The cell-autonomous circadian clockwork (left) is the functional unit of the circadian timing system (CTS), and determines the complex interaction with xenobiotic metabolism. (Left) Simplified core circadian oscillator and one output relevant for xenobiotic metabolism through control of aminolevulinic acid synthase 1 (ALAS1), constitutive androstane receptor (CAR) and cytochrome P450 oxidoreductase (POR). (Right) The CTS involves a central hypothalamic pacemaker – the suprachiasmatic nuclei (SCN) – which coordinates clocks in all the cells in the body through the generation of an array of physiological rhythms such as rest-activity, body temperature and hormonal secretions. The SCN synchronises the peripheral clocks relative to each other and to the environmental time cues provided by the day-night and social cycles (blue box). Following exposure of an organism to a xenobiotic, the substance undergoes the classical Absorption, Distribution, Metabolism and Elimination (ADME) processes. All of these processes, which ultimately determine the toxicity or pharmacologic effect of the xenobiotic, are regulated by peripheral and central clocks present in the gut, heart and blood vessels, liver and pancreas, as well as kidney and colon. Xenobiotics can also reset the molecular clock or CTS through direct interference with the molecular clock or by altering or disrupting physiological pathways.

We are taking a systems approach to chronotherapeutics of cancers, aiming at (i) scaling up and adjusting detailed in vitro chronopharmacology models of the relevant drugs and combinations to the whole organism level, (ii) validating them in mice, and (iii) adjusting them for their implementation in cancer patients. Amongst our aims are the following: (1) To determine the extent of heterogeneity in molecular clocks of cancer cells and circadian coordination in tumour tissues and its impact on (i) cancer progression and (ii) chronotherapy optimization, including combination with clock-targeted pharmacologic or behavioural interventions. (2) To characterise and quantify chronofitness, a novel multidimensional and dynamic CTS biomarker aiming at the prediction of optimal tolerability outcomes on cancer chronotherapy; to maintain/restore/induce chronofitness, through CTS-targeted pharmacologic or behavioural interventions; to determine optimal timing of relevant anticancer drugs in relation to circadian biomarkers within chronofit organisms. (3) To compute and to develop precision chronotherapy delivery algorithms for selected drug combinations in chronofit organisms, through a multiscale systems pharmacology approach integrating heterogeneity in cancer cell clocks, in order to jointly improve anticancer treatment tolerability and efficacy.

##### Development

This is a relatively new area for me but I have always been interested in it. My current work arises from my interactions with Eric Siggia and my research student Elena Camacho Aguilar. There are three main areas:

###### I. Morphogenetic modelling

In groundbreaking work, Corson and Siggia (Corson & Siggia (2012) PNAS 109(15):5568-5575) introduced a new approach to the modelling of a much-studied system, vulval development in the nematode Caenorhabditis elegans. Elena Camacho Aguilar and I have developed this approach further by using the classification theorems of catastrophe theory to select these models from robust universal unfoldings, so reducing the ad hoc nature of the Corson and Siggia model and providing a rigorous basis for the reduction in both parameters and state variables enabled by this approach. We have also developed new statistical methods based on ABC MCMC to fit the model to experimental data.

###### II. From stochastic cells to "deterministic" populations and patterns

The key question I am working on is how a population of highly stochastic cells interacts so as to produce an almost deterministic developmental pattern. An obvious hypothesis is that signals from other cells play a crucial role in taming this stochasticity. However, how this works is far from clear. One key task for this part of the grant is *the development of a proper stochastic approach to the characterisation of stochastic development states and the effectiveness of regulatory and signalling networks in enforcing them*. This will involve the introduction of an information-theoretic approach. As part of this we will also need to develop a more analytical approach to stochastic models, as in Minas & Rand (2017) PLoS Comput Biol 13(7): e1005676.

The consideration of approaches like this requires the *development of a combined dynamical systems and information-theoretic approach* to development in which one understands the dependence of the cell state distributions $P(x \mid s)$ upon not only the signals $s$ but also network structure, noise level, and the dynamic range of mRNA and protein concentrations used by the cell. Moreover, it is crucial that any stochastic theory handles dynamics and bifurcations, because differentiation is characterised by dynamical transitions between different stable or metastable states as the signalling environment of cells changes. All this requires the further development of stochastic approaches as described below (and above) for both modelling and statistical analysis.
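One simple way to quantify how strongly signals constrain cell states is the mutual information between the signal and the state. The sketch below is only an illustration under strong assumptions: both signals and cell states are taken to be discrete, the conditional distribution of state given signal is supplied as a table, and the function name `mutual_information_bits` is hypothetical. It is not the framework under development, which must also handle dynamics and bifurcations.

```python
import numpy as np

def mutual_information_bits(p_signal, p_state_given_signal):
    """Mutual information I(S; X) in bits for discrete signals and states.

    p_signal: probabilities of each signal, shape (n_signals,).
    p_state_given_signal: row i holds the distribution of the cell state
        given signal i, shape (n_signals, n_states).
    """
    p_signal = np.asarray(p_signal, dtype=float)
    cond = np.asarray(p_state_given_signal, dtype=float)
    joint = p_signal[:, None] * cond             # P(s, x)
    p_state = joint.sum(axis=0)                  # marginal P(x)
    independent = p_signal[:, None] * p_state[None, :]
    mask = joint > 0                             # treat 0 * log 0 as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))

# Two equally likely signals, two possible cell states (assumed toy numbers).
signals = [0.5, 0.5]
noiseless = [[1.0, 0.0], [0.0, 1.0]]   # each signal fixes the state
noisy = [[0.9, 0.1], [0.1, 0.9]]       # 10% of cells adopt the other state
print(mutual_information_bits(signals, noiseless))
print(mutual_information_bits(signals, noisy))
```

In the noiseless case the state carries one full bit about the signal; noise in the response reduces this, which gives one possible measure of how effectively a network enforces a developmental outcome.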

##### Some Recent Methodological Developments
###### pcLNA: Long-time analytic approximation of large stochastic oscillators: simulation, analysis and inference

In order to analyse large complex stochastic dynamical models such as those studied in systems biology, there is currently a great need for both analytical tools and algorithms for accurate and fast simulation and estimation. We present a new stochastic approximation of biological oscillators that addresses these needs. Our method, called phase-corrected LNA (pcLNA), overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times while maintaining the speed and analytical tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence, and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate as, but much faster than, leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results.
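Part of the appeal of LNA-type approximations is that they describe the state of the system by multivariate normal distributions, for which quantities such as the Kullback-Leibler divergence have closed forms. The following sketch shows the standard closed-form expression for two multivariate normals. It is a generic formula rather than the pcLNA implementation, and the function name `gaussian_kl` and the example numbers are assumptions made for illustration.

```python
import numpy as np

def gaussian_kl(mean0, cov0, mean1, cov1):
    """KL divergence KL(N(mean0, cov0) || N(mean1, cov1)) in nats."""
    mean0 = np.asarray(mean0, dtype=float)
    mean1 = np.asarray(mean1, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    k = mean0.size
    diff = mean1 - mean0
    # Solve linear systems instead of forming the inverse of cov1 explicitly.
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    quad_term = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + quad_term - k + logdet1 - logdet0)

# Assumed example: a small parameter change shifts the mean and inflates the noise.
reference = ([10.0, 5.0], [[1.0, 0.2], [0.2, 0.5]])
perturbed = ([10.5, 5.0], [[1.2, 0.2], [0.2, 0.5]])
print(gaussian_kl(*reference, *perturbed))
```

Evaluating such a divergence between the distributions obtained for nearby parameter values is one way to see how sensitive the stochastic behaviour is to a parameter, which is the kind of question a Fisher Information Matrix summarises locally.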

Left. Comparison of pcLNA and exact transversal distributions for the Drosophila circadian clock (Ω = 300). Right. Exact empirical distribution of the fluctuations δt in the Drosophila circadian clock with system size Ω = 300. See Minas & Rand (2017) PLoS Comput Biol 13(7): e1005676.

###### Inferring transcriptional logic from multiple dynamic experiments

The availability of more data of dynamic gene expression under multiple experimental conditions provides new information that makes attainable the key goal of identifying not only the transcriptional regulators of a gene but also the underlying logical structure. We propose a novel method for inferring transcriptional regulation using a simple, yet biologically interpretable, model to find the logic by which a set of candidate genes and their associated transcription factors (TFs) regulate the transcriptional process of a gene of interest. Our dynamic model links the mRNA transcription rate of the target gene to the activation states of the TFs, assuming that these interactions are consistent across multiple experiments and over time. A trans-dimensional Markov Chain Monte Carlo (MCMC) algorithm is used to efficiently sample the regulatory logic under different combinations of parents and rank the estimated models by their posterior probabilities. We demonstrate and compare our methodology with other methods using simulation examples and apply it to a study of transcriptional regulation of selected target genes of Arabidopsis thaliana from microarray time series data obtained under multiple biotic stresses. We show that our method is able to detect complex regulatory interactions that are consistent under multiple experimental conditions. Availability: Programs are written in MATLAB and Statistics Toolbox Release 2016b.

Left. Posterior inference of the regulatory logic for the TRS model of the target gene ANAC092. Right. Posterior inference of the TRS model of the target gene SCL3.

See G. Minas, D. J. Jenkins, D. A. Rand, B. Finkenstädt. Inferring transcriptional logic from multiple dynamic experiments. Bioinformatics, btx407, https://doi.org/10.1093/bioinformatics/btx407

###### ReTrOS

Given the development of high-throughput experimental techniques, an increasing number of whole genome transcription profiling time series data sets, with good temporal resolution, are becoming available to researchers. The ReTrOS toolbox (Reconstructing Transcription Open Software) provides MATLAB-based implementations of two related methods, namely ReTrOS–Smooth and ReTrOS–Switch, for reconstructing the temporal transcriptional activity profile of a gene from given mRNA expression time series or protein reporter time series. The methods are based on fitting a differential equation model incorporating the processes of transcription, translation and degradation. The toolbox provides a framework for model fitting along with statistical analyses of the model with a graphical interface and model visualisation. We highlight several applications of the toolbox, including the reconstruction of the temporal cascade of transcriptional activity inferred from mRNA expression data and protein reporter data in the core circadian clock in Arabidopsis thaliana, and how such reconstructed transcription profiles can be used to study the effects of different cell lines and conditions. The ReTrOS toolbox allows users to analyse gene and/or protein expression time series where, with appropriate formulation of prior information about a minimum of kinetic parameters, in particular rates of degradation, users are able to infer timings of changes in transcriptional activity. Data from any organism and obtained from a range of technologies can be used as input due to the flexible and generic nature of the model and implementation. The output from this software provides a useful analysis of time series data and can be incorporated into further modelling approaches or in hypothesis generation.
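To illustrate the kind of differential equation model described above, here is a minimal sketch of a generic transcription, translation and degradation model with a switch-like (piecewise constant) transcription rate. The equations dM/dt = τ(t) − δm·M and dP/dt = α·M − δp·P, the function name `simulate_expression`, and all parameter values are assumptions chosen for illustration; they are not taken from the ReTrOS code, which is MATLAB-based and solves the harder inverse problem of inferring τ(t) from data.

```python
import numpy as np

def simulate_expression(switch_times, rates, delta_m, delta_p, alpha,
                        t_end, dt=0.01):
    """Forward-simulate mRNA and protein for a piecewise constant transcription rate.

    switch_times: sorted times at which the transcription rate changes.
    rates: transcription rate in each interval; needs len(switch_times) + 1 entries.
    delta_m, delta_p: mRNA and protein degradation rates.
    alpha: translation rate per unit of mRNA.
    """
    times = np.arange(0.0, t_end + dt, dt)
    # rates[i] applies once i switch times have passed.
    interval = np.searchsorted(switch_times, times, side="right")
    tau = np.asarray(rates, dtype=float)[interval]
    mrna = np.zeros_like(times)
    protein = np.zeros_like(times)
    for i in range(1, len(times)):  # explicit Euler steps
        mrna[i] = mrna[i - 1] + dt * (tau[i - 1] - delta_m * mrna[i - 1])
        protein[i] = protein[i - 1] + dt * (alpha * mrna[i - 1]
                                            - delta_p * protein[i - 1])
    return times, tau, mrna, protein

# Assumed example: transcription switches on at 6 h and off again at 14 h.
times, tau, mrna, protein = simulate_expression(
    switch_times=[6.0, 14.0], rates=[0.0, 2.0, 0.0],
    delta_m=0.5, delta_p=0.25, alpha=1.0, t_end=24.0)
print(times[np.argmax(mrna)], times[np.argmax(protein)])
```

The sketch shows why prior information on degradation rates matters: the delay and smoothing between a switch in transcription and the observed mRNA or protein response is set by δm and δp, so the timing of switches can only be back-calculated if these rates are reasonably constrained.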

Left. Example output summary from ReTrOS-Switch applied to mRNA microarray data. Data is from a microarray time series for the ELF4 clock gene in wild-type Arabidopsis plants (PRESTA project) [12]. Panel a) shows the raw input data (blue circles, extremes of replicates shown in shaded region and median shown with line), the fitted mRNA expression (black line) and the estimated switch events. Panel b) shows the baseline-removed estimated switch time probability density (black line) and the estimated switch times Gaussian mixture model (blue line, with shaded red/green region indicating μ ± 1.96σ). Panel c) shows the accepted switch time samples, along with the switch events (red/green). The burn-in period of the chain is shown by the dashed black line. Panel d) shows the accepted samples from the precision (σ²) and other parameter chains in blue and a histogram of the accepted model size. Summaries of the parameter chains are shown in black (median and lower/upper quartiles) and red (μ and 1.96σ).

Right. Example output summary from ReTrOS-Switch applied to protein data. Data is luciferase-tagged protein for the LHY clock gene in wild-type Arabidopsis plants (ROBuST Project) [15]. Panel a) shows the raw input data (blue circles, extremes of replicates shown in shaded region and median shown with line), the fitted protein expression (black line) and the estimated switch events (μ ± 1.96σ as vertical red/green lines). Panel b) shows the back-calculated mRNA expression profiles (black line) with the estimated switch times. Panel c) shows the baseline-removed estimated switch time probability density (black line) and the estimated switch times Gaussian mixture model (blue line, with shaded red/green region indicating μ ± 1.96σ). Panel d) shows the accepted switch time samples, along with the switch events (red/green). The burn-in period of the chain is shown by the dashed black line. Panel e) shows the accepted samples from the precision (σ²), δm and δp parameter chains in blue and a histogram of the accepted model size. Summaries of the parameter chains are shown in black (median and lower/upper quartiles) and red (μ and 1.96σ). If mRNA expression data is used, specific output panels are removed accordingly.

See Minas et al. BMC Bioinformatics (2017) 18:316 DOI 10.1186/s12859-017-1695-8

###### PeTTSy: a computational tool for perturbation analysis of complex systems biology models

PeTTSy is a comprehensive tool for analysing large and complex models of regulatory and signalling systems. It allows for simulation and analysis of models under a variety of environmental conditions and for experimental optimisation of complex combined experiments. With its unique set of tools it makes a valuable addition to the current library of sensitivity analysis toolboxes. We believe that this software will be of great use to the wider biological, systems biology and modelling communities.

Mirela Domijan, Paul E Brown, Boris V Shulgin and David A Rand. PeTTSy : a computational tool for perturbation analysis of complex systems biology models. BMC Bioinformatics 2016 17:124 DOI: 10.1186/s12859-016-0972-2