Data analysis is a key component of what we do in SBIDER. It involves extracting from the available data the information we need to better understand the drivers of disease. This in turn allows us to make forward predictions under various scenarios, and therefore to provide an evidence base for disease control interventions.
Most of the data analysis we do is based on explicit models of how diseases evolve and spread over time. We use either deterministic or stochastic models, depending on which is most appropriate for the problem at hand. Models can never capture the full complexity of what goes on in nature, nor should they be over-simplistic, so model design is often a balancing act between these two extremes.
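To make the distinction between deterministic and stochastic models concrete, here is a minimal sketch in Python. It assumes a simple discrete-time SIR (susceptible, infected, recovered) model, chosen purely for illustration and not taken from any particular SBIDER study. The function names and the parameters beta (transmission rate) and gamma (recovery rate) are hypothetical.

```python
import math
import random


def deterministic_sir(beta, gamma, s0, i0, r0, days):
    """Discrete-time deterministic SIR: move the expected number of people each day."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        p_inf = 1.0 - math.exp(-beta * i / n)  # daily infection probability per susceptible
        p_rec = 1.0 - math.exp(-gamma)         # daily recovery probability per infected
        new_inf = s * p_inf
        new_rec = i * p_rec
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((s, i, r))
    return trajectory


def stochastic_sir(beta, gamma, s0, i0, r0, days, rng):
    """Chain-binomial SIR: same daily probabilities, but each individual is a random draw."""
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(days):
        p_inf = 1.0 - math.exp(-beta * i / n)
        p_rec = 1.0 - math.exp(-gamma)
        # Binomial draws written as sums of Bernoulli trials (standard library only).
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < p_rec for _ in range(i))
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    det = deterministic_sir(0.5, 0.2, 990, 10, 0, 50)
    sto = stochastic_sir(0.5, 0.2, 990, 10, 0, 50, random.Random(1))
    print("deterministic peak infected:", max(i for _, i, _ in det))
    print("stochastic peak infected:   ", max(i for _, i, _ in sto))
```

The deterministic version always returns the same smooth curve for a given set of parameters, whereas the stochastic version gives a different whole-number trajectory for each random seed. That variability matters most when the number of cases is small.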
Bayesian Parameter Inference
Our models typically include a number of parameters that are initially unknown (or partially unknown) but which can be learnt once the model is fitted to the data. This parameter inference step often requires Bayesian methodology in order to combine the different sources of information about the parameter values, including prior information from previous studies. This Bayesian model fitting is often impossible to do analytically and requires the use of Monte Carlo techniques, in particular Markov chain Monte Carlo, sequential Monte Carlo, data augmentation and combinations of these methods.
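As an illustration of Markov chain Monte Carlo, the sketch below uses a random-walk Metropolis sampler to infer a single rate parameter. It is a toy example under stated assumptions, not a description of any specific SBIDER analysis: the daily case counts are made up, they are assumed to follow a Poisson distribution with unknown rate lam, and prior knowledge is assumed to be summarised by a Gamma prior. All function and parameter names are hypothetical.

```python
import math
import random


def log_posterior(lam, counts, prior_shape, prior_rate):
    """Unnormalised log posterior: Gamma prior on lam times Poisson likelihood."""
    if lam <= 0:
        return -math.inf  # the rate must be positive
    log_prior = (prior_shape - 1) * math.log(lam) - prior_rate * lam
    log_lik = sum(counts) * math.log(lam) - len(counts) * lam
    return log_prior + log_lik


def metropolis(counts, prior_shape, prior_rate, n_iter, step, rng, init=1.0):
    """Random-walk Metropolis sampler for lam with a symmetric Gaussian proposal."""
    lam = init
    lp = log_posterior(lam, counts, prior_shape, prior_rate)
    samples = []
    for _ in range(n_iter):
        proposal = lam + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, counts, prior_shape, prior_rate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            lam, lp = proposal, lp_prop
        samples.append(lam)
    return samples


if __name__ == "__main__":
    counts = [3, 5, 4, 6, 2, 4, 5, 3]  # made-up daily case counts
    samples = metropolis(counts, prior_shape=2.0, prior_rate=1.0,
                         n_iter=20000, step=0.8, rng=random.Random(0))
    kept = samples[2000:]  # discard burn-in
    print("posterior mean of lam:", sum(kept) / len(kept))
```

This toy problem happens to have a known analytic posterior, Gamma(prior_shape + sum of counts, prior_rate + number of days), which makes it easy to check that the sampler behaves sensibly. For realistic epidemic models no such closed form is available, which is why sampling methods of this kind are needed.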
Genomics is a relatively new source of data that has emerged over the past few years, and genomic epidemiology is progressively becoming a complementary approach to traditional epidemiology. Extracting information from the very large datasets produced by the genomic approach remains an important challenge, and one that we are actively working on.