
Machine Learning and Computational Analysis

Primary Supervisor: Dr Peter Winn, School of Biosciences

Secondary supervisor: Professor Chris Thomas

PhD project title: Machine Learning and Computational Analysis

University of Registration: University of Birmingham

Project outline:

The application of sophisticated statistical techniques, including deep learning, has transformed our ability to model protein structure and function from protein sequence. Moreover, such techniques promise to provide us with tools to better understand how proteins function and to re-engineer them. This project aims to develop and apply such tools to multi-domain proteins and protein complexes. We are interested in moving towards the modelling of microbial proteomes, including the interactions between proteins. A stepping stone towards such a goal is provided by multi-domain protein systems such as the polyketide synthases and non-ribosomal peptide synthases. The project will thus involve the development and application of molecular modelling tools and statistical and machine learning techniques to better understand microbial proteomes and/or polyketide synthases (and similar multi-domain proteins).

Polyketide synthases (PKSs) and non-ribosomal peptide synthases (NRPSs) are complex biological machines responsible for the production of many of our most important pharmaceuticals. Amongst these are the antibiotics penicillin and mupirocin, and the cholesterol-reducing statins. However, such a short list of examples underplays their importance: polyketides, the products of PKSs, alone have a multibillion-dollar global market. Of particular interest to us is the polyketide antibiotic mupirocin, which is produced by a PKS and is important for the treatment of MRSA (methicillin-resistant Staphylococcus aureus). We have a longstanding project to understand the protein machinery responsible for the production of mupirocin, with the aim of re-engineering the pathway to produce novel variants with more potent antibiotic properties. In particular, some MRSA strains are becoming resistant to mupirocin, and we wish to develop strains that overcome that resistance.

NRPSs and PKSs comprise multiple modules, each module being rather like a workstation in a conventional factory assembly line. For both NRPSs and PKSs, the first module of the manufacturing process accepts a basic building block and passes it to the next module, which extends the initial building block and modifies the resulting product before passing the polyketide/peptide to the next module for further extension and elaboration. This process is repeated until the final module of the assembly line releases the product into the cellular environment. The modules from which these molecular factories are constructed can be reordered and modified to produce a cornucopia of novel chemicals not easily accessible by conventional chemical synthesis, as evidenced by the immense diversity of such compounds produced by nature. In the laboratory there have been some successes in manipulating both NRPSs and PKSs to produce novel compounds, but there have also been many failures, owing to our incomplete understanding of the molecular processes at work.

Our interest in PKSs, NRPSs and similar multi-domain proteins lies not only in their commercial and medicinal importance, but also in the many interactions between domains that must be controlled during substrate biosynthesis. For example, type I modular PKSs typically consist of a few thousand amino acids per polypeptide chain, with the polypeptide chains forming dimers or higher-order structures. Interactions occur both between domains within the same polypeptide chain and between domains on different polypeptide chains. For the polyketide to be produced correctly, the domains have to process the substrate in the correct order, which is presumably regulated by communication between the domains.

BBSRC Strategic Research Priority: Renewable Resources and Clean Growth: Industrial Biotechnology

Techniques that will be undertaken during the project:

  • Various statistical analyses including deep learning
  • Protein structural modelling

Contact: Dr Peter Winn, University of Birmingham