Deep Learning and Molecular Modelling for Predicting the Structure of Protein-Protein Interactions: Applications to Fundamental and Applied Biology.

Principal Supervisor: Dr. Peter Winn, School of Biosciences

Co-supervisor: Prof. Christopher Thomas, School of Biosciences

PhD project title: Deep Learning and Molecular Modelling for Predicting the Structure of Protein-Protein Interactions: Applications to Fundamental and Applied Biology.

University of Registration: University of Birmingham

Project outline:

Technologies for determining protein structures are platforms underpinning much of modern biotechnology, synthetic biology, drug discovery and medicine. A fast and cheap technique for determining structure from amino acid sequence would be a disruptive technology that accelerates these fields. Over the last five years, structure prediction has developed to the point where it can predict novel folds, i.e. the structures of proteins for which no structural homologue has been solved. Progress has been driven by improvements in predicting which residues contact each other in the three-dimensional structure, based on patterns in a multiple sequence alignment. Recent work predicted 137 novel folds, of which five have been verified, but required approximately 13,000 CPU hours per structure (Ovchinnikov et al. 2017).

The rapid improvement in this area has been driven by improved methods for analysing multiple sequence alignments to determine residues that are coupled during evolution, and thus to identify possible contacting residues. In particular, recent improvements have been driven by developments in the field of deep learning.
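To make the idea of evolutionary coupling concrete, the sketch below scores pairs of alignment columns by mutual information, a simple classic covariation measure. It is an illustrative stand-in only, not the method of the cited work or of the Winn group: practical contact predictors additionally correct for phylogenetic bias and indirect couplings, and recent ones use deep learning. The function names and the toy alignment are invented for this example.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (counts_i[a] * counts_j[b]))
    return mi


def rank_coupled_columns(msa):
    """Return (score, i, j) tuples, most strongly coupled column pairs first."""
    length = len(msa[0])
    scores = [
        (column_mutual_information(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Toy alignment (hypothetical data): columns 0, 1 and 3 vary together,
# while column 2 varies independently of the others.
toy_msa = ["AKLD", "AKLD", "ERLK", "ERLK", "AKMD", "ERMK"]

if __name__ == "__main__":
    for score, i, j in rank_coupled_columns(toy_msa):
        print(f"columns {i} and {j}: {score:.2f} bits")
```

Column pairs with high scores are candidates for residues that contact each other in the folded structure; on real alignments, many sequences are needed before such signals become reliable.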

The Winn group have been developing and applying such methods, with a particular interest in predicting the structures of multidomain proteins and complexes, such as those found in natural product biosynthesis. For example, the polyketide synthases are a family of megadalton synthases responsible for many of the blockbuster drugs on the market today, including the statins and many antibiotics. Modular polyketide synthases have thousands of amino acids and consist of functional domains grouped into modules. Each module extends the growing polyketide by two carbons and performs further chemical modifications, before transferring the polyketide to the downstream module along the polypeptide for further elongation and modification. There is great interest in re-engineering polyketide synthases to make variants of existing drugs, e.g. to produce antibiotics that are effective against antibiotic-resistant bacteria, or to make novel chemical compounds. However, experiments to re-engineer polyketide synthases have had limited success, and this is thought to be due to poor understanding of the structure and dynamics of these proteins.

More broadly, understanding how proteins fold, evolve and interact with each other and their substrates is a fundamental question connected to how cells are organised and function. Applying these methods of protein structure prediction might therefore be useful not only for understanding systems such as polyketide synthases, with a view to re-engineering them for synthetic biology, but also for addressing fundamental questions of cellular structure, organisation and function.


  • S. Ovchinnikov et al., Science 2017, 355, 294–298.

  • Haines et al., Nature Chemical Biology 2013, 9, 685–692.

  • Shuangxi Ji, Tugce Oruc, et al., submitted.


BBSRC Strategic Research Priority: Industrial Biotechnology and Bioenergy

Techniques that will be undertaken during the project:

  • Sequence alignment.

  • Homology modelling.

  • Ab initio modelling.

  • Machine learning.

Contact: Dr. Peter Winn, School of Biosciences