
Deep learning of reaction barriers for high-throughput retrosynthetic drug design

Supervisors: Prof. Reinhard J. Maurer, Prof. Scott Habershon

Summary:

The drug discovery pipeline screens many molecules before viable leads are identified. Candidates are assessed for their pharmacological properties, but also for their synthetic viability. Typical drug molecules can contain more than 100 non-hydrogen atoms, which makes the development of cost-effective and efficient synthetic pathways very challenging, so high-throughput screening of drug-like molecules must also account for how readily they can be synthesised. The aim of this project is to develop a deep learning and generative design toolchain to create an accurate first-principles database and ML model of chemical reaction barriers that will advance chemical retrosynthetic design workflows.

Background:

In the exploratory phase of drug discovery, millions of molecules are screened for their viability as drugs. This involves screening for their pharmacological properties, but also for their synthetic viability. Typical drug molecules can contain more than 100 non-hydrogen atoms, which makes the development of cost-effective and efficient synthetic pathways very challenging. Effective retrosynthetic design requires the ability to predict accurate reaction enthalpies and activation free energies for relevant intermediates. While quantum chemical calculations can typically provide sufficient accuracy (errors of ~1 kcal/mol), they are not feasible at the scale of millions of predictions per day. The need to predict the transition state structure as input for quantum chemical barrier predictions adds a further complication. Machine learning (ML) models of quantum chemistry can achieve fast and accurate predictions [1], but data sets for reaction barriers of large molecules simply do not exist.
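
To illustrate why an accuracy of roughly 1 kcal/mol matters, the sketch below uses the Eyring equation from transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT), to convert an activation free energy into a rate constant. It assumes a transmission coefficient of 1 and a default temperature of 298.15 K; the function names are hypothetical and chosen only for this example.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 1.987204259e-3    # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal_mol, temperature=298.15):
    """Rate constant in 1/s from an activation free energy in kcal/mol.

    Assumes a transmission coefficient of 1 (an assumption of this sketch).
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_kcal_mol / (R * temperature))


def rate_error_factor(barrier_error_kcal_mol, temperature=298.15):
    """Multiplicative error in the rate caused by an error in the barrier."""
    return math.exp(barrier_error_kcal_mol / (R * temperature))
```

Under these assumptions, `rate_error_factor(1.0)` shows that a 1 kcal/mol error in the barrier changes the predicted room-temperature rate by roughly a factor of five, which is why barrier predictions need to be this accurate to rank competing synthetic steps.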

Several recent works have tried to tackle the scarcity of data on reaction barriers by creating new curated data sets [2, 3], but data for large molecules remains scarce. Furthermore, entropic and solvent effects will play a crucial role in reactions of large drug molecules and need to be considered. Graph-based reaction discovery and generative ML provide a path to new synthetic data that can form the basis for a large-scale database of reaction enthalpies and activation free energies for realistic molecules [4, 5]. This would support the development of improved retrosynthesis models.

In this project, the student will develop a deep learning and generative design toolchain to create an accurate first-principles database and ML model of reaction barriers that requires neither transition state structures nor quantum chemical calculations at the point of prediction. This will enable the development of more accurate and advanced retrosynthetic design workflows.

Project Objectives for the PhD project:

  1. Develop a workflow to iteratively build a massive database of reaction transition states and barriers for pharmaceutically relevant molecules (up to 100 heavy atoms).
  2. Selectively use quantum chemistry to predict activation energies and activation free energies, as well as solvent effects, that can act as a starting point for machine learning models for activation energy prediction.
  3. Systematically explore approaches for transfer learning of activation energies from small molecules to large molecules, for example model distillation.
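
One possible way to use quantum chemistry selectively, given here purely as an illustrative assumption and not as the method prescribed by the project, is to run expensive calculations only for the reactions on which an ensemble of cheap surrogate models disagrees most. In the minimal sketch below, the function names, the toy linear models, and the scalar descriptors are all hypothetical stand-ins.

```python
from statistics import pstdev


def ensemble_disagreement(descriptor, ensemble):
    """Spread of predicted barriers across the ensemble (population std. dev.)."""
    return pstdev([model(descriptor) for model in ensemble])


def select_for_quantum_chemistry(candidates, ensemble, budget):
    """Return the `budget` candidates on which the ensemble disagrees most."""
    ranked = sorted(
        candidates,
        key=lambda d: ensemble_disagreement(d, ensemble),
        reverse=True,
    )
    return ranked[:budget]


# Toy stand-ins for trained models: each maps a scalar descriptor to a barrier
# in kcal/mol. Real models would act on molecular structures (assumption).
toy_ensemble = [
    lambda x: 10.0 + 2.0 * x,
    lambda x: 10.0 + 2.5 * x,
    lambda x: 10.0 + 1.5 * x,
]
toy_candidates = [0.0, 1.0, 4.0, 2.0]
selected = select_for_quantum_chemistry(toy_candidates, toy_ensemble, budget=2)
```

In an iterative workflow of this kind, the selected candidates would be labelled with quantum chemical calculations, added to the database, and the surrogate models retrained before the next round of selection.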

Skills that the student will acquire:

  • Machine learning and high-dimensional parametric fitting
  • Electronic structure theory
  • Molecular dynamics simulations and open quantum system dynamics
  • Software development in Julia and Python programming languages

Relevant references:

[1] Schreiner et al., NeuralNEB—neural networks can find reaction paths fast, Mach. Learn.: Sci. Technol. 3, 045022 (2022).

[2] Grambow et al., Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry, Scientific Data 7, 137 (2020), https://www.nature.com/articles/s41597-020-0460-4

[3] Schreiner et al., Transition1x - a dataset for building generalizable reactive machine learning potentials, Scientific Data 9, 779 (2022), https://www.nature.com/articles/s41597-022-01870-w

[4] Zhao et al., Comprehensive exploration of graphically defined reaction spaces, Scientific Data 10, 145 (2023), https://www.nature.com/articles/s41597-023-02043-z

[5] Axelrod and Gómez-Bombarelli, GEOM, energy-annotated molecular conformations for property prediction and molecular generation, Scientific Data 9, 185 (2022), https://doi.org/10.1038/s41597-022-01288-4

Are you interested in applying for this project? Head over to our Study with Us page for information on the application process, funding, and the HetSys training programme.

At the University of Warwick, we strongly value equity, diversity and inclusion, and HetSys will provide a healthy working environment, dedicated to outstanding scientific guidance, mentorship and personal development.

HetSys is proud to be a part of the Physics Department, which holds an Athena SWAN Silver award as part of a national initiative to promote gender equality for all staff and students. The Physics Department is also a Juno Champion, which is an award from the Institute of Physics to recognise our efforts to address the under-representation of women in university physics and to encourage better practice for both women and men.