
Deep learning of reaction barriers for high-throughput retrosynthetic drug design


Supervisors: Prof. Reinhard J. Maurer (Chem./Phys.), Prof. Scott Habershon (Chem.)

Summary:

The drug discovery pipeline involves screening many molecules before viable leads are identified. Candidates are screened not only for their pharmacological properties but also for their synthetic viability. Typical drug molecules can contain up to 100 non-hydrogen atoms, which makes the development of cost-effective and efficient synthetic pathways very challenging, so high-throughput screening of drug-like molecules must also account for how readily they can be made. The aim of this project is to develop a deep learning and generative design toolchain to accurately predict chemical reaction barriers that will advance chemical retrosynthetic design workflows.

Background

In the exploratory phase of drug discovery, millions of molecules are screened for their viability as drugs. This involves screening for their pharmacological properties, but also for their synthetic viability. Typical drug molecules can contain up to 100 non-hydrogen atoms, which makes the development of cost-effective and efficient synthetic pathways very challenging. Effective retrosynthetic design requires the ability to predict accurate reaction enthalpies and activation free energies for relevant intermediates. While quantum chemical calculations can typically provide sufficient accuracy (~1 kcal/mol error), they are not feasible at the scale of millions of predictions per day. The need to predict the transition state structure as input for quantum chemical barrier predictions adds further complications. Machine learning (ML) models of quantum chemistry can achieve fast and accurate predictions [1], but comprehensive data sets for reaction barriers of large molecules simply do not exist.
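To put the ~1 kcal/mol accuracy target in context, the short Python sketch below uses the Eyring equation from transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT), to convert an activation free energy into a rate constant. It is an illustrative calculation only, not part of the project's toolchain: the function names are hypothetical, the transmission coefficient is assumed to be 1, and the reaction is assumed to be first order.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 1.987204259e-3    # molar gas constant, kcal/(mol*K)


def eyring_rate(dg_kcal_mol, temperature=298.15):
    """Rate constant in 1/s from an activation free energy in kcal/mol.

    Assumes a first-order reaction and a transmission coefficient of 1.
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_kcal_mol / (R * temperature))


def rate_error_factor(barrier_error_kcal_mol, temperature=298.15):
    """Multiplicative error in the rate caused by an error in the barrier."""
    return math.exp(barrier_error_kcal_mol / (R * temperature))


if __name__ == "__main__":
    for barrier in (15.0, 20.0, 25.0):
        print(f"dG = {barrier:4.1f} kcal/mol -> k = {eyring_rate(barrier):.3e} 1/s")
    print(f"1 kcal/mol error -> factor {rate_error_factor(1.0):.2f} in k")
```

At 298.15 K, an error of 1 kcal/mol in the barrier changes the predicted rate constant by a factor of about 5.4. Because the dependence is exponential, larger barrier errors quickly make predicted rates, and any route ranking built on them, unreliable.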

Several recent works have tried to tackle the scarcity of data on reaction barriers by creating new curated data sets [2, 3], but data for large molecules remains scarce. Furthermore, entropic and solvent effects will play a crucial role in reactions of large drug molecules and need to be considered. Graph-based reaction discovery and generative machine learning [4] provide a path to new synthetic data that can form the basis for a large-scale database of reaction enthalpies and activation free energies for realistic molecules [5, 6].

Project Aims

In this project, the student will develop a deep learning and generative design toolchain to accurately predict chemical reaction barriers without recourse to transition state structures and quantum chemical calculations at the point of prediction. This will enable the development of more accurate and advanced synthesis planning.

Project Objectives

  • Develop a workflow to iteratively build a massive database of reaction transition states and barriers for pharmaceutically relevant molecules (up to 100 heavy atoms)
  • Selectively use quantum chemistry to predict activation energies, activation free energies, and solvent effects that can act as a starting point for machine learning models for activation energy prediction.
  • Systematically explore approaches for active learning and transfer learning of activation energies from small molecules to large molecules
  • Integrate the developed methodology into synthesis planning models. Evaluate predicted routes and yields on synthesis planning benchmarks.
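
As an illustration of the active learning idea in the objectives above, the following Python sketch runs a simple query-by-committee loop: a bootstrap ensemble of linear models is fitted to the labelled reactions, and the unlabelled candidates on which the ensemble disagrees most are sent to an expensive "oracle" for labelling. Everything here is an assumption made for illustration: the function names are hypothetical, the linear models stand in for a deep learning model, the random feature vectors stand in for reaction descriptors, and the oracle stands in for a quantum chemical barrier calculation. It is not the project's method.

```python
import numpy as np


def fit_ensemble(X, y, n_models=5, rng=None):
    """Fit linear models (with intercept) on bootstrap resamples of (X, y)."""
    rng = np.random.default_rng() if rng is None else rng
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample
        design = np.c_[X[idx], np.ones(len(idx))]       # add intercept column
        w, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        weights.append(w)
    return np.array(weights)


def ensemble_std(weights, X):
    """Per-sample standard deviation of the ensemble predictions."""
    design = np.c_[X, np.ones(len(X))]
    predictions = design @ weights.T                    # (n_samples, n_models)
    return predictions.std(axis=1)


def active_learning(X_pool, oracle, n_initial=10, n_rounds=5, batch=5, seed=0):
    """Label the pool iteratively, picking the most uncertain candidates."""
    rng = np.random.default_rng(seed)
    first = rng.choice(len(X_pool), size=n_initial, replace=False)
    labelled = [int(i) for i in first]
    labels = {i: oracle(X_pool[i]) for i in labelled}
    for _ in range(n_rounds):
        y = np.array([labels[i] for i in labelled])
        weights = fit_ensemble(X_pool[labelled], y, rng=rng)
        uncertainty = ensemble_std(weights, X_pool)
        uncertainty[labelled] = -np.inf                 # never re-query a label
        for i in np.argsort(uncertainty)[-batch:]:
            labels[int(i)] = oracle(X_pool[i])          # expensive calculation
            labelled.append(int(i))
    return labelled, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pool = rng.normal(size=(200, 4))                    # stand-in descriptors
    true_w = np.array([1.0, -2.0, 0.5, 3.0])

    def toy_oracle(x):
        # Stand-in for a quantum chemical barrier calculation.
        return float(x @ true_w + rng.normal(scale=0.1))

    chosen, _ = active_learning(pool, toy_oracle)
    print(f"labelled {len(chosen)} of {len(pool)} candidates")
```

The point of the design is that the expensive oracle is called only on the candidates the current model is least sure about, which is what makes selective use of quantum chemistry affordable. Transfer learning from small to large molecules would need a richer model than the linear stand-in used here.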

Skills that the student will acquire:

  • Machine learning methods (deep learning, generative machine learning)
  • Quantum chemistry and atomistic molecular simulation methods
  • Optimization and kinetic reaction network discovery
  • Software development in Python with machine learning stacks
  • The project is in collaboration with a leading pharmaceutical company and will involve an extended industrial placement

Relevant references:

[1] Schreiner et al., NeuralNEB—neural networks can find reaction paths fast, Mach. Learn.: Sci. Technol. 3, 045022 (2022).

[2] Grambow et al., Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry, Scientific Data 7, 137 (2020), https://www.nature.com/articles/s41597-020-0460-4

[3] Schreiner et al., Transition1x - a dataset for building generalizable reactive machine learning potentials, Scientific Data 9, 779 (2022), https://www.nature.com/articles/s41597-022-01870-w

[4] Westermayr et al., High-throughput property-driven generative design of functional organic molecules, Nature Computational Science 3, 139-148 (2023)

[5] Zhao et al., Comprehensive exploration of graphically defined reaction spaces, Scientific Data 10, 145 (2023), https://www.nature.com/articles/s41597-023-02043-z

[6] Axelrod, S. and Gómez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Scientific Data 9, 185 (2022). https://doi.org/10.1038/s41597-022-01288-4

Are you interested in applying for this project? Head over to our Study with Us page for information on the application process, funding, and the HetSys training programme.

At the University of Warwick, we strongly value equity, diversity and inclusion, and HetSys will provide a healthy working environment, dedicated to outstanding scientific guidance, mentorship and personal development.

HetSys is proud to be a part of the Engineering Department which holds an Athena SWAN Silver award, a national initiative to promote gender equality for all staff and students.