Machine learning of energy barriers for reaction network discovery of drug-like molecules
Figure: Generative machine learning combined with neural network prediction of transition state structures and barriers provides the means to generate large-scale drug molecule databases for retrosynthetic drug design
In drug discovery, millions of molecules need to be screened for their viability as drug candidates, including their synthetic viability. Yields of chemical reactions are often limited by the formation of unforeseen by-products, which are not accounted for in synthesis planning.
The exploration of kinetically accessible by-products requires the accurate prediction of reaction enthalpies and activation free energies for all relevant intermediates. In this project, a deep learning and generative design toolchain will be developed, resulting in a machine learning (ML) model of reaction barriers.
This will enable the development of more accurate and advanced high-throughput reaction network discovery and by-product prediction.
Supervisors
Primary: Prof. Reinhard Maurer, Chemistry
Prof. Scott Habershon, Chemistry
Background
Typical drug molecules can contain up to 100 non-hydrogen atoms, which makes the development of cost-effective and efficient synthetic pathways very challenging. Effective retrosynthetic design requires the ability to predict accurate reaction enthalpies and activation free energies for relevant intermediates. While quantum chemical predictions can typically provide sufficient accuracy (~1 kcal/mol error), they are not feasible at the scale of millions of predictions per day. The need to predict the transition state structure as input for quantum chemical barrier predictions adds further complications. Machine learning (ML) models of quantum chemistry can achieve fast and accurate predictions, [1] but comprehensive data sets for reaction barriers of large molecules simply do not exist.
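To illustrate why an error of about 1 kcal/mol is a meaningful accuracy target, the sketch below uses the standard Eyring equation from transition state theory to convert an activation free energy into a rate constant. It is a minimal example and not part of the project's toolchain: the function names, the transmission coefficient of 1, and the 20 kcal/mol example barrier are assumptions chosen for illustration.

```python
import math

# Physical constants (SI units unless stated otherwise)
K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34 # Planck constant, J*s
R_KCAL = 1.987204259e-3   # molar gas constant, kcal/(mol*K)


def eyring_rate(dg_act_kcal, temperature=298.15):
    """Rate constant in 1/s from an activation free energy in kcal/mol.

    Assumes a unimolecular step and a transmission coefficient of 1.
    """
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-dg_act_kcal / (R_KCAL * temperature))


def rate_error_factor(barrier_error_kcal, temperature=298.15):
    """Multiplicative error in the rate caused by an error in the barrier."""
    return math.exp(abs(barrier_error_kcal) / (R_KCAL * temperature))


if __name__ == "__main__":
    # Hypothetical barrier of 20 kcal/mol at room temperature
    print(f"k(20 kcal/mol) = {eyring_rate(20.0):.3e} 1/s")
    print(f"rate factor for 1 kcal/mol error = {rate_error_factor(1.0):.2f}")
```

Because the rate depends exponentially on the barrier, a 1 kcal/mol error at room temperature changes the predicted rate by roughly a factor of five, which can be enough to change which by-products appear kinetically accessible.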
Several recent works have attempted to tackle the scarcity of data on reaction barriers by creating new curated data sets. [2, 3] However, these data sets only feature molecules with up to 7 heavy atoms. Even though activation free energies and thermochemistry data might be available for small molecules, the complexity of large chemical reactions means that entropic contributions become even more relevant, particularly for bimolecular reactions. Alternative approaches are graph-based sampling of molecular reaction space and generative ML, which provide a path to new synthetic data that can form the basis for a large-scale database of reaction enthalpies and activation free energies for realistic molecules. [4, 5]
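The role of entropic contributions can be made concrete with the standard thermodynamic relation between activation free energy, activation enthalpy, and activation entropy. The sketch below is illustrative only: the function name and the numerical values (an activation enthalpy of 15 kcal/mol and an activation entropy of -30 cal/(mol K), a magnitude plausible for a bimolecular step) are assumptions and are not taken from any of the cited data sets.

```python
def activation_free_energy(dh_act_kcal, ds_act_cal, temperature=298.15):
    """Return dG_act = dH_act - T * dS_act in kcal/mol.

    dh_act_kcal: activation enthalpy in kcal/mol
    ds_act_cal:  activation entropy in cal/(mol*K)
    temperature: temperature in K
    """
    # Convert the entropy term from cal/mol to kcal/mol
    return dh_act_kcal - temperature * ds_act_cal / 1000.0


if __name__ == "__main__":
    # Hypothetical values for illustration
    dh_act = 15.0   # kcal/mol
    ds_act = -30.0  # cal/(mol*K)
    dg_act = activation_free_energy(dh_act, ds_act)
    print(f"entropic penalty: {dg_act - dh_act:.2f} kcal/mol")
    print(f"activation free energy: {dg_act:.2f} kcal/mol")
```

With these assumed values the entropy term adds close to 9 kcal/mol to the barrier at room temperature, which is why a model trained on activation energies alone may not transfer directly to activation free energies.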
Project Aims
In this project, the student will develop a deep learning and generative design toolchain to accurately predict chemical reaction barriers without recourse to transition state structures and quantum chemical calculations at the point of prediction. This will enable the development of more accurate and advanced retrosynthetic design workflows. The project is in close collaboration with a leading pharmaceutical company and will involve an additional six-month industrial placement for the PhD student, extending the overall project to 4.5 years.
Project Outcomes
- Develop a workflow to iteratively build a massive database of reaction transition states and barriers for pharmaceutically relevant molecules (up to 100 heavy atoms).
- Selectively use quantum chemistry to predict activation energies, activation free energies, and solvent effects, which can act as a starting point for machine learning models for activation energy prediction.
- Systematically explore approaches for transfer learning of activation energies from small molecules to large molecules, for example via model distillation.
Skills that the student will acquire:
- Machine learning methods (deep learning, generative machine learning)
- Quantum chemistry, electronic structure theory, and atomistic molecular simulation methods
- Kinetic reaction network discovery
- Software development in Python with machine learning stacks
- Industrial experience through a six-month extended placement with the leading pharmaceutical company collaborating on the project
Relevant references:
[1] Schreiner et al., NeuralNEB—neural networks can find reaction paths fast, Mach. Learn.: Sci. Technol. 3, 045022 (2022).
[2] Grambow et al., Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry, Scientific Data 7, 137 (2020), https://www.nature.com/articles/s41597-020-0460-4
[3] Schreiner et al., Transition1x - a dataset for building generalizable reactive machine learning potentials, Scientific Data 9, 779 (2022), https://www.nature.com/articles/s41597-022-01870-w
[4] Zhao et al., Comprehensive exploration of graphically defined reaction spaces, Scientific Data 10, 145 (2023), https://www.nature.com/articles/s41597-023-02043-z
[5] Axelrod and Gómez-Bombarelli, GEOM, energy-annotated molecular conformations for property prediction and molecular generation, Scientific Data 9, 185 (2022), https://doi.org/10.1038/s41597-022-01288-4