Challenge 2: Machine Learning Approaches to Modelling Chemical Reaction Networks

Johnson Matthey Challenge

Challenge 1: Accurate prediction of chemical reaction rates

One can often generate complex chemical reaction networks (CRNs), comprising the set of reactive molecular species and elementary reaction steps, either “by hand” (using prior knowledge of related chemical systems) or using automated reaction discovery methods. With a CRN in hand, powerful microkinetic modelling approaches enable one to study the time evolution of species concentrations over experimentally accessible timescales.
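As a minimal sketch of what microkinetic modelling involves, the example below integrates the mass-action rate equations for a toy two-step network, A -> B -> C. The network, the rate constants, and the function names (`rhs`, `rk4_step`, `simulate`) are illustrative assumptions, not part of the challenge; a real CRN would have many more species and would normally be handled by a stiff ODE solver.

```python
"""Toy microkinetic model: A -> B -> C with first-order steps.

The network and rate constants are hypothetical, chosen only for illustration.
"""


def rhs(conc, k1, k2):
    """Mass-action rate equations for A -> B -> C."""
    a, b, _c = conc
    r1 = k1 * a  # rate of step A -> B
    r2 = k2 * b  # rate of step B -> C
    return (-r1, r1 - r2, r2)


def rk4_step(conc, dt, k1, k2):
    """Advance the concentrations by one classical Runge-Kutta (RK4) step."""

    def shifted(base, slope, h):
        return tuple(x + h * s for x, s in zip(base, slope))

    s1 = rhs(conc, k1, k2)
    s2 = rhs(shifted(conc, s1, dt / 2), k1, k2)
    s3 = rhs(shifted(conc, s2, dt / 2), k1, k2)
    s4 = rhs(shifted(conc, s3, dt), k1, k2)
    return tuple(
        x + dt / 6 * (p + 2 * q + 2 * r + s)
        for x, p, q, r, s in zip(conc, s1, s2, s3, s4)
    )


def simulate(conc0, k1, k2, t_end, n_steps):
    """Integrate from t = 0 to t_end and return the final concentrations."""
    dt = t_end / n_steps
    conc = tuple(conc0)
    for _ in range(n_steps):
        conc = rk4_step(conc, dt, k1, k2)
    return conc


if __name__ == "__main__":
    a, b, c = simulate((1.0, 0.0, 0.0), k1=1.0, k2=0.5, t_end=5.0, n_steps=5000)
    print(f"[A]={a:.4f}  [B]={b:.4f}  [C]={c:.4f}")
```

Every rate constant enters the right-hand side directly, so any error in `k1` or `k2` propagates straight into the predicted concentration profiles.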

However, the accuracy of these simulations directly reflects the accuracy of the underlying calculated reaction rates for each elementary step in the CRN. Most commonly, transition-state theory (TST) is used to calculate reaction rates from the calculated activation free energy; but TST rests on important assumptions about the reaction dynamics (e.g. the no-recrossing assumption) and is exponentially sensitive to uncertainties in the calculated activation energies.

Machine-learning tools offer a route to addressing this by directly learning reaction rates (or activation energies) from existing computational or experimental datasets. However, the current uncertainties in typical activation energy predictions from such models are around 3-6 kcal/mol, large enough that reaction rates calculated using TST can be in error by two or more orders of magnitude at room temperature.
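To make the exponential sensitivity concrete, the sketch below evaluates the standard Eyring expression, k = (k_B T / h) exp(-ΔG‡ / RT), and the multiplicative rate error caused by a given error in the activation free energy. The function names and the choice of 298.15 K are assumptions made for illustration, and the transmission coefficient is taken to be 1.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s


def eyring_rate(dg_act, temperature=298.15):
    """First-order TST rate constant (s^-1) from an activation free energy
    in kcal/mol, assuming a transmission coefficient of 1."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_act / (R_KCAL * temperature))


def rate_error_factor(energy_error, temperature=298.15):
    """Multiplicative error in the rate caused by an error (kcal/mol)
    in the activation free energy."""
    return math.exp(abs(energy_error) / (R_KCAL * temperature))


if __name__ == "__main__":
    for err in (1.0, 3.0, 6.0):
        print(f"error {err:.0f} kcal/mol -> rate off by x{rate_error_factor(err):.3g}")
```

At room temperature, an error of 3 kcal/mol changes the rate by a factor of roughly 160, and an error of 6 kcal/mol by a factor of roughly 25,000, independent of the barrier height itself.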

In this challenge, we will explore new approaches to calculating reaction rates; this could focus on the development of better descriptors for ML models, or on alternative approaches based on approximate chemical dynamics simulations. Large datasets (experimental or computational) of activation energies or reaction rates are available, but how can these be validated and best used to boost the accuracy of rate predictions?

Challenge 2: Solvent effects

A related, perennial challenge in CRN modelling lies in the treatment of solvent. It is well known that solvent can have a significant impact on chemical reaction rates and product selectivity, but current implicit solvent models are often inadequate in capturing these effects. Furthermore, explicit solvent models are often prohibitively expensive, especially in the setting of large, complex CRNs with many elementary reaction steps and reactive species.

So, what is the best way to account for solvent in reaction-rate predictions? Can we use ML to “learn” solvent effects on reactions with sufficient accuracy? Or can we classify reactions to identify when solvent effects are important and when they can be ignored?

Challenge 3: Assess the Merits of Pre-trained “Foundation” models for Heterogeneous Catalysis

A further challenge in making chemical reaction networks applicable to real-world catalysis is the extra difficulty associated with reactions taking place on the surface of solid catalysts. Naturally, adding a periodic slab of a crystalline surface (or a step edge, a corner, a surface defect, et cetera) increases the size of the problem and the computational challenge of the corresponding electronic structure calculations. On top of this, the surface breaks translational and rotational symmetries present in vacuum or implicit solvent, meaning there is a greatly increased phase space to explore. This makes it even more important to try accelerated methods based on machine-learned interatomic potentials (MLIPs). A major open question is whether pre-trained “Foundation Model” MLIPs, such as the recently released MACE_MP potential, trained to infer interactions between 86 elements from a large database from the Materials Project, can help with this type of investigation: can they generate useful geometries, reaction pathways, vibrational properties, etc., for further refinement with DFT?