Tautomer and Conformer Workflows Now Available
importance of tautomers and conformers; how it works; benchmarks
We’ve just launched two new workflows on Rowan: one to enumerate and score the various tautomers of a molecule, and one to conduct conformational searches.
This matters because molecules are complicated. Many molecules have exchangeable protons that can reside on different atoms, giving rise to “tautomers” like those shown above. (Can you guess which one is observed in solution?) Furthermore, almost all molecules can adopt a variety of different shapes, termed conformers. To understand how a molecule will behave, it’s important to know where its protons will be and what shape it will adopt, which makes computational prediction of tautomers and conformers very important. (Here’s a paper on tautomerism in virtual screening: many such cases.)
Like our pKa workflow, our tautomer workflow uses AIMNet2 gas-phase free energies and GFN2-xTB/CPCM-X(water) solvation free energies to quickly generate decently accurate relative energy predictions. To improve accuracy further, we apply a linear correction to the computed ∆G values, which brings us to roughly state-of-the-art quantum chemical accuracy¹ on the TautoBase dataset.
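The scoring step described above amounts to adding a gas-phase free energy and a solvation free energy for each tautomer, taking the difference, and passing that difference through a linear correction. The post doesn't give the correction's coefficients or how they were fitted, so the Python sketch below assumes an ordinary least-squares fit of experimental against computed ∆G. The function names are hypothetical, and the AIMNet2 and GFN2-xTB/CPCM-X calculations themselves appear only as their outputs, in kcal/mol.

```python
from typing import Sequence, Tuple


def raw_delta_g(g_gas_a: float, g_solv_a: float, g_gas_b: float, g_solv_b: float) -> float:
    """Uncorrected ΔG = G(B) - G(A) in solution, in kcal/mol.

    Each tautomer's solution free energy is approximated as its gas-phase free
    energy (e.g. from AIMNet2) plus its solvation free energy (e.g. from
    GFN2-xTB/CPCM-X in water). Hypothetical helper, not Rowan's code.
    """
    return (g_gas_b + g_solv_b) - (g_gas_a + g_solv_a)


def fit_linear_correction(
    computed: Sequence[float], experimental: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of experimental ≈ slope * computed + intercept."""
    n = len(computed)
    if n != len(experimental) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(computed) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in computed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(computed, experimental))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def corrected_delta_g(raw: float, slope: float, intercept: float) -> float:
    """Apply a fitted linear correction to a raw ΔG (kcal/mol)."""
    return slope * raw + intercept
```

If you fit a correction like this yourself, fitting and scoring it on the same pairs will tend to flatter the results; a held-out set of pairs gives a more honest estimate.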
Below is a plot of computed versus experimental ∆G for TautoBase tautomer pairs, with pairs whose ∆G is smaller than 3 kcal/mol in magnitude shown in red. The correlation is clearly imperfect, but so are quantum chemical methods, and at least ours is fast!
What’s more practically useful than getting ∆G quantitatively correct (regression) is predicting which tautomer will be favored (categorization). Here, our workflow picks the favored tautomer correctly for 89% of pairs. Restricted to the close-in-energy pairs in red, the accuracy is 77%, which is lower but still pretty decent. (For comparison, guessing randomly would give a 50% success rate on this test.) We anticipate that the accuracy of workflows like this will only increase as machine-learned interatomic potentials improve.
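Scoring tautomer prediction as a categorization problem just means checking whether the predicted ∆G has the same sign as the experimental one. The sketch below illustrates that metric and is not Rowan’s benchmarking code: the function name and example values are hypothetical, and it assumes the red “close-in-energy” subset is defined by the magnitude of the experimental ∆G.

```python
from typing import Optional, Sequence


def sign_accuracy(
    predicted: Sequence[float],
    experimental: Sequence[float],
    max_abs_exp: Optional[float] = None,
) -> float:
    """Fraction of tautomer pairs where predicted and experimental ΔG agree in sign.

    If max_abs_exp is given, only pairs with |experimental ΔG| < max_abs_exp
    (kcal/mol) are scored. A ΔG of exactly zero counts as "not positive".
    """
    pairs = [
        (p, e)
        for p, e in zip(predicted, experimental)
        if max_abs_exp is None or abs(e) < max_abs_exp
    ]
    if not pairs:
        raise ValueError("no pairs to score")
    correct = sum(1 for p, e in pairs if (p > 0) == (e > 0))
    return correct / len(pairs)


# Invented example values (kcal/mol), not TautoBase data.
pred = [-2.0, 1.5, 4.0, -0.5]
expt = [-1.0, -0.8, 5.0, 2.0]
print(sign_accuracy(pred, expt))                   # 0.5 (2 of 4 pairs)
print(sign_accuracy(pred, expt, max_abs_exp=3.0))  # 1 of 3 close pairs correct
```

On this metric a coin flip scores about 0.5, which is the baseline the 89% and 77% figures should be compared against.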
It’s really easy to run a tautomer search with Rowan: just input your structure and click “submit.” The result of a tautomer search on the 4-hydroxypyrimidine from the top of the page is shown below. Structure 2 is predicted to be lowest in energy, which matches the literature.
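Relative free energies like these translate into equilibrium populations through the Boltzmann distribution. The helper below is a generic sketch rather than Rowan’s code; its name and the example energies are invented for illustration, and it assumes relative free energies in kcal/mol at 298.15 K.

```python
import math
from typing import List, Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)


def boltzmann_populations(rel_g: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Equilibrium fractions of each species from relative free energies (kcal/mol)."""
    g_min = min(rel_g)  # shift so the largest exponent is 0 (avoids overflow)
    rt = R_KCAL * temperature
    weights = [math.exp(-(g - g_min) / rt) for g in rel_g]
    z = sum(weights)  # partition-function-like normalizer
    return [w / z for w in weights]


# Three hypothetical tautomers at 0.0, 1.0, and 3.0 kcal/mol:
print(boltzmann_populations([0.0, 1.0, 3.0]))  # roughly 84%, 16%, and 0.5%
```

At room temperature, every additional 1.36 kcal/mol (RT ln 10) of free-energy difference shifts the ratio by about a factor of ten, so even the “close” pairs under 3 kcal/mol span ratios from 1:1 up to roughly 160:1.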
We’ve also added standalone conformational searching to Rowan, which we were already running as part of other workflows. You can choose among four modes for conformational searches: “careful” and “meticulous” use CREST, while “rapid” and “reckless” use the RDKit ETKDG algorithm. For most systems “rapid” is appropriate, but sometimes CREST’s more methodical, metadynamics-based approach is superior. (You can read more about the options in our documentation.)
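Whichever method generates them, a conformer search produces an ensemble of candidate geometries that is usually pruned before use. As a hypothetical sketch of one common post-processing step (not a description of what Rowan, CREST, or RDKit do internally), the code below keeps conformers within an energy window of the lowest one and sorts them by energy. The `Conformer` type, the function name, and the 6 kcal/mol default window are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Conformer:
    """Hypothetical minimal record for one conformer from a search."""
    label: str
    energy: float  # kcal/mol, relative to any common reference


def prune_by_energy_window(
    conformers: Sequence[Conformer], window: float = 6.0
) -> List[Conformer]:
    """Keep conformers within `window` kcal/mol of the lowest one, sorted by energy."""
    if not conformers:
        return []
    e_min = min(c.energy for c in conformers)
    kept = [c for c in conformers if c.energy - e_min <= window]
    return sorted(kept, key=lambda c: c.energy)


ensemble = [
    Conformer("a", 2.0),
    Conformer("b", 0.0),
    Conformer("c", 8.5),
    Conformer("d", 5.9),
]
print([c.label for c in prune_by_energy_window(ensemble)])  # ['b', 'a', 'd']
```

Real pipelines usually also remove near-duplicate geometries by comparing structures (for example by RMSD) rather than energies alone; that step is omitted here to keep the sketch short.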
Here’s the result of a conformer search, showing that Rowan gets the anomeric effect for α-chlorotetrahydropyran roughly right²:
We hope these new workflows are useful: try them out and let us know!
¹ Chodera and co-workers scored standard DFT methods against TautoBase and got comparable errors, albeit without scaling. (For example, they got an RMSE of 3.1 kcal/mol using B3LYP/aug-cc-pVTZ//B3LYP/6-31G(d)/SMD.)
² See Alabugin’s review for a discussion of this system.