# Summer Speed Optimizations

### making rowan faster: approximate guess Hessians; new compute backends, &c

When we meet with Rowan users to discuss what they like and don’t like about our platform, one complaint comes up over and over again: speed. Bluntly, Rowan’s been a lot slower than traditional high-performance computing centers ever since we launched. While this might have been forgivable back when we were just trying to throw something together, it’s now become a significant barrier to Rowan’s continued utility. Demos can be slow—products cannot.

Simulations can be accelerated either by using better hardware or by finding smarter algorithms. While both strategies can work, it’s often more cost-effective (and satisfying) to find algorithmic speedups: a recent perspective by Frank Neese observed that since 2005, algorithmic improvements to ORCA have led to a 20–200x speedup, while hardware improvements have only led to a 7x speedup. We’ve had success with both strategies, as we’ll share below.

### ML “Guess” Hessians for TS Optimization

One of the biggest speed bottlenecks in Rowan has been transition-state optimizations. Unlike regular optimizations, which start with an approximate Hessian matrix (of second-order partial derivatives) and update it as they go along, transition-state optimizations typically require a relatively accurate “guess” Hessian before starting the actual optimization. Since computing a numerical Hessian by central differences takes 6*N* gradient calculations for an *N*-atom system (two displacements along each of the 3*N* coordinates), any transition-state search done at a high level of theory would “hang” for hours computing endless gradients before the optimization could even start.
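For intuition on where that 6*N* factor comes from, here’s a minimal central-difference Hessian sketch in NumPy. (This is illustrative, not Rowan’s implementation; `gradient` is a hypothetical callable standing in for a real quantum-chemistry gradient.)

```python
import numpy as np

def numerical_hessian(gradient, coords, step=1e-3):
    """Central-difference Hessian from gradient evaluations.

    `gradient` maps a flat (3N,) coordinate array to a (3N,) gradient.
    Each of the 3N coordinates is displaced in both directions, so the
    full Hessian costs 6N gradient calls for an N-atom system.
    """
    x = np.asarray(coords, dtype=float).ravel()
    n = x.size  # 3N Cartesian coordinates
    H = np.zeros((n, n))
    for i in range(n):
        xp, xm = x.copy(), x.copy()
        xp[i] += step
        xm[i] -= step
        # column i: d(gradient)/d(x_i) by central differences
        H[:, i] = (gradient(xp) - gradient(xm)) / (2.0 * step)
    return 0.5 * (H + H.T)  # symmetrize to remove finite-difference noise
```

Each loop iteration costs two gradient calls, and for a high-level DFT gradient those calls are what dominates the wall time.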

Fortunately, there’s an alternative. Approximate Hessians have long been used to speed up regular optimizations: Berny Schlegel used a set of empirical rules to generate decent guess Hessians, and semiempirical methods have also been used. These methods aren’t generally used for transition-state optimizations, though, since relatively accurate force constants are needed to locate the correct saddle point. But recent work from Eric Yuan and co-workers shows that neural network potentials (NNPs) can generate Hessian matrices accurate enough for subsequent transition-state optimizations, which prompted us to try this approach ourselves.
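One quick sanity check on a guess Hessian is to count its negative eigenvalues: a first-order saddle point has exactly one imaginary mode, so a usable TS guess Hessian should have exactly one negative eigenvalue. A minimal NumPy sketch (hypothetical helper names; projection of translational/rotational modes is omitted for brevity):

```python
import numpy as np

def count_imaginary_modes(hessian):
    """Count negative eigenvalues of a (mass-weighted) Hessian.

    Each negative eigenvalue corresponds to one imaginary vibrational
    frequency; a transition state (first-order saddle point) has
    exactly one.
    """
    eigvals = np.linalg.eigvalsh(hessian)  # symmetric eigensolver
    return int(np.sum(eigvals < 0.0))

def is_plausible_ts_guess(hessian):
    """True if the guess Hessian has the single imaginary mode a TS needs."""
    return count_imaginary_modes(hessian) == 1
```

A guess Hessian that fails this check (zero or several negative eigenvalues) is a warning sign that the optimization may wander off toward a minimum or a higher-order saddle point.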

There are potential pitfalls here: an NNP-derived Hessian will always be less accurate than the exact Hessian, which can lead to optimizations that take more steps (potentially fine) or that miss the TS altogether (bad). To test this, we used an AIMNet2 guess Hessian to find five different transition states at the B97-3c level of theory:

| Reaction | Steps (exact Hessian) | Steps (AIMNet2 Hessian) | Overall speedup with AIMNet2 |
|---|---|---|---|
| a model SN2 reaction | 9 | 32 | 29% faster |
| a Diels–Alder cycloaddition | 4 | 40 | 63% faster |
| rotation of the amide bond in acetamide | 6 | 13 | 78% faster |
| addition of water to isocyanic acid | 8 | 15 | 70% faster |
| an Alder ene reaction | 27 | 50 | 65% faster |

*(n.b.: the linked jobs show the correct transition states, but the number of steps differs slightly because the exact job parameters are different)*

In every case, the correct transition state was found, and the final imaginary frequencies matched those from optimizations started with an exact Hessian: runs with AIMNet2 guess Hessians took more steps but were significantly faster overall, with the observed speedup increasing for larger systems. (We also tried GFN2-xTB Hessians: these took more steps for some systems, but also converged on the correct TSs.)
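The tradeoff in the numbers above comes down to a simple cost model: total wall time ≈ (guess-Hessian cost) + (number of steps × per-step gradient cost). A back-of-the-envelope sketch, with entirely made-up timings (these are not measured Rowan numbers):

```python
def ts_opt_time(hessian_time, grad_time, n_steps):
    """Rough wall time for a TS optimization: Hessian cost + per-step gradients."""
    return hessian_time + n_steps * grad_time

# Illustrative (invented) timings: say a DFT gradient takes 60 s, so an
# exact 6N-gradient Hessian for a 20-atom system costs 120 gradients,
# while an NNP guess Hessian is effectively free (~1 s).
t_grad = 60.0
exact = ts_opt_time(120 * t_grad, t_grad, n_steps=9)   # 7200 + 540 s
nnp = ts_opt_time(1.0, t_grad, n_steps=32)             # 1 + 1920 s

# Even with ~3x more optimization steps, skipping the exact Hessian wins,
# and the gap widens with system size since the Hessian term scales as 6N.
```

This also explains why the speedups in the table grow for larger systems: the 6*N* Hessian term grows with system size, while the extra optimization steps do not grow nearly as fast.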

Where possible, AIMNet2 guess Hessians are now the default for all optimizations run on Rowan; otherwise, xTB guess Hessians are used. We’re very excited to launch this on Rowan, and expect that this approach will only become more powerful as NNPs become more accurate.

### Better Backends For Running Jobs

On the hardware side of things, we’ve put some time and effort into improving the performance of the servers that we use to run calculations. Running calculations in the cloud gives us virtually limitless control over what computers we want to use: we found that switching from Intel Xeon-series processors to 4th-generation AMD EPYC processors provided a substantial performance boost without unduly increasing cost per job, and were able to optimize some other parts of our backend to give a total speedup of about 8x.

With both of these improvements, a typical transition-state optimization might now be more than 20x faster than it was last week. We’re always interested to hear about how we can make our software faster and better: please reach out if you notice something that’s slow and should be fast, and we’ll do our best to fix it!

*Thanks to many Rowan users for helpful discussions here.*