Benchmarking NNPs, Orb-v2, and MACE-MP-0
benchmarking as driver of systematic methodological improvement; our new benchmarking website; new NNPs on Rowan; GPU-based inference coming to more users
Progress in the space of neural network potentials (NNPs) is incredibly fast: there are new advances practically every month, and keeping track of what's possible with NNPs is nearly a full-time job. At Rowan, we're committed to keeping up with the latest NNP research and giving our users the best models possible for their use cases.
Today, we're excited to release a new website that tracks NNP performance as new models come out, and to add two new NNPs, Orb-v2 and MACE-MP-0, to our platform.
Benchmarks
Good benchmarks are key to assessing the accuracy and robustness of computational methods, particularly in fast-moving areas like NNPs. With physics-based levels of theory like density-functional theory (DFT), a few benchmarks are enough to get a good sense of overall performance across chemistries. With machine-learning-based methods like NNPs, benchmarking each new category of system is necessary to ensure that the model has actually "learned" that area of chemical space during training.
As we’ve been thinking about how to make the best use of NNPs in simulation workflows, we realized that benchmarks are an incredibly important resource and something that demands more of our focus. So, we’re launching a benchmark site to make it easy for users to find high-quality benchmarks, compare different NNPs, and make smart choices for their own research.
Our new benchmarking site is a revamped version of Ari Wagen’s NNP Arena. We’ve added a lot of things:
A summary dashboard displaying overall performance across accuracy benchmarks as well as speed benchmarks for both molecular and periodic systems
Two new high-accuracy molecular benchmarks: TorsionNet206 and the Folmsbee conformers benchmark
More DFT results for each benchmark
We’re actively working to add more benchmark results like Wiggle150, our new benchmark developed in collaboration with Joe Gair and his lab that measures relative energy predictions on a set of highly strained conformers. If you have ideas for how this can be improved, contact us at contact@rowansci.com!
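To make concrete how relative-energy benchmarks like these are typically scored, here's a minimal numpy sketch. The conformer energies below are made-up illustrative numbers, not data from any actual benchmark:

```python
import numpy as np

# Hypothetical toy data: reference energies and NNP predictions
# (kcal/mol) for five conformers of one molecule.
ref = np.array([0.0, 1.8, 3.1, 5.6, 9.4])
pred = np.array([0.0, 2.1, 2.7, 6.0, 8.8])

# Relative-energy benchmarks score energies relative to the
# lowest-energy conformer, so any constant offset between the
# methods is removed before computing error statistics.
ref_rel = ref - ref.min()
pred_rel = pred - pred.min()

mae = np.abs(pred_rel - ref_rel).mean()
rmse = np.sqrt(((pred_rel - ref_rel) ** 2).mean())
print(f"MAE = {mae:.2f} kcal/mol, RMSE = {rmse:.2f} kcal/mol")
# → MAE = 0.34 kcal/mol, RMSE = 0.39 kcal/mol
```

Benchmarks differ in the details (some report errors per molecule, some pool across the whole set), but this offset-then-compare pattern is the core of how methods get ranked.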
Orb-v2
Orb-v2 is a neural network potential developed by Mark Neumann and coworkers at Orbital Materials. Orbital Materials is a startup developing NNPs and using them “to design and deploy advanced materials and hardware for earth's long-term habitability” (and we’re huge fans of their work!). They recently announced both a strategic partnership with AWS and their plan to deploy carbon removal systems into data centers by the end of this year.
The Orb NNP was released in September 2024 and was immediately followed by the improved v2 models (all of which are available on GitHub). Trained on the Materials Project MPtraj and Jonathon Schmidt’s Alexandria datasets of periodic DFT relaxations, Orb is built to power materials discovery and property prediction workflows.
Orb's architecture predicts forces directly as model outputs rather than deriving them as gradients of a predicted energy, making it faster and more scalable than many other NNPs (see Filippo Bigi's recent preprint about the relative merits of this approach). Since its release, Orb has already been used to study a number of systems that would be intractable or computationally intensive to study with traditional methods.
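The trade-off of direct force prediction can be illustrated with a toy numpy example (this is a conceptual sketch, not Orb's actual architecture): a directly predicted force field need not be conservative, i.e., it need not be the negative gradient of any energy function, and nonzero work around a closed loop is the signature of that.

```python
import numpy as np

# Toy 2-D "force field" predicted directly (not as -grad E):
# F(x, y) = (-y, x). Its curl is 2 everywhere, so no energy
# function E exists with F = -grad E.
def direct_force(p):
    x, y = p
    return np.array([-y, x])

# For a conservative (energy-derived) force, the work done
# around any closed loop is exactly zero; here it is not.
theta = np.linspace(0, 2 * np.pi, 2001)
loop = np.stack([np.cos(theta), np.sin(theta)], axis=1)
work = 0.0
for a, b in zip(loop[:-1], loop[1:]):
    mid = 0.5 * (a + b)  # midpoint-rule line integral
    work += direct_force(mid) @ (b - a)
print(round(work, 3))  # → 6.283, i.e. ≈ 2π, not 0
```

Skipping the backward pass through the energy makes inference cheaper, at the cost of this exact energy-consistency guarantee; whether that matters depends on the application, which is precisely what the preprint cited above examines.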
Siya Zhu and co-workers use Orb-v2 to perform phase diagram calculations, finding great agreement with the PBE functional implemented in VASP.
While studying phonon-related properties, Antoine Loew and co-workers find that Orb-v2 can accurately optimize unit cell sizes relative to PBE reference data.
In their CHIPS-FF NNP benchmarking preprint, Daniel Wines and Kamal Choudhary demonstrate that Orb-v2 (and other NNPs) are able to recreate the radial distribution function of amorphous silicon.
You can now run both Orb-v2 and Orb-d3-v2 on Rowan! To run the dispersion-corrected Orb-d3-v2, set “Orb-v2” as your method and then select “D3” in the “Corrections” dropdown. As the benchmarks above show, we don’t recommend Orb-v2 for “classic” computational organic chemistry problems (barrier heights, geometry optimizations, etc.); for those problems, AIMNet2 is significantly more accurate.
MACE-MP-0
MACE-MP-0 is a neural network potential from Ilyes Batatia, Philipp Benner, Yuan Chiang, Alin M. Elena, Dávid P. Kovács, and Janosh Riebesell as well as dozens of collaborators. The MACE-MP-0 preprint, “A foundation model for atomistic materials chemistry,” is a tour de force that explores dozens of use cases for NNPs like MACE-MP-0.
Figure 2 of the preprint, for example, demonstrates MACE-MP-0’s generally good agreement with PBE on modeling aqueous systems in a variety of phases:
The paper also highlights its applicability to catalyst design (section 2.2), metal-organic framework modeling (section 2.3), hydrogen combustion (section A.8), materials structure searching (section A.15), molten salts (section A.20), and much more—if you’re interested, we’d recommend just playing around with the model! Just like Orb-v2, MACE-MP-0 is expected to be somewhat less accurate than AIMNet2 for pure organic chemistry. If you’re interested in working with Rowan to develop a custom solution using NNPs for your research, please reach out to us.
You can now run MACE-MP-0b2 (Large), with an optional D3BJ correction, on Rowan! The pre-trained MACE-MP-0 models are on GitHub and can also be run through the MACE interface.
GPU Inference On Rowan
For small systems, the Rowan platform is currently running NNPs (AIMNet2, MACE, Orb, and OMat24) on CPU machines to reduce latency. For larger systems, we’re in the process of rolling out GPU inference to all Rowan users (previously only available to subscribed users) so that everyone can experience the blazing speed of ML-accelerated simulation on large systems. Keep an eye out for future updates from our team!