Great work! This looks like a really interesting workflow, and is another nice point on the Pareto frontier.
Do you find that AIMNet2 filtering of conformers has a significant effect on which conformers are chosen, relative to filtering on GFN2-xTB energies? My work using GFN2-xTB conformational energies to screen catalysts has been promising, but I keep wondering about using a general-purpose ML force field to select conformers by energy, since GFN2-xTB energies are often lacking and DFT energies are expensive.
Great question - we haven't rigorously tested AIMNet2 intermediate filtering vs. GFN2-xTB intermediate filtering, but practically speaking the AIMNet2 single-point energies are fast enough that the choice doesn't really make a difference to the overall cost. The slow steps are typically GFN2-xTB optimization and AIMNet2 reoptimization/frequencies, with RDKit steps starting to become slow for very large/flexible molecules too. AIMNet2 benchmarks better against GMTKN55, but there hasn't yet (to my knowledge) been a big benchmark study like Geoff Hutchison's for AIMNet2 specifically. (ANI methods all do about the same as RDKit on the Hutchison set, both in terms of time and accuracy.)
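For anyone who wants to check this on their own molecules, here is a rough sketch of how one might compare the two filters. It is not the workflow's actual code: the function names, the 5 kcal/mol window, the assumption that energies are in Hartree, and the example numbers are all made up for illustration. It keeps the conformers within an energy window of the lowest one under each method and reports how much the two selections overlap.

```python
from typing import List, Sequence

HARTREE_TO_KCAL = 627.509474  # conversion factor, Hartree -> kcal/mol


def filter_by_energy_window(
    energies_hartree: Sequence[float], window_kcal: float = 5.0
) -> List[int]:
    """Return indices of conformers within window_kcal of the lowest energy.

    The 5 kcal/mol default is an illustrative assumption, not a recommendation.
    """
    if not energies_hartree:
        return []
    e_min = min(energies_hartree)
    return [
        i
        for i, e in enumerate(energies_hartree)
        if (e - e_min) * HARTREE_TO_KCAL <= window_kcal
    ]


def selection_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard overlap between two sets of selected conformer indices."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical single-point energies (Hartree) for the same four conformers,
# indexed identically under both methods.
xtb_energies = [-10.000, -9.998, -9.990, -9.999]
aimnet2_energies = [-100.000, -99.995, -99.999, -99.9985]

kept_xtb = filter_by_energy_window(xtb_energies)
kept_aimnet2 = filter_by_energy_window(aimnet2_energies)
print(kept_xtb, kept_aimnet2, selection_overlap(kept_xtb, kept_aimnet2))
```

A low overlap on real data would suggest the choice of filter matters for that system; a high overlap would be consistent with the impression that it mostly doesn't.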
We hope to do conformer searching as a dedicated workflow in the future, so we'll probably revisit all this more intentionally later. For pKa there are enough other errors that it doesn't really seem to matter...