Smarter Analogue Docking, Pocket Detection, and g-xTB Analytical Gradients
more robust MCS detection; conformer sampling with torsional Monte Carlo; better alignment and RBFE results; a new pocket-detection workflow; analytical gradients now available for g-xTB
Today we’re launching a significantly improved version of our analogue-docking workflow, a new pocket-detection workflow, and the new g-xTB method with support for analytical gradients.
Analogue Docking
Analogue docking allows users to quickly dock a series of related ligands against a protein: instead of docking each ligand separately against the pocket, analogue docking uses a user-provided template to generate ligand conformations, yielding well-aligned poses.
While our earlier analogue-docking workflow worked well in many cases, we struggled to robustly generate well-aligned poses for non-trivial modifications or complex scaffolds. Our new analogue-docking release performs significantly better for real-world RBFE datasets, approaching the performance of manual pose alignment without requiring any external chemical intuition. This means that downstream FEP runs are better converged, complete faster, and produce more accurate results, helping us get closer to our goal of fully automated end-to-end RBFE workflows at scale.
What We Changed
Rowan’s analogue-docking workflow begins by identifying the maximum common substructure (MCS) between the template pose and the new ligands. The MCS is then used to constrain the new ligand’s pose, ensuring good alignment between the two structures. In our previous MCS implementation, differences in stereochemical annotations or Kekulization confused the MCS finder, preventing the workflow from finding chemically reasonable poses. We’ve fixed this and added fallback logic to ensure we find a reasonable MCS even in confusing situations.
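To make the idea concrete, here is a minimal sketch of an MCS search with fallback, written with RDKit's rdFMCS module. It is not Rowan's implementation: the fallback ladder (MCS_LEVELS), the find_mcs_with_fallback helper, and the min_fraction cutoff are all hypothetical names and choices made for illustration. Stereochemistry is ignored via matchChiralTag=False, and the matching rules are loosened only if the strict search covers too little of the smaller molecule.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Hypothetical fallback ladder: strictest matching first, loosest last.
MCS_LEVELS = [
    ("strict", dict(atomCompare=rdFMCS.AtomCompare.CompareElements,
                    bondCompare=rdFMCS.BondCompare.CompareOrder)),
    ("any-bond", dict(atomCompare=rdFMCS.AtomCompare.CompareElements,
                      bondCompare=rdFMCS.BondCompare.CompareAny)),
    ("any-atom", dict(atomCompare=rdFMCS.AtomCompare.CompareAny,
                      bondCompare=rdFMCS.BondCompare.CompareAny)),
]


def find_mcs_with_fallback(template_smiles, analogue_smiles, min_fraction=0.5):
    """Return (level_name, smarts, n_atoms) for the first acceptable MCS.

    min_fraction is an illustrative cutoff: the MCS must cover at least this
    fraction of the heavy atoms of the smaller molecule to be accepted.
    """
    template = Chem.MolFromSmiles(template_smiles)
    analogue = Chem.MolFromSmiles(analogue_smiles)
    if template is None or analogue is None:
        raise ValueError("could not parse one of the SMILES strings")

    smaller = min(template.GetNumAtoms(), analogue.GetNumAtoms())
    best = None
    for name, options in MCS_LEVELS:
        result = rdFMCS.FindMCS(
            [template, analogue],
            matchChiralTag=False,      # ignore stereochemical annotations
            ringMatchesRingOnly=True,  # ring atoms may only match ring atoms
            completeRingsOnly=True,    # never match a fragment of a ring
            timeout=10,                # seconds; keeps hard cases bounded
            **options,
        )
        best = (name, result.smartsString, result.numAtoms)
        if result.numAtoms >= min_fraction * smaller:
            break  # good enough; no need to loosen the rules further
    return best


# Enantiomers share their full heavy-atom graph once stereo is ignored.
level, smarts, n_atoms = find_mcs_with_fallback(
    "C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"
)
# A Kekulé and an aromatic SMILES of phenol parse to the same molecule.
k_level, k_smarts, k_atoms = find_mcs_with_fallback("C1=CC=CC=C1O", "c1ccccc1O")
```

Parsing both inputs through the same sanitization step is what removes Kekulé-versus-aromatic differences in this sketch; the looser levels are only there for cases where element or bond-order differences leave the strict MCS too small to constrain a pose.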
We’ve also substantially improved the logic used to generate new pose conformations. The previous implementation relied exclusively on RDKit’s ConstrainedEmbed() function, which often struggled to generate diverse conformer ensembles for heavily constrained molecules. We’ve added a post-embedding torsional Monte Carlo conformer-search step powered by openconf, which dramatically expands the number of poses generated.
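The sketch below illustrates the general idea of a torsional Monte Carlo search, not openconf's API or Rowan's code. It is a toy Metropolis sampler over a list of torsion angles in which the torsions inside the constrained core stay frozen and only the free torsions are perturbed. The energy function (a threefold cosine term), the torsional_monte_carlo helper, and every parameter value are assumptions chosen for illustration; a real workflow would rebuild 3D coordinates and score them with a force field.

```python
import math
import random


def torsion_energy(angles):
    """Toy threefold torsional potential in arbitrary units.

    Each torsion has minima at 60, 180, and 300 degrees.
    """
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def torsional_monte_carlo(start, free_indices, n_steps=2000,
                          max_step=math.radians(60.0), kT=0.6, seed=0):
    """Metropolis sampling over torsion angles (radians).

    Only torsions listed in free_indices are moved; all others stay fixed,
    mimicking a core held in place by the template constraint.
    Returns a list of (energy, angles) for every accepted state.
    """
    rng = random.Random(seed)
    current = list(start)
    e_current = torsion_energy(current)
    accepted = [(e_current, tuple(current))]

    for _ in range(n_steps):
        trial = list(current)
        i = rng.choice(free_indices)  # pick one rotatable bond to perturb
        trial[i] = (trial[i] + rng.uniform(-max_step, max_step)) % (2.0 * math.pi)
        e_trial = torsion_energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            accepted.append((e_current, tuple(current)))
    return accepted


# Three torsions; the last one belongs to the constrained core and is frozen.
samples = torsional_monte_carlo([0.0, 0.0, math.pi], free_indices=[0, 1])
lowest_energy, lowest_angles = min(samples)
```

Because uphill moves are occasionally accepted, the sampler can cross torsional barriers and visit several rotamers of each free bond, which is the property that helps when an embedding step alone returns near-duplicate conformers.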
Benchmarking Known Pose Retrieval
We benchmarked these improvements against the “JACS” RBFE test systems from Wang et al. 2015, each of which comprises a set of nicely aligned ligands bound to a protein target. For each system, we chose a single ligand as a template, input the remaining ligands in SMILES form, and ran analogue docking. The changes led to significantly improved overlap for all ligands compared with our old analogue-docking workflow.
Here’s the result of the old analogue-docking workflow, with messy pose alignment and generally poor overlay:
Here’s the result of the new analogue-docking workflow, with nearly perfect overlay and lower RMSD and docking scores:
We assessed the overall quality of the ensemble by comparing the heavy-atom RMSD of the best-aligned pose for each analogue to the pose reported in the JACS benchmark set. Across the board, the new analogue-docking workflow generates a significantly better match to the reported hand-aligned structures. The old algorithm sometimes found entirely different binding modes (RMSD >2 Å). The new approach nearly eliminates this, with poses almost always within 1 Å RMSD.
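As a rough illustration of this metric, the sketch below computes a heavy-atom RMSD between each generated pose and a reference pose and keeps the smallest value. It assumes that poses are given as lists of heavy-atom (x, y, z) coordinates in the same atom order and in the same protein frame, so no superposition is performed, and it does not correct for symmetry-equivalent atoms. The function names are hypothetical, and the source does not state exactly how the RMSD was computed.

```python
import math


def heavy_atom_rmsd(pose, reference):
    """RMSD in the input units between two equally ordered coordinate lists."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_pose, atom_ref in zip(pose, reference)
        for a, b in zip(atom_pose, atom_ref)
    )
    return math.sqrt(squared / len(pose))


def best_pose_rmsd(poses, reference):
    """Smallest RMSD to the reference over an ensemble of poses."""
    return min(heavy_atom_rmsd(pose, reference) for pose in poses)


# Toy two-atom ligand: one pose shifted by 1 Å along z, one shifted by 0.5 Å.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted_far = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
shifted_near = [(0.0, 0.0, 0.5), (1.5, 0.0, 0.5)]

rmsd_far = heavy_atom_rmsd(shifted_far, reference)
best = best_pose_rmsd([shifted_far, shifted_near], reference)
```

Skipping the superposition step is deliberate here: for docked poses, a rigid shift inside the binding site is a real error and should count toward the RMSD.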
Benchmarking Downstream RBFE
Finally, we used all three sets of poses (the reported poses and the poses from the old and new analogue-docking workflows) as inputs to RBFE calculations and compared the performance of the resulting runs. For the analogue-docked structures, we used the output pose with the best docking score. While the poses generated by the old analogue-docking workflow (“Old”) often produced results that were dramatically worse than those generated with the reported poses (“JACS”), the new analogue-docked poses (“New”) generally approached or even exceeded the accuracy of the reported poses. Additionally, the new poses often resulted in much faster RBFE runtimes than the old structures (e.g. the BACE and Thrombin sets).
Here’s the full table of our results. The highest-performing results in each category within a row are bolded.
(A few details: the “Old” results for the “Thrombin” set include only 10 ligands instead of 11 because one pose was so poorly aligned that the RBFE workflow could not calculate a ΔG for that structure, and all MCL1 results are missing one troublesome ligand that defied both the “Old” and “New” analogue-docking protocols.)
Here are a few of the more striking comparisons:
Analogue docking remains available to all Rowan users.
Pocket Detection
If you’re starting a project with a new protein target and don’t yet have a bound crystal structure, one of the first challenges is figuring out where compounds should bind. Traditionally, this means inspecting the protein by hand: looking at the surface shape, electrostatic potential maps, or hoping that someone has already reported observed binding sites in the literature. That approach can work well, but it often depends on strong prior hypotheses, existing structural data, and a fair amount of manual effort.
Another option is blind docking: letting the docking software search the entire protein surface for binding regions automatically. This can generate pocket hypotheses when little is known about a target, but it’s a blunt instrument: docking software wasn’t designed for pocket detection, so the results are noisy and still require manual curation.
Rowan now includes a pocket-detection workflow that identifies potential binding pockets using a purpose-built geometric algorithm. Our workflow uses Pocketeer, a code from Charlie Harris that builds on ideas from fpocket. Pocketeer analyzes the geometry of a protein structure by placing spheres around the protein surface, determining which regions are sufficiently buried, clustering those regions into candidate pockets, and scoring each pocket using geometric and volumetric features.
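The four steps above (place probes, filter by buriedness, cluster, score) can be sketched in a few lines of plain Python. This is not Pocketeer's or fpocket's algorithm: it swaps in a regular grid of probe points and a simple neighbour-count buriedness measure, and the detect_pockets function, its parameters, and the volume-times-buriedness score are all hypothetical choices for illustration.

```python
import itertools
from collections import deque


def detect_pockets(atoms, spacing=1.0, pad=2.0, clash=3.0, shell=6.0, min_buried=40):
    """Toy grid-based pocket finder; atoms is a list of (x, y, z) in Å."""
    # 1. Place probe points on a regular grid around the protein.
    lo = [min(a[i] for a in atoms) - pad for i in range(3)]
    hi = [max(a[i] for a in atoms) + pad for i in range(3)]
    counts = [int(round((hi[i] - lo[i]) / spacing)) + 1 for i in range(3)]

    # 2. Keep probes that avoid clashes and have many atoms nearby (buried).
    buried = {}
    for idx in itertools.product(*(range(n) for n in counts)):
        p = tuple(lo[i] + idx[i] * spacing for i in range(3))
        d2 = [sum((p[i] - a[i]) ** 2 for i in range(3)) for a in atoms]
        if min(d2) < clash ** 2:
            continue  # probe overlaps an atom
        n_near = sum(1 for d in d2 if d <= shell ** 2)
        if n_near >= min_buried:
            buried[idx] = (p, n_near)

    # 3. Cluster face-adjacent buried probes with a breadth-first search.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    pockets, seen = [], set()
    for start in buried:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cur = queue.popleft()
            members.append(cur)
            for s in steps:
                nxt = (cur[0] + s[0], cur[1] + s[1], cur[2] + s[2])
                if nxt in buried and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

        # 4. Score each cluster (hypothetical score: volume x mean buriedness).
        n = len(members)
        volume = n * spacing ** 3
        mean_buried = sum(buried[m][1] for m in members) / n
        centroid = tuple(sum(buried[m][0][i] for m in members) / n for i in range(3))
        pockets.append({"n_points": n, "volume": volume, "centroid": centroid,
                        "mean_buriedness": mean_buried, "score": volume * mean_buried})
    return sorted(pockets, key=lambda q: q["score"], reverse=True)


# Toy "protein": atoms on the surface of an 8 Å cube, leaving a hollow centre.
toy_atoms = [
    (2.0 * i, 2.0 * j, 2.0 * k)
    for i, j, k in itertools.product(range(5), repeat=3)
    if 0 in (i, j, k) or 4 in (i, j, k)
]
pockets = detect_pockets(toy_atoms)
```

On this hollow-cube toy system the only cluster that survives the buriedness filter is the central cavity. Probes near the outer corners pass the clash check but see too few neighbouring atoms to count as buried, which is the same reasoning that separates real pockets from exposed surface.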
To run a pocket-detection workflow, simply upload a protein structure or add one using its PDB code, perform any preparation steps (removing waters, rebuilding unresolved residues, and so on), and hit “Submit.”
The workflow runs in seconds. Detected pockets can be visualized directly in Rowan and quickly resubmitted into docking or batch docking workflows.
g-xTB with Analytic Gradients
g-xTB is a semiempirical tight-binding method derived from density-functional theory (DFT). Thanks to physics-informed approximations, such as range-separated approximate Fock exchange, g-xTB is orders of magnitude faster than DFT while achieving accuracy similar to range-separated hybrid DFT with a triple-ζ basis set (check out our blog “The ‘Charlotte’s Web’ of Density-Functional Theory (DFT)” for a deeper explanation of range separation). Unlike neural network potentials, g-xTB does not require large amounts of fitting data and thus generalizes well across the periodic table, spin multiplicities, weird geometries, and domains of chemistry.
We already use the preliminary release of g-xTB in our bond-dissociation energy and pKa workflows, but the addition of analytic gradients opens up new possibilities in geometry optimization and in property predictions such as IR spectra. Analytic gradients make it simple to run transition-state optimizations, such as this 95-atom hydrocupration reaction.
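To illustrate why analytic gradients matter, the sketch below compares an analytic gradient with a central finite-difference gradient. It uses a toy Lennard-Jones potential as a stand-in, since g-xTB itself is not reproduced here, and all function names are hypothetical. The point is cost: a central-difference gradient needs two energy evaluations per Cartesian coordinate, so a 95-atom system would need 2 × 3 × 95 = 570 energy evaluations per optimization step, whereas an analytic gradient comes from a single pass.

```python
import math


def lj_energy(coords, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a set of points (reduced units)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy


def lj_gradient_analytic(coords, eps=1.0, sigma=1.0):
    """Gradient dE/dx from the closed-form derivative of the pair potential."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            # dE/dr = 4*eps*(-12*sigma^12/r^13 + 6*sigma^6/r^7)
            de_dr = 4.0 * eps * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                unit = (coords[i][k] - coords[j][k]) / r  # dr/dx_i
                grad[i][k] += de_dr * unit
                grad[j][k] -= de_dr * unit
    return grad


def lj_gradient_numeric(coords, h=1e-5):
    """Central finite differences; also returns the number of energy calls."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    n_evals = 0
    for i in range(len(coords)):
        for k in range(3):
            plus = [list(c) for c in coords]
            minus = [list(c) for c in coords]
            plus[i][k] += h
            minus[i][k] -= h
            grad[i][k] = (lj_energy(plus) - lj_energy(minus)) / (2.0 * h)
            n_evals += 2
    return grad, n_evals


coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.3, 1.0, 0.2)]
analytic = lj_gradient_analytic(coords)
numeric, n_evals = lj_gradient_numeric(coords)
max_diff = max(abs(analytic[i][k] - numeric[i][k])
               for i in range(len(coords)) for k in range(3))
```

The two gradients agree to within the finite-difference error, but the numerical version scales linearly with the number of coordinates and is sensitive to the step size h, which is why analytic gradients are the practical route to geometry and transition-state optimizations on larger systems.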
There will be slight deviations in the energies between the old and new versions; the updated version is slightly more accurate. Direct comparisons between results using different versions should be avoided, but g-xTB is fast enough that any calculations can easily be re-run.
















