Starling: Macroscopic pKa, logD, and Blood–Brain-Barrier Permeability

microscopic vs. macroscopic pKa; Uni-pKa and Starling; microstate ensembles; logD and Kp,uu predictions


Apr 25, 2025

Starling, from *Svenska Fåglar Efter Naturen Och På Sten Ritade* (1929).

Today, we’re excited to launch a macroscopic-pKa-prediction workflow on Rowan for our subscribing users! This workflow uses physics-informed machine learning to quickly predict pKa values, logD, and blood–brain-barrier penetrance for drug-like small molecules. In parallel, we’re releasing a preprint describing our methodology (link, ChemRxiv), which we’ll briefly describe in this newsletter.

Predicting pK_a is something we’ve been thinking about for a while here at Rowan. Here’s what we wrote 14 months ago when we launched our AIMNet2-based pK_a-prediction workflow, which was also Rowan’s very first workflow:

Understanding a molecule’s pK_a is incredibly important: pK_a values dictate whether a molecule will be ionized or neutral at a given pH, which can be used to predict solubility, membrane permeability, blood–brain-barrier penetration, hERG toxicity, phospholipidosis risk, and much more. Accurate pK_a predictions help medicinal chemists design compounds with the desired physicochemical and pharmacological parameters, making pK_a calculation a problem of “immense interest” in computational chemistry.

We’re happy with our previous pK_a-prediction method, which is substantially different from and (we feel) complementary to the vast majority of DFT- or ML-based methods out there. It’s proven robust enough to be useful in a variety of contexts, like this excellent skeletal-editing work from the Levin group that we recently highlighted on X.

But this method isn’t right for every use case. It’s a microscopic pKa-prediction method, which makes comparing to experimental data difficult because experiments typically measure macroscopic pKa values that reflect the full ensemble of microstates, and it’s also too slow to be convenient for large drug-like molecules. Recent work from Jonathan Zheng and co-workers also pointed out the danger of relying on microstate-only pK_a-prediction methods, and prompted us to think about how we could provide our users with an alternative.

In their paper, Zheng and co-workers highlight the Uni-pKa model from DP Technology as a potential solution to the problems they describe (emphasis added):

Uni-pKa, published in 2024… accounts for tautomerism, capturing the microscopic pKa of both the uncharged and zwitterionic tautomers. To our knowledge, this is the only recently released ML model that correctly distinguishes between those microstates.

Uni-pKa works by enumerating all relevant microstates, predicting each microstate’s aqueous free energy, and generating pK_a values from the differences between these free energies. The architecture and dataset are open-source, but the weights aren’t freely available. So we retrained our own lightweight Uni-pKa model, which we’re calling “Starling” to differentiate it from the original.
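To make the last step concrete, here is a minimal sketch of going from per-microstate free energies to macroscopic pK_a values. It assumes each free energy is an aqueous value in kcal/mol referenced to pH 0, and that microstates are grouped by their number of dissociable protons. The function name, the data layout, and the glycine numbers (chosen only to land near the textbook pK_a values of roughly 2.34 and 9.60) are illustrative assumptions, not Starling outputs or the Uni-pKa API.

```python
import math
from collections import defaultdict

R_KCAL = 1.987204e-3         # gas constant, kcal/(mol*K)
TEMPERATURE = 298.15         # K (assumed)
RT = R_KCAL * TEMPERATURE
RT_LN10 = RT * math.log(10)  # free energy of one pK unit, about 1.364 kcal/mol


def macroscopic_pkas(microstates):
    """Return {(n_acid, n_base): pKa} from (n_protons, G_kcal) pairs.

    G values are assumed to be aqueous free energies referenced to pH 0.
    """
    # Boltzmann-sum all microstates that share a protonation level.
    partition = defaultdict(float)
    for n_protons, g in microstates:
        partition[n_protons] += math.exp(-g / RT)
    # Each macroscopic pKa compares two adjacent protonation levels.
    return {
        (n + 1, n): math.log10(partition[n + 1] / partition[n])
        for n in sorted(partition)
        if n + 1 in partition
    }


# Hypothetical glycine free energies (kcal/mol), illustrative only.
glycine = [
    (2, -11.94 * RT_LN10),  # cation
    (1, -9.60 * RT_LN10),   # zwitterion
    (1, -4.60 * RT_LN10),   # neutral tautomer
    (0, 0.0),               # anion
]
pkas = macroscopic_pkas(glycine)
print({pair: round(value, 2) for pair, value in pkas.items()})
```

Because the sum runs over every microstate at a given protonation level, a minor tautomer shifts the macroscopic value only slightly when it is much less stable than the dominant one.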

Starling excels at pK_a prediction, just like the original Uni-pKa model (see our preprint for precise values). But we’re able to do a lot more with the output free-energy values than just predict pK_a. We can predict isoelectric points and generate microstate populations as a function of pH, as shown here for glycine:

Microstate populations of glycine by pH (Figure 4 in our preprint)
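A curve like this can be sketched from the same kind of free-energy data. The example below assumes free energies referenced to pH 0, so that each bound proton adds a penalty of one pK unit per pH unit, and it finds the isoelectric point by bisection on the mean charge. All names and the glycine numbers are hypothetical placeholders rather than values from the preprint.

```python
import math

RT_LN10 = 1.987204e-3 * 298.15 * math.log(10)  # kcal/mol per pK unit (298.15 K assumed)

# (label, n_protons, net_charge, G in kcal/mol at pH 0); illustrative values only.
GLYCINE = [
    ("cation", 2, +1, -11.94 * RT_LN10),
    ("zwitterion", 1, 0, -9.60 * RT_LN10),
    ("neutral", 1, 0, -4.60 * RT_LN10),
    ("anion", 0, -1, 0.0),
]


def populations(microstates, ph):
    """Fractional population of each microstate at the given pH."""
    # Work with log10 weights and subtract the maximum to avoid overflow.
    log_w = [-g / RT_LN10 - n * ph for _, n, _, g in microstates]
    top = max(log_w)
    weights = [10 ** (lw - top) for lw in log_w]
    total = sum(weights)
    return {m[0]: w / total for m, w in zip(microstates, weights)}


def mean_charge(microstates, ph):
    """Population-weighted net charge at the given pH."""
    pops = populations(microstates, ph)
    return sum(q * pops[label] for label, _, q, _ in microstates)


def isoelectric_point(microstates, lo=0.0, hi=14.0, tol=1e-6):
    """pH at which the mean charge is zero, found by bisection."""
    # Mean charge decreases monotonically with pH, so bisection is enough.
    if mean_charge(microstates, lo) < 0 or mean_charge(microstates, hi) > 0:
        raise ValueError("no charge-neutral pH in the search window")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_charge(microstates, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for ph in (2.0, 6.0, 10.0):
    print(ph, {k: round(v, 3) for k, v in populations(GLYCINE, ph).items()})
print("pI:", round(isoelectric_point(GLYCINE), 2))
```

With these placeholder energies the zwitterion dominates near neutral pH and the neutral tautomer stays a trace species throughout, which is the qualitative behavior expected for glycine in water.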

We can also generate pH-dependent logD predictions by weighting per-microstate logP predictions by our Starling microstate populations; in some cases, these match experimental data nicely. Here’s what this looks like for pentachlorophenol, a case where we have experimental data to compare to: at low pH, the phenol is protonated and lipophilic, but as the pH increases the anion predominates and prefers the aqueous phase.

LogD/pH profile of pentachlorophenol (Figure 10 in our preprint)
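The population-weighting idea can be sketched for a simple two-state acid. The distribution coefficient is the population-weighted sum of each microstate's partition coefficient, so logD is the log of that sum. The pK_a and logP values below are rough placeholders for a lipophilic phenol, not numbers from the preprint, and the helper names are hypothetical.

```python
import math


def log_d(populations, log_p):
    """logD from aqueous microstate populations and per-microstate logP."""
    # D = sum_i p_i * P_i, where P_i = 10**logP_i for microstate i.
    d = sum(populations[state] * 10 ** log_p[state] for state in populations)
    return math.log10(d)


def neutral_fraction_acid(ph, pka):
    """Neutral fraction of a monoprotic acid (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Illustrative placeholder values for a lipophilic phenol.
PKA = 4.7
LOG_P = {"neutral": 5.0, "anion": 1.5}


def log_d_at(ph):
    """logD of the two-state model at the given pH."""
    f_neutral = neutral_fraction_acid(ph, PKA)
    return log_d({"neutral": f_neutral, "anion": 1.0 - f_neutral}, LOG_P)


for ph in (1.0, 4.7, 7.4, 10.0):
    print(ph, round(log_d_at(ph), 2))
```

In this toy model logD plateaus near the neutral logP at low pH and near the anion logP at high pH; with more microstates, the same weighted sum applies unchanged.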

But we’re most excited about using Starling to predict blood–brain-barrier permeability. Previous work from Morgan Lawrenz and co-workers at Schrödinger showed that DFT-computed solvation energies were surprisingly predictive of K_p,uu, the unbound brain-to-plasma partition coefficient (and a “game-changing” metric for CNS therapeutics), but actually running all the DFT calculations can take days of high-performance computing time.

We’ve long hoped to create a fast version of this K_p,uu-prediction workflow using neural network potentials, but we’ve been stymied by the need for a fast and accurate macroscopic pK_a predictor. Lawrenz and co-workers use a mix of experimental and DFT-computed pK_a values to estimate the free-energy cost of neutralization at pH 7.4; now we’re able to use Starling to get the same correction, allowing us to build an accurate workflow that runs in minutes, not days. Here’s an ROC/AUC analysis for the task of predicting whether a compound will have K_p,uu above or below 0.3 (the same cutoff Lawrenz and co-workers use)—we get useful accuracy without any compound-specific fine-tuning.
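One common way to express a neutralization cost, shown here as a sketch rather than as the exact formulation used by Lawrenz and co-workers or in our workflow, is the free energy needed to select the neutral form from the equilibrium ensemble: -RT ln(f_neutral). The example assumes 298.15 K and a hypothetical monoprotic amine; in a full calculation the neutral fraction would instead come from summing the neutral microstate populations at pH 7.4.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def neutral_fraction_base(ph, pka):
    """Neutral (unprotonated) fraction of a monoprotic base."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def neutralization_penalty(neutral_fraction, temperature=298.15):
    """Free-energy cost (kcal/mol) of restricting to the neutral form."""
    if not 0.0 < neutral_fraction <= 1.0:
        raise ValueError("neutral_fraction must be in (0, 1]")
    return -R_KCAL * temperature * math.log(neutral_fraction)


# Hypothetical amine with pKa 9.4, evaluated at physiological pH 7.4.
fraction = neutral_fraction_base(7.4, 9.4)
penalty = neutralization_penalty(fraction)
print(round(fraction, 4), round(penalty, 2))
```

A compound that is already fully neutral at pH 7.4 pays no penalty, while each additional pK unit of ionization adds roughly 1.36 kcal/mol at this temperature.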

We’re releasing our “macroscopic pKa” workflow today for all Rowan subscribers. This workflow predicts all of the properties described above—macroscopic pK_a values, microstate populations, isoelectric points, logD values, and K_p,uu values—through a single interface that makes complex chemical phenomena extremely intuitive. Here’s an overview of adenine’s microstates, for instance:

If you’re interested in bringing these powerful capabilities to your drug-discovery organization, please reach out (contact@rowansci.com) and we’ll be happy to talk!

Rowan Newsletter
