Abstract
Accurate binding affinity prediction (BAP) is crucial to structure-based drug design. We present PATH+, a novel, generalizable machine learning algorithm for BAP that exploits recent advances in computational topology. Compared to current binding affinity prediction algorithms, PATH+ shows similar or better accuracy and is more generalizable across orthogonal datasets. PATH+ is not only one of the most accurate algorithms for BAP, it is also the first algorithm that is inherently interpretable. Interpretability is a key factor in trusting an algorithm; together with generalizability, it allows PATH+ to be trusted in critical applications, such as inhibitor design. We visualized the features captured by PATH+ for two clinically relevant protein-ligand complexes and found that PATH+ captures binding-relevant structural mutations that are corroborated by biochemical data. Our work also sheds light on the features captured by current computational topology BAP algorithms, which contributed to their high performance but have been poorly understood. PATH+ also offers an improvement of 𝒪((m + n)³) in computational complexity and is empirically over 10 times faster than the dominant (uninterpretable) computational topology algorithm for BAP. Based on insights from PATH+, we built PATH−, a scoring function for differentiating between binders and non-binders that has outstanding accuracy compared with 11 current algorithms for BAP. In summary, we report progress in a novel combination of interpretability, speed, and accuracy that should further empower topological screening of large virtual inhibitor libraries to protein targets, and allow binding affinity predictions to be understood and trusted. The source code for PATH+ and PATH− is released open-source as part of the OSPREY protein design software package.
Author summary
Predicting how strongly a small molecule (ligand) binds to a protein is a fundamental challenge in drug discovery. Recently, deep learning methods have shown promise in this task. However, we find that many of these models suffer from overfitting, meaning they perform well on their training data but fail to generalize to new datasets. This is concerning because practical drug discovery requires models that work well beyond their training set. Additionally, most previous algorithms (both deep learning and traditional methods) overestimate binding affinity and predict that most protein-ligand pairs interact favorably, when in reality the vast majority of molecules do not bind to their targets at all. To address these challenges, we introduce PATH+, a new algorithm that encodes structural binding features using persistent homology, a mathematical tool from algebraic topology. Our persistence fingerprint efficiently captures geometric properties such as molecular cavities and interaction patterns at multiple scales. PATH+ significantly outperforms previous affinity prediction methods on unseen data while being interpretable, meaning predictions can be traced back to specific atomic interactions. Additionally, we develop PATH−, a scoring function that improves discrimination between true binders and non-binders. Finally, we provide a provably accurate algorithm that improves the efficiency of persistent homology computations by a cubic factor, making PATH ten times faster than previous topology-based methods. Our work advances both computational topology and in silico drug discovery, improving accuracy, efficiency, and interpretability in binding affinity prediction.
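For readers unfamiliar with persistent homology, the following is a minimal illustrative sketch in Python of its simplest case: 0-dimensional persistence (connected components) of a point cloud under a distance filtration, computed with a union-find structure. It is not the PATH+ algorithm or its persistence fingerprint; the function name `h0_persistence` and the toy coordinates are assumptions introduced here purely for illustration.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud.

    Illustrative only: every point is born at scale 0, and a component dies
    at the length of the edge that first merges it into another component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # merge the two components
            pairs.append((0.0, length))  # one component dies at this scale
    pairs.append((0.0, math.inf))        # the last component never dies
    return pairs


# Hypothetical toy coordinates: two nearby points and one distant point.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_persistence(toy_points))
```

Long-lived pairs correspond to well-separated groups of points, which is the sense in which persistence summarizes geometry at multiple scales. Higher-dimensional features such as loops and cavities require a full simplicial-complex computation that this sketch does not attempt.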
Suggested Citation
Yuxi Long & Bruce R Donald, 2025.
"Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology,"
PLOS Computational Biology, Public Library of Science, vol. 21(6), pages 1-26, June.
Handle:
RePEc:plo:pcbi00:1013216
DOI: 10.1371/journal.pcbi.1013216