Authors
- Bin Pan
- Xiaoyu Hou
- Mingxin Zhang
- Jingxian Yu
- Conghui Zhang
- Yunhui Zhang
- Xiaolong Su
- Shuangcai Li
Abstract
Aqueous solubility, an essential physical property of compounds, has significant applications across various fields. However, determining the solubility of compounds experimentally often requires substantial human and material resources. To address this issue, this study introduces the StackBoost model for predicting the solubility of organic compounds and systematically compares it with five well-known ensemble learning algorithms: Adaptive Boosting (AdaBoost), Gradient Boosted Regression Trees (GBRT), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). The prediction results indicate that the StackBoost model excels in predicting aqueous solubility, achieving a coefficient of determination (R²) of 0.90, a root mean square error (RMSE) of 0.29, and a mean absolute error (MAE) of 0.22, significantly outperforming the five comparison models. The study also conducts high-throughput screening on large-scale datasets and identifies compounds with high potential for water solubility. Additionally, the model's generalization ability is verified through transfer learning. Although the performance of the StackBoost model decreases when applied to different datasets, it still shows considerable transferability, making it a more generalizable prediction model for aqueous solubility.
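The three metrics quoted in the abstract have standard definitions, and the following is a minimal sketch of how they are commonly computed. It is not taken from the paper: the function name `regression_metrics` and the sample values are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    rmse = math.sqrt(ss_res / n)                  # root mean square error
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error
    return r2, rmse, mae


# Hypothetical observed and predicted values (illustrative only, not paper data).
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

A higher R² and lower RMSE and MAE indicate a closer fit. RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.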
Suggested Citation
Bin Pan & Xiaoyu Hou & Mingxin Zhang & Jingxian Yu & Conghui Zhang & Yunhui Zhang & Xiaolong Su & Shuangcai Li, 2025.
"A water solubility prediction algorithm based on the StackBoost model,"
PLOS ONE, Public Library of Science, vol. 20(8), pages 1-18, August.
Handle:
RePEc:plo:pone00:0330598
DOI: 10.1371/journal.pone.0330598