Author
Listed:
- Alexander Kroll
- Sahasra Ranjan
- Martin J Lercher
Abstract
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Author summary
Understanding how proteins interact with small molecules, such as drugs, is critical to advancing medical, biological, and biotechnological research. Our work introduces ProSmith, a machine learning framework that improves the prediction of protein-small molecule interactions.
Protein-small molecule interactions can be predicted by using numerical representations of proteins and small molecules as input to machine learning prediction models. Previous methods typically generated separate numerical representations for the proteins and small molecules without considering their interactions. ProSmith, however, combines both protein sequence and small molecule structural information in the input of a single multimodal Transformer Network to generate a joint numerical representation. Unlike previous methods, this allows for a comprehensive exchange of information between protein and small molecule, capturing the complex relationships and interactions between these two types of molecules. ProSmith successfully predicts several biological interactions, including kinase inhibitions, potential enzyme-substrate pairs, and enzyme kinetic parameters KM. We provide Python code that can be easily adapted to improve predictions for any protein-small molecule interaction.
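The two ideas described above, feeding protein and small molecule into a single input and then combining predictions from several models, can be illustrated with a minimal Python sketch. This is not the ProSmith code: the character-level tokenization, the special tokens `[CLS]` and `[SEP]`, the function names `build_joint_input` and `combine_predictions`, and the weighted-average combination rule are all assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical special tokens; the tokens used by ProSmith itself are not
# specified in this record.
CLS, SEP = "[CLS]", "[SEP]"


def build_joint_input(protein_seq: str, smiles: str) -> Tuple[List[str], List[int]]:
    """Concatenate protein and small-molecule tokens into one input sequence.

    Character-level tokenization is an illustrative simplification. The point
    is only that both molecules share one sequence, so a Transformer's
    attention can exchange information between them.
    """
    protein_tokens = list(protein_seq)
    smiles_tokens = list(smiles)
    tokens = [CLS] + protein_tokens + [SEP] + smiles_tokens + [SEP]
    # Segment ids: 0 for the protein part (including [CLS] and the first
    # [SEP]), 1 for the small-molecule part (including the final [SEP]).
    segment_ids = [0] * (len(protein_tokens) + 2) + [1] * (len(smiles_tokens) + 1)
    return tokens, segment_ids


def combine_predictions(preds: List[float], weights: List[float]) -> float:
    """Weighted average of per-model predictions (an assumed combination rule)."""
    if len(preds) != len(weights):
        raise ValueError("preds and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(preds, weights)) / total


if __name__ == "__main__":
    tokens, segments = build_joint_input("MKT", "CCO")
    print(tokens)
    print(segments)
    print(combine_predictions([1.0, 3.0], [1.0, 1.0]))
```

In a real pipeline the joint token sequence would be embedded and passed through the Transformer, and the values given to `combine_predictions` would come from trained models rather than constants.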
Suggested Citation
Alexander Kroll & Sahasra Ranjan & Martin J Lercher, 2024.
"A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships,"
PLOS Computational Biology, Public Library of Science, vol. 20(5), pages 1-23, May.
Handle:
RePEc:plo:pcbi00:1012100
DOI: 10.1371/journal.pcbi.1012100