Author
Listed:
- Lijuan Wang
- Yuze Wang
- Chen Qiu
- Liwei Xiao
- Xianliang Liu
- Junjie Chen
Abstract
Protein sequence design for tailored functional properties is a fundamental task in protein engineering, with critical applications in drug discovery and therapeutic development. Efficiently navigating the combinatorially vast protein sequence space to identify functional variants remains a formidable challenge. Conventional approaches, which predominantly rely on template-based local search or single-residue mutagenesis, are susceptible to local optima and risk destabilizing the native structure. In this study, we introduce ProtHMSO, a heuristic multi-site optimization framework that leverages masked protein language models (ProtLMs) for context-aware sequence exploration. ProtHMSO mimics natural evolutionary mechanisms by using ProtLM-derived substitution probabilities to guide heuristic searches for synergistic mutations, thereby constraining the combinatorial search space through evolutionary and biophysical priors. ProtHMSO is further applied to replace the exploration strategies in genetic algorithms (GAs) and Monte Carlo tree search (MCTS) to improve their convergence efficiency. Benchmark experiments demonstrate that protein sequences generated by ProtHMSO exhibit superior functional performance and closer alignment with the natural sequence distribution than those generated by state-of-the-art methods. These results indicate that ProtHMSO is compatible with existing optimizers and has strong potential to accelerate functional protein discovery, offering a robust framework for efficient, context-aware exploration of protein sequence space.
Author summary: Discovering new functional proteins is difficult because the sequence space is vast, and traditional evolutionary algorithms rely on blind random mutagenesis, which makes them inefficient and prone to structural destabilization. To address both problems, we propose a heuristic multi-site optimization framework, ProtHMSO. Its core concept is to leverage the contextual prediction capabilities of masked protein language models (such as ESM-2) to guide sequence mutagenesis. By predicting amino acid substitutions at specific sites that are consistent with evolutionary laws and biophysical priors, ProtHMSO narrows the exploration scope from the vast combinatorial space to a small number of high-potential candidate sequences, making the optimization of protein sequences more efficient. Furthermore, ProtHMSO is not only a standalone algorithm but also a plug-and-play enhancement module. When integrated into a genetic algorithm (GA), it replaces the random mutation operator with its model-guided mutation; when integrated into a Monte Carlo tree search (MCTS), it guides the tree expansion process. This allows these classic optimization algorithms to avoid blind exploration and to achieve faster convergence and better results, demonstrating the wide applicability of the framework and its potential to improve the performance of tools across computational protein design.
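The idea of narrowing a combinatorial space to a few high-potential candidates can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_masked_probs` is a hypothetical stand-in for a masked ProtLM such as ESM-2 (a real setup would mask the position and read the model's softmax output), `propose_multi_site` and `top_k` are illustrative names, and each site is scored independently against the original sequence, which is a simplifying assumption.

```python
import itertools
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_masked_probs(seq, pos):
    """Hypothetical stand-in for a masked ProtLM.

    Returns a probability for each of the 20 amino acids at `pos`.
    The toy rule favors residues whose alphabet index is close to the
    indices of the two neighboring residues; it has no biological meaning.
    """
    left = seq[pos - 1] if pos > 0 else seq[pos]
    right = seq[pos + 1] if pos < len(seq) - 1 else seq[pos]
    logits = []
    for aa in AMINO_ACIDS:
        idx = AMINO_ACIDS.index(aa)
        dist = abs(idx - AMINO_ACIDS.index(left)) + abs(idx - AMINO_ACIDS.index(right))
        logits.append(-0.3 * dist)
    norm = sum(math.exp(x) for x in logits)
    return {aa: math.exp(x) / norm for aa, x in zip(AMINO_ACIDS, logits)}


def propose_multi_site(seq, sites, probs_fn, top_k=2):
    """Combine the top_k most probable substitutions at each site.

    Returns (candidate_sequence, joint_log_prob) pairs sorted from most
    to least probable. `sites` must contain distinct positions.
    """
    per_site = []
    for pos in sites:
        probs = probs_fn(seq, pos)
        # Rank substitutions, excluding the residue already present.
        ranked = sorted(
            (aa for aa in AMINO_ACIDS if aa != seq[pos]),
            key=lambda aa: probs[aa],
            reverse=True,
        )
        per_site.append([(pos, aa, math.log(probs[aa])) for aa in ranked[:top_k]])

    candidates = []
    # Only top_k ** len(sites) combinations are built, not 19 ** len(sites).
    for combo in itertools.product(*per_site):
        residues = list(seq)
        score = 0.0
        for pos, aa, log_p in combo:
            residues[pos] = aa
            score += log_p
        candidates.append(("".join(residues), score))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates


if __name__ == "__main__":
    wild_type = "MKTAYIAKQR"  # arbitrary example sequence
    for candidate, score in propose_multi_site(wild_type, [2, 5, 7], toy_masked_probs)[:3]:
        print(candidate, round(score, 3))
```

In a GA, a function like `propose_multi_site` could stand in for the random mutation operator, with the returned candidates passed to whatever fitness function the optimizer already uses; that wiring is likewise an assumption rather than a description of the published method.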
Suggested Citation
Lijuan Wang & Yuze Wang & Chen Qiu & Liwei Xiao & Xianliang Liu & Junjie Chen, 2026.
"Heuristic multi-site optimization for protein sequence design using Masked Protein Language Models,"
PLOS Computational Biology, Public Library of Science, vol. 22(6), pages 1-22, June.
Handle:
RePEc:plo:pcbi00:1014365
DOI: 10.1371/journal.pcbi.1014365