Authors
- Haoyang Zhang
- Muhammad Kabir
- Mauno Vihinen
Abstract
Protein deletions are frequent among both disease-causing and tolerated variants. Several mechanisms at the DNA, RNA and protein levels can lead to deletions. Many deletions are misclassified in the literature and databases, especially when the mRNA is degraded by the cellular quality-control mechanism. We developed a novel predictor for sequence retaining protein deletions, i.e., variants that do not alter the sequence downstream of the deletion site. We collected an extensive dataset of verified protein deletions, each described by a comprehensive set of context, content, position, and gene-based features. We evaluated both statistical and deep learning algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short, 1–10 amino acid, sequence-retaining deletions. Variants are typically classified into two categories: either pathogenic or benign. However, there is always a third class of variants: variants of uncertain significance (VUSs), which have been ignored by all previous methods. PON-Del is the first deletion interpretation method that includes VUSs. It provides two outputs, binary and three-state prediction with VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
Author summary
Protein deletions are frequent among both disease-causing and tolerated variants, and are caused by several mechanisms at the DNA, RNA and protein levels. The reliable prediction of the effects of deletions is challenging. We developed a predictor for sequence retaining protein deletions, variants that do not alter the sequence beyond the deletion site. We collected an extensive dataset of verified protein deletions, and a comprehensive set of features to describe them. We evaluated seven algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short, 1–10 amino acid, sequence-retaining deletions.
Variants have typically been classified as pathogenic or benign. This practice misses the third category: variants of uncertain significance (VUSs). PON-Del is the first deletion interpretation method that includes VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
Suggested Citation
Haoyang Zhang & Muhammad Kabir & Mauno Vihinen, 2026.
"PON-Del predictor for sequence retaining protein deletions,"
PLOS Computational Biology, Public Library of Science, vol. 22(2), pages 1-18, February.
Handle:
RePEc:plo:pcbi00:1014020
DOI: 10.1371/journal.pcbi.1014020