Authors
- Yan Fang
(Fujian Provincial Key Laboratory of Data-Intensive Computing, Fujian University Laboratory of Intelligent Computing and Information Processing, School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou 362000, China)
- Yonghua Lin
(Fujian Key Laboratory of Financial Information Processing, Key Laboratory of Applied Mathematics in Fujian Province University, Putian University, Putian 351100, China)
- Chuanbo Huang
(Fujian Province University Key Laboratory of Computational Science, School of Mathematical Sciences, Huaqiao University, Quanzhou 362000, China)
- Zhaowen Li
(Fujian Key Laboratory of Financial Information Processing, Key Laboratory of Applied Mathematics in Fujian Province University, Putian University, Putian 351100, China)
Abstract
A critical step for gene selection algorithms using rough set theory is the establishment of a gene evaluation function to assess the classification ability of candidate gene subsets. The concept of dependency in a classic neighborhood rough set model plays the role of this evaluation function. This criterion only notes the information provided by the lower approximation and omits the upper approximation, which may result in the loss of some important information. This paper proposes gene selection algorithms within a single-cell gene decision space by employing self-information, taking into account both lower and upper approximations. Initially, the distance between gene expression values within each subspace is defined to establish the tolerance relation on the cell set. Subsequently, self-information is introduced through the lens of tolerance classes. The relationship between these measures and their respective properties is then examined in detail. For gene expression data, the proposed self-information metric demonstrates superiority over other measures by accounting for both lower and upper approximations, thereby facilitating the selection of optimal gene subsets. Finally, gene selection algorithms within a single-cell gene decision space are developed based on the proposed self-information metric, and experiments conducted on 10 publicly available single-cell datasets indicate that the classification performance of the proposed algorithms can be enhanced through the selection of genes pertinent to classification. The results demonstrate that Fi-SI achieves an average classification accuracy of 93.7% (KNN) while selecting 48.3% fewer genes than Fisher’s score.
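As a rough illustration of the rough-set quantities the abstract refers to, the following minimal sketch builds tolerance classes over a set of cells and derives the lower and upper approximations of each decision class, together with the classic dependency degree (which uses only the lower approximation). It rests on several assumptions that do not come from the paper: the distance is taken to be the maximum absolute difference over the selected genes, `delta` is a hypothetical tolerance threshold, and all function names are illustrative. The paper's self-information measure is not reproduced here, because the abstract does not state its formula.

```python
import numpy as np

def tolerance_classes(X, delta):
    """For each cell (row of X), return the set of cells within distance delta.

    Assumption: distance is the max absolute difference across genes (columns).
    """
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    return [{int(j) for j in np.flatnonzero(row <= delta)} for row in dist]

def approximations(classes, labels):
    """Lower and upper approximation of every decision class."""
    lower, upper = {}, {}
    for d in sorted(set(labels)):
        target = {i for i, y in enumerate(labels) if y == d}
        # Lower: cells whose whole tolerance class lies inside the decision class.
        lower[d] = {i for i, c in enumerate(classes) if c <= target}
        # Upper: cells whose tolerance class overlaps the decision class.
        upper[d] = {i for i, c in enumerate(classes) if c & target}
    return lower, upper

def dependency(lower, n_cells):
    """Classic dependency degree: fraction of cells in some lower approximation."""
    return sum(len(s) for s in lower.values()) / n_cells

# Toy data: five cells, one gene, two cell types (values are made up).
X = np.array([[0.10], [0.20], [0.45], [0.55], [0.90]])
labels = [0, 0, 0, 1, 1]

classes = tolerance_classes(X, delta=0.15)
lower, upper = approximations(classes, labels)
gamma = dependency(lower, len(labels))
```

In this toy example, cells 2 and 3 are close in expression but carry different labels, so they fall outside both lower approximations yet inside both upper approximations. The dependency degree ignores that boundary region, which is the information loss the abstract describes.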
Suggested Citation
Yan Fang & Yonghua Lin & Chuanbo Huang & Zhaowen Li, 2025.
"Gene Selection Algorithms in a Single-Cell Gene Decision Space Based on Self-Information,"
Mathematics, MDPI, vol. 13(11), pages 1-24, May.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:11:p:1829-:d:1668368