IDEAS (RePEc) record: https://ideas.repec.org/a/gam/jmathe/v11y2023i20p4315-d1261327.html

Identifying Genetic Signatures from Single-Cell RNA Sequencing Data by Matrix Imputation and Reduced Set Gene Clustering

Authors listed:
  • Soumita Seth

    (Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata 700150, West Bengal, India
    Department of Computer Science and Engineering, Aliah University, Kolkata 700160, West Bengal, India)

  • Saurav Mallik

    (Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA
    Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA)

  • Atikul Islam

    (Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, West Bengal, India)

  • Tapas Bhadra

    (Department of Computer Science and Engineering, Aliah University, Kolkata 700160, West Bengal, India)

  • Arup Roy

    (Department of Computer Science and Engineering, Budge Budge Institute of Technology, Kolkata 700137, West Bengal, India)

  • Pawan Kumar Singh

    (Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Kolkata 700106, West Bengal, India)

  • Aimin Li

    (Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China)

  • Zhongming Zhao

    (Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
    Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA)

Abstract

Key areas of interest in the analysis of single-cell RNA sequencing (scRNA-seq) data currently include identifying known and novel cell types, representing cells, predicting cell fates, classifying tumor types, and studying cellular heterogeneity. Because of the nature of these high-dimensional data, identifying clusters in them presents several difficulties. In this paper, we introduce a new framework that combines matrix imputation, minimum redundancy maximum relevance (MRMR) feature selection, and shrinkage clustering to discover gene signatures from scRNA-seq data. First, we pre-filtered the data to identify "drop-out" values and imputed only those identified values. Next, we applied MRMR feature selection to the imputed data and retained the top 100 features, ranked by their MRMR optimization scores, for downstream analysis. We then applied shrinkage clustering to the selected feature matrix to identify cell clusters using a global optimization approach. Finally, we used the limma-voom R tool, with voom normalization and an empirical Bayes test, to detect differentially expressed features at a false discovery rate (FDR) < 0.001. We also performed KEGG pathway and Gene Ontology enrichment analysis of the identified biomarkers using DAVID 6.8, carried out miRNA target detection for the top gene markers, and analyzed the miRNA-target gene interaction network using the Cytoscape online tool. We then compared our 100 detected markers with the top 100 cluster-specific markers, ranked by FDR, that we reported in our latest previously published article, and found three markers in common (Cyp2b10, Mt1, and Alpi) along with 97 novel markers.
Gene Set Enrichment Analysis (GSEA) of both marker sets also yielded similar outcomes. In addition, a comparative study with another published method showed that our model detects more significant markers than that method. To assess the efficiency of our framework, we applied it to a second dataset and identified 20 strongly significant up-regulated markers. We also compared different imputation methods and performed an ablation study demonstrating that each key phase of the framework is essential. In summary, our integrated framework efficiently discovers strong differentially expressed gene signatures as well as up-regulated markers in scRNA-seq data.
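The abstract does not say which MRMR variant or relevance target the authors used, so the following is only an illustrative sketch, not the paper's implementation. It shows greedy MRMR in the quotient form often called FCQ: each candidate gene's relevance (a one-way ANOVA F-statistic against class labels) is divided by its mean absolute Pearson correlation with the genes already selected, and the top 100 genes are kept. It assumes a cells × genes matrix `X` and a label vector `y` with at least two classes (for example, labels from a preliminary clustering); the function names `anova_f` and `mrmr_fcq` and the simulated data are hypothetical.

```python
import numpy as np


def anova_f(X, y):
    """One-way ANOVA F-statistic of every gene (column of X) against labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mc - grand_mean) ** 2
        ss_within += ((Xc - mc) ** 2).sum(axis=0)
    # The small epsilon keeps perfectly separated genes from dividing by zero.
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def mrmr_fcq(X, y, n_features=100):
    """Greedy MRMR: F-statistic relevance / mean |Pearson r| with selected genes."""
    X = np.asarray(X, dtype=float)
    n_features = min(n_features, X.shape[1])
    relevance = anova_f(X, y)
    # Center and scale columns so that Z[:, i] @ Z[:, j] equals Pearson r.
    Z = X - X.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    norms[norms == 0] = 1.0  # constant genes get zero correlation
    Z = Z / norms
    redundancy_sum = np.zeros(X.shape[1])
    available = np.ones(X.shape[1], dtype=bool)
    selected = []
    for step in range(n_features):
        if step == 0:
            scores = relevance.copy()  # first pick: the most relevant gene
        else:
            scores = relevance / (redundancy_sum / step + 1e-12)
        scores[~available] = -np.inf
        best = int(np.argmax(scores))
        selected.append(best)
        available[best] = False
        # Accumulate |r| between every gene and the newly selected gene.
        redundancy_sum += np.abs(Z.T @ Z[:, best])
    return selected


# Usage on simulated data: 60 cells in 3 hypothetical clusters, 300 genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 300))
X[:, 7] += 5 * y  # make gene 7 strongly cluster-dependent
top = mrmr_fcq(X, y, n_features=100)
print(top[0])  # 7
```

Redundancy is accumulated one selected gene at a time rather than from a full gene × gene correlation matrix, which keeps memory linear in the number of genes; this is a design choice of the sketch, not something stated in the abstract.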
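The abstract reports differentially expressed features at FDR < 0.001 but does not name the multiple-testing correction. limma's `topTable` reports Benjamini-Hochberg adjusted p-values by default (`adjust.method = "BH"`), so the sketch below assumes that adjustment. It computes BH-adjusted p-values in pure Python and keeps the genes below the 0.001 threshold; the helper names `benjamini_hochberg` and `significant_genes` and the gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, pvalues, fdr=0.001):
    """Keep genes whose BH-adjusted p-value is below the FDR threshold."""
    qvalues = benjamini_hochberg(pvalues)
    return [g for g, q in zip(genes, qvalues) if q < fdr]


print(significant_genes(["geneA", "geneB", "geneC"], [1e-6, 0.5, 2e-4]))
# ['geneA', 'geneC']
```

The adjusted values match R's `p.adjust(p, method = "BH")`; for example, raw p-values `[0.01, 0.04, 0.03, 0.005]` adjust to `[0.02, 0.04, 0.04, 0.02]`.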

Suggested Citation

  • Soumita Seth & Saurav Mallik & Atikul Islam & Tapas Bhadra & Arup Roy & Pawan Kumar Singh & Aimin Li & Zhongming Zhao, 2023. "Identifying Genetic Signatures from Single-Cell RNA Sequencing Data by Matrix Imputation and Reduced Set Gene Clustering," Mathematics, MDPI, vol. 11(20), pages 1-26, October.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:20:p:4315-:d:1261327

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/20/4315/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/20/4315/
    Download Restriction: no

    References listed on IDEAS

    1. Monika Khandelwal & Sabha Sheikh & Ranjeet Kumar Rout & Saiyed Umer & Saurav Mallik & Zhongming Zhao, 2022. "Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences," Mathematics, MDPI, vol. 10(13), pages 1-20, June.
    2. Carsten Sticht & Carolina De La Torre & Alisha Parveen & Norbert Gretz, 2018. "miRWalk: An online resource for prediction of microRNA binding sites," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-6, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Léa Maitre & Mariona Bustamante & Carles Hernández-Ferrer & Denise Thiel & Chung-Ho E. Lau & Alexandros P. Siskos & Marta Vives-Usano & Carlos Ruiz-Arenas & Dolors Pelegrí-Sisó & Oliver Robinson & Dan, 2022. "Multi-omics signatures of the human early life exposome," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    2. Junpeng Zhang & Taosheng Xu & Lin Liu & Wu Zhang & Chunwen Zhao & Sijing Li & Jiuyong Li & Nini Rao & Thuc Duy Le, 2020. "LMSM: A modular approach for identifying lncRNA related miRNA sponge modules in breast cancer," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-22, April.
    3. Sarah M. Hücker & Tobias Fehlmann & Christian Werno & Kathrin Weidele & Florian Lüke & Anke Schlenska-Lange & Christoph A. Klein & Andreas Keller & Stefan Kirsch, 2021. "Single-cell microRNA sequencing method comparison and application to cell lines and circulating lung tumor cells," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
