
Implementation of RNN-LSTM with L1 regularization for predicting labels from chimpanzee DNA sequences using pseudo-labeling

Authors
  • Sugiyarto Surono
  • Goh Khang Wen
  • Arif Rahman
  • Lalu M. Irham
  • Sintia Afriyani

Abstract

Chimpanzee genome research plays a crucial role in understanding evolution, health, and biological functions. However, incomplete labeling of DNA sequence data makes accurate genomic classification difficult. This study aims to improve chimpanzee DNA sequence classification by addressing label scarcity and data imbalance through a deep learning approach. A Recurrent Neural Network with Long Short-Term Memory units (RNN-LSTM), combined with L1 regularization and pseudo-labeling, is used to enhance classification performance. The workflow consists of numerical encoding of the DNA sequences, pseudo-labeling to augment the training data, and model training with Stochastic Gradient Descent (SGD). Performance is evaluated using classification accuracy and the area under the ROC curve (AUC). The proposed approach achieves high classification accuracy, with AUC values ranging from 0.94 to 0.99, and significantly improves the handling of imbalanced datasets. Integrating pseudo-labeling lets the model exploit unlabeled DNA sequences, yielding a more robust genomic classification model. These findings highlight the potential of combining RNN-LSTM with L1 regularization and pseudo-labeling to address incomplete labeling in genomic datasets. The study advances genomic classification techniques and supports Goal 3 of the Sustainable Development Goals (SDGs), Good Health and Well-being, by improving DNA sequence classification accuracy, thereby facilitating early disease detection, precision medicine, and evolutionary studies.
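Three of the workflow steps named in the abstract (numerical encoding, the L1 penalty term, and pseudo-label selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the LSTM itself is omitted. The integer mapping (A=1, C=2, G=3, T=4, with 0 for padding and unknown bases), the fixed length `max_len`, the 0.9 confidence threshold, and the GC-content predictor `toy_predict_proba` are all assumptions for illustration; in the described workflow the class probabilities would come from the trained RNN-LSTM.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed integer encoding (the exact scheme is not given on this page);
# 0 is reserved for padding and for ambiguous bases such as "N".
NUCLEOTIDE_CODES: Dict[str, int] = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_sequence(seq: str, max_len: int) -> List[int]:
    """Map each base to an integer, truncating or zero-padding to max_len."""
    codes = [NUCLEOTIDE_CODES.get(base, 0) for base in seq.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))


def l1_penalty(weights: Sequence[float], lam: float) -> float:
    """L1 regularization term lam * sum(|w|), added to the training loss."""
    return lam * sum(abs(w) for w in weights)


def select_pseudo_labels(
    unlabeled: Sequence[List[int]],
    predict_proba: Callable[[List[int]], Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[List[int], int]]:
    """Return (sequence, predicted class) pairs whose top class probability
    is at least `threshold`; low-confidence sequences stay unlabeled."""
    selected = []
    for x in unlabeled:
        probs = predict_proba(x)
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            selected.append((x, best))
    return selected


def toy_predict_proba(x: List[int]) -> List[float]:
    """Hypothetical stand-in for the trained RNN-LSTM: scores class 1 by the
    GC fraction of the non-padding positions (C=2, G=3)."""
    bases = [c for c in x if c != 0]
    if not bases:
        return [0.5, 0.5]
    gc = sum(1 for c in bases if c in (2, 3)) / len(bases)
    return [1.0 - gc, gc]


if __name__ == "__main__":
    pool = [encode_sequence(s, 4) for s in ["GGCC", "AATT", "ACGT"]]
    # prints [([3, 3, 2, 2], 1), ([1, 1, 4, 4], 0)]; "ACGT" is too uncertain
    print(select_pseudo_labels(pool, toy_predict_proba))
```

In a full pseudo-labeling loop, the selected `(sequence, label)` pairs would be appended to the labeled set and the model retrained. The threshold trades off how many unlabeled sequences are used against how many incorrect labels enter training.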

Suggested Citation

  • Sugiyarto Surono & Goh Khang Wen & Arif Rahman & Lalu M. Irham & Sintia Afriyani, 2025. "Implementation of RNN-LSTM with L1 regularization for predicting labels from chimpanzee DNA sequences using pseudo-labeling," International Journal of Innovative Research and Scientific Studies, Innovative Research Publishing, vol. 8(3), pages 2774-2786.
  • Handle: RePEc:aac:ijirss:v:8:y:2025:i:3:p:2774-2786:id:7083

    Download full text from publisher

    File URL: https://ijirss.com/index.php/ijirss/article/view/7083/1467
    Download Restriction: no

    More about this item

    Keywords

    Chimpanzee genome analysis; Goal 3; Good health and well-being (SDGs); L1 regularization feature selection; Pseudo-labeling in genomics; RNN-LSTM for DNA sequence classification

    JEL classification:

    • L1 - Industrial Organization - - Market Structure, Firm Strategy, and Market Performance


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:aac:ijirss:v:8:y:2025:i:3:p:2774-2786:id:7083. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Natalie Jean. General contact details of the provider: https://ijirss.com/index.php/ijirss/

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.