Printed from https://ideas.repec.org/a/gam/jftint/v17y2025i11p495-d1782043.html

Sparse Regularized Autoencoders-Based Radiomics Data Augmentation for Improved EGFR Mutation Prediction in NSCLC

Author

Listed:
  • Muhammad Asif Munir

    (Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
    Department of Electrical Engineering, Swedish College of Engineering and Technology, Shahbazpur Road, Rahim Yar Khan 64200, Pakistan)

  • Reehan Ali Shah

    (Department of Computer Science, Shaheed Benazir Bhutto University, SBA (SBBU-SBA), Nawabshah 67450, Pakistan
    Department of Computer Systems Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan)

  • Urooj Waheed

    (Department of Computer Science, DHA Suffa University, Karachi 75500, Pakistan)

  • Muhammad Aqeel Aslam

    (Department of Electrical Engineering, GIFT University, Gujranwala 52250, Pakistan)

  • Zeeshan Rashid

    (Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan)

  • Mohammed Aman

    (Department of Industrial Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia)

  • Muhammad I. Masud

    (Department of Electrical Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia)

  • Zeeshan Ahmad Arfeen

    (Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan)

Abstract

Lung cancer (LC) remains a leading cause of cancer mortality worldwide, where accurate and early identification of gene mutations such as epidermal growth factor receptor (EGFR) is critical for precision treatment. However, machine learning-based radiomics approaches often face challenges due to the small and imbalanced nature of the datasets. This study proposes a comprehensive framework based on Generic Sparse Regularized Autoencoders with Kullback–Leibler divergence (GSRA-KL) to generate high-quality synthetic radiomics data and overcome these limitations. A systematic approach generated 63 synthetic radiomics datasets by tuning a novel kl_weight regularization hyperparameter across three hidden-layer sizes, optimized using Optuna for computational efficiency. A rigorous assessment was conducted to evaluate the impact of hyperparameter tuning across 63 synthetic datasets, with a focus on the EGFR gene mutation. This evaluation utilized resemblance-dimension scores (RDS), novel utility-dimension scores (UDS), and t-SNE visualizations to ensure the validation of data quality, revealing that GSRA-KL achieves excellent performance (RDS > 0.45, UDS > 0.7), especially when class distribution is balanced, while remaining competitive with the Tabular Variational Autoencoder (TVAE). Additionally, a comprehensive statistical correlation analysis demonstrated strong and significant monotonic relationships among resemblance-based performance metrics up to moderate scaling (≤1.0*), confirming the robustness and stability of inter-metric associations under varying configurations. Complementary computational cost evaluation further indicated that moderate kl_weight values yield an optimal balance between reconstruction accuracy and resource utilization, with Spearman correlations revealing improved reconstruction quality (MSE ρ = −0.78, p < 0.001) at reduced computational overhead.
The ablation-style analysis confirmed that including the KL divergence term meaningfully enhances the generative capacity of GSRA-KL over its baseline counterpart. Furthermore, the GSRA-KL framework achieved substantial improvements in computational efficiency compared to prior PSO-based optimization methods, resulting in reduced memory usage and training time. Overall, GSRA-KL represents an incremental yet practical advancement for augmenting small and imbalanced high-dimensional radiomics datasets, showing promise for improved mutation prediction and downstream precision oncology studies.
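
The role of a kl_weight-style hyperparameter can be illustrated with a minimal sketch of the textbook sparse-autoencoder objective: reconstruction MSE plus a weighted KL-divergence sparsity penalty on the mean hidden activations. This is not the authors' GSRA-KL implementation, which the abstract does not specify. The function names, the target sparsity `rho`, and the default values below are assumptions chosen only for illustration.

```python
import numpy as np


def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
    """Textbook KL sparsity penalty, summed over hidden units.

    hidden: array of shape (n_samples, n_hidden) with activations in (0, 1),
            e.g. sigmoid outputs.
    rho:    target mean activation per unit (hypothetical default).
    """
    # Mean activation of each hidden unit, clipped to keep the logs finite.
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1.0 - eps)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return float(kl.sum())


def sparse_ae_loss(x, x_recon, hidden, kl_weight=0.5, rho=0.05):
    """Reconstruction MSE plus kl_weight times the sparsity penalty.

    kl_weight here mirrors the hyperparameter named in the abstract only
    by name; its exact role in the paper is an assumption.
    """
    mse = float(np.mean((x - x_recon) ** 2))
    return mse + kl_weight * kl_sparsity_penalty(hidden, rho)
```

In this formulation, setting kl_weight to 0 reduces the loss to plain reconstruction error, while larger values push mean hidden activations toward `rho`. That is the trade-off a hyperparameter search over kl_weight would explore.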

Suggested Citation

  • Muhammad Asif Munir & Reehan Ali Shah & Urooj Waheed & Muhammad Aqeel Aslam & Zeeshan Rashid & Mohammed Aman & Muhammad I. Masud & Zeeshan Ahmad Arfeen, 2025. "Sparse Regularized Autoencoders-Based Radiomics Data Augmentation for Improved EGFR Mutation Prediction in NSCLC," Future Internet, MDPI, vol. 17(11), pages 1-23, October.
  • Handle: RePEc:gam:jftint:v:17:y:2025:i:11:p:495-:d:1782043

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/17/11/495/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/17/11/495/
    Download Restriction: no

