IDEAS home Printed from https://ideas.repec.org/p/hal/journl/hal-03222439.html

A filter approach for feature selection in classification: application to automatic atrial fibrillation detection in electrocardiogram recordings

Authors
  • Pierre Michel

    (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique)

  • Nicolas Ngo

    (SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale)

  • Jean-François Pons

    (WitMonki)

  • Stéphane Delliaux

    (C2VN - Centre recherche en CardioVasculaire et Nutrition = Center for CardioVascular and Nutrition research - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement)

  • Roch Giorgi

    (SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale)

Abstract

Background: In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial to reduce data noise and increase model accuracy and interpretability. Thus, in the field of clinical decision making, only the most relevant features from a set of medical descriptors should be considered when determining whether a patient is healthy or not. This statistical approach, known as feature selection, can be performed through regression or classification, in a supervised or unsupervised manner. Several feature selection approaches using different mathematical concepts have been described in the literature. In the field of classification, a new approach has recently been proposed that uses the γ-metric, an index measuring separability between different classes in heart rhythm characterization. The present study proposes a filter approach for feature selection in classification using this γ-metric, and evaluates its application to automatic atrial fibrillation detection.

Methods: The stability and prediction performance of the γ-metric feature selection approach were evaluated using the support vector machine model on two heart rhythm datasets, one extracted from the PhysioNet database and the other from the database of Marseille University Hospital Center, France (Timone Hospital). Both datasets contained electrocardiogram recordings grouped into two classes: normal sinus rhythm and atrial fibrillation. The performance of this feature selection approach was compared to that of three other approaches: the first two based on the Random Forest technique and the third on receiver operating characteristic (ROC) curve analysis.

Results: The γ-metric approach showed satisfactory results, especially for models with a smaller number of features. For the training dataset, all prediction indicators were higher for our approach (accuracy greater than 99% for models with 5 to 17 features), as was stability (greater than 0.925 regardless of the number of features included in the model). For the validation dataset, the features selected with the γ-metric approach differed from those selected with the other approaches; sensitivity was higher for our approach, but other indicators were similar.

Conclusion: This filter approach for feature selection in classification opens up new methodological avenues for atrial fibrillation detection using short electrocardiogram recordings.
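A filter method scores each feature on its own, independently of the downstream classifier, ranks the features, and keeps the top k. Its stability can then be checked by repeating the selection on resampled data and comparing the chosen subsets. The abstract does not give the definition of the γ-metric, so the Python sketch below substitutes a two-class Fisher-style score, (mean0 - mean1)^2 / (var0 + var1), as a hypothetical stand-in. It also measures stability as the mean pairwise Jaccard similarity over stratified subsamples; this is an assumption, not necessarily the stability index used in the paper. The helper names (`fisher_score`, `select_top_k`, `jaccard`, `selection_stability`) are illustrative only, and the SVM training step is omitted.

```python
import random
import statistics
from itertools import combinations


def fisher_score(values, labels):
    """Separability of one feature between classes 0 and 1.

    Hypothetical stand-in for the γ-metric: (mean0 - mean1)^2 / (var0 + var1).
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    num = (statistics.fmean(a) - statistics.fmean(b)) ** 2
    den = statistics.pvariance(a) + statistics.pvariance(b)
    if den == 0:
        # Zero within-class spread: perfectly separable if the means differ.
        return float("inf") if num > 0 else 0.0
    return num / den


def select_top_k(X, labels, k):
    """Filter step: score every column independently and keep the k best indices."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(s, t):
    """Overlap between two selected feature subsets (1.0 = identical)."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def selection_stability(X, labels, k, n_runs=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard similarity of subsets selected on stratified subsamples."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    subsets = []
    for _ in range(n_runs):
        # Draw the same fraction from each class so both classes stay present.
        idx = [i for members in by_class.values()
               for i in rng.sample(members, max(1, int(frac * len(members))))]
        subsets.append(select_top_k([X[i] for i in idx], [labels[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(s, t) for s, t in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0] * 50 + [1] * 50
    # Synthetic data: features 0 and 1 shift with the class, features 2-5 are noise.
    X = [[rng.gauss(2.0 * y, 1.0), rng.gauss(-1.5 * y, 1.0)]
         + [rng.gauss(0.0, 1.0) for _ in range(4)] for y in labels]
    print("selected:", sorted(select_top_k(X, labels, 2)))
    print("stability:", round(selection_stability(X, labels, 2), 3))
```

Because each score is computed per feature without fitting a model, the filter step is cheap and classifier-agnostic; the selected indices would then be passed to whichever model is being evaluated, such as the support vector machine used in the study.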

Suggested Citation

  • Pierre Michel & Nicolas Ngo & Jean-François Pons & Stéphane Delliaux & Roch Giorgi, 2021. "A filter approach for feature selection in classification: application to automatic atrial fibrillation detection in electrocardiogram recordings," Post-Print hal-03222439, HAL.
  • Handle: RePEc:hal:journl:hal-03222439
    DOI: 10.1186/s12911-021-01427-8
    Note: View the original document on HAL open archive server: https://amu.hal.science/hal-03222439

    Download full text from publisher

    File URL: https://amu.hal.science/hal-03222439/document
    Download Restriction: no

    File URL: https://libkey.io/10.1186/s12911-021-01427-8?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Dernoncourt, David & Hanczar, Blaise & Zucker, Jean-Daniel, 2014. "Analysis of feature selection stability on high dimension and small sample data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 681-693.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yu, Lean & Zhang, Xiaoming, 2021. "Can small sample dataset be used for efficient internet loan credit risk assessment? Evidence from online peer to peer lending," Finance Research Letters, Elsevier, vol. 38(C).
    2. He, Yan-Lin & Wang, Ping-Jiang & Zhang, Ming-Qing & Zhu, Qun-Xiong & Xu, Yuan, 2018. "A novel and effective nonlinear interpolation virtual sample generation method for enhancing energy prediction and analysis on small data problem: A case study of Ethylene industry," Energy, Elsevier, vol. 147(C), pages 418-427.
    3. David Juárez-Varón & Victoria Tur-Viñes & Alejandro Rabasa-Dolado & Kristina Polotskaya, 2020. "An Adaptive Machine Learning Methodology Applied to Neuromarketing Analysis: Prediction of Consumer Behaviour Regarding the Key Elements of the Packaging Design of an Educational Toy," Social Sciences, MDPI, vol. 9(9), pages 1-23, September.
    4. Kristof Lommers & Ouns El Harzli & Jack Kim, 2021. "Confronting Machine Learning With Financial Research," Papers 2103.00366, arXiv.org, revised Mar 2021.
    5. Xianlong Zhang & Fei Zhang & Hsiang-te Kung & Ping Shi & Ayinuer Yushanjiang & Shidan Zhu, 2018. "Estimation of the Fe and Cu Contents of the Surface Water in the Ebinur Lake Basin Based on LIBS and a Machine Learning Algorithm," IJERPH, MDPI, vol. 15(11), pages 1-20, October.
    6. Abpeykar, Shadi & Ghatee, Mehdi & Zare, Hadi, 2019. "Ensemble decision forest of RBF networks via hybrid feature clustering approach for high-dimensional data classification," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 12-36.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hal:journl:hal-03222439. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: CCSD. General contact details of provider: https://hal.archives-ouvertes.fr/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.