Printed from https://ideas.repec.org/a/wsi/jikmxx/v19y2020i01ns0219649220400195.html

Practical Challenges and Recommendations of Filter Methods for Feature Selection

Author

Listed:
  • Mohammed Rajab

    (Department of Computer Science, The University of Sheffield, Sheffield, UK)

  • Dennis Wang

    (Department of Computer Science, The University of Sheffield, Sheffield, UK; Sheffield Institute for Translational Neuroscience, Sheffield, UK; NIHR Sheffield Biomedical Research Centre, Sheffield, UK)

Abstract

Feature selection, the process of identifying relevant features to be incorporated into a proposed model, is one of the significant steps of the learning process. It removes noise from the data to increase learning performance while reducing computational complexity. A review of the literature indicates that most previous studies have focused on improving overall classifier performance or reducing the training-time costs of building classifiers. However, in this era of big data, there is an urgent need to deal with more complex issues that make feature selection, especially with filter-based methods, more challenging: dimensionality, data structures, data format, the availability of domain experts, data sparsity, and result discrepancies, among others. Filter methods use mathematical models to identify the informative features of a given dataset, which can then be used to build various predictive models. This paper takes a new route in an attempt to pinpoint recent practical challenges associated with filter methods and discusses potential areas of development to yield better performance. Several practical recommendations, based on recent studies, are made to overcome the identified challenges and make the feature selection process simpler and more efficient.
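
To illustrate what a filter method does in general, the following is a minimal sketch and not the authors' procedure: each feature is scored independently of any classifier, and the top-ranked features are kept. The scoring criterion (absolute Pearson correlation with the target), the function names `pearson`, `rank_features` and `select_top_k`, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def rank_features(columns, target):
    """Return feature names ordered by |correlation| with the target, best first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def select_top_k(columns, target, k):
    """Keep the k highest-scoring features (the filter step)."""
    return rank_features(columns, target)[:k]


# Hypothetical toy dataset: three features, six samples, binary target.
toy_columns = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "weak": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "constant": [7.0] * 6,
}
toy_target = [0, 0, 0, 1, 1, 1]

print(select_top_k(toy_columns, toy_target, k=2))
```

Because the score is computed per feature and without training a model, the cost grows linearly with the number of features; the trade-off is that interactions between features are ignored, which is one reason different filter criteria can rank the same features differently.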

Suggested Citation

  • Mohammed Rajab & Dennis Wang, 2020. "Practical Challenges and Recommendations of Filter Methods for Feature Selection," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-15, March.
  • Handle: RePEc:wsi:jikmxx:v:19:y:2020:i:01:n:s0219649220400195
    DOI: 10.1142/S0219649220400195

    Download full text from publisher

    File URL: https://www.worldscientific.com/doi/abs/10.1142/S0219649220400195
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1142/S0219649220400195?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Firuz Kamalov & Fadi Thabtah, 2017. "A Feature Selection Method Based on Ranked Vector Scores of Features for Classification," Annals of Data Science, Springer, vol. 4(4), pages 483-502, December.
    2. Paul Town & Fadi Thabtah, 2019. "Data Analytics Tools: A User Perspective," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 18(01), pages 1-16, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Firuz Kamalov & Ho Hon Leung, 2020. "Outlier Detection in High Dimensional Data," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-16, March.
    2. Firuz Kamalov & Fadi Thabtah & Ho Hon Leung, 2023. "Feature Selection in Imbalanced Data," Annals of Data Science, Springer, vol. 10(6), pages 1527-1541, December.
    3. Anthony Gramaje & Fadi Thabtah & Neda Abdelhamid & Sayan Kumar Ray, 2021. "Patient Discharge Classification Using Machine Learning Techniques," Annals of Data Science, Springer, vol. 8(4), pages 755-767, December.
    4. Firuz Kamalov & Ho Hon Leung & Sherif Moussa, 2022. "Monotonicity of the χ²-statistic and Feature Selection," Annals of Data Science, Springer, vol. 9(6), pages 1223-1241, December.
    5. Majed Rajab, 2019. "Visualisation Model Based on Phishing Features," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 18(01), pages 1-17, March.
