
Feature selection for high-dimensional data using a multivariate search space reduction strategy based scatter search

Author

Listed:
  • Miguel Garcia-Torres

    (Universidad Pablo de Olavide)

Abstract

In feature selection, increasing dimensionality and complex feature interactions make the problem challenging. Furthermore, searching for an optimal subset of features in a high-dimensional feature space is known to be an $\mathcal{NP}$-hard problem. To improve the efficiency and effectiveness of the search, feature grouping has emerged as a way to reduce the search space by clustering features according to a given measure. In this work we propose to reduce the search space by applying a greedy algorithm called Multivariate Greedy Predominant Groups Generator (MGPGG). MGPGG extends the Greedy Predominant Groups Generator (GPGG) algorithm by taking into account interactions among three or more features. For this purpose, MGPGG uses the Multivariate Symmetrical Uncertainty (MSU) to group features that share information about the class label. We also propose a Scatter Search strategy that integrates MGPGG to find small subsets of features with high predictive power. The proposed algorithm, called Multivariate Predominant Group-based Scatter Search (MPGSS), is tested on high-dimensional data from the biomedical and text-mining fields and compared with state-of-the-art feature selection strategies. Results show that MPGSS is competitive: it finds small subsets of features while maintaining the high predictive performance of the classification models.
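
To make the role of MSU in the abstract more concrete, the following is a minimal sketch of how a multivariate symmetrical uncertainty could be computed for discrete variables. It assumes the commonly cited definition MSU(X_1..X_n) = n/(n-1) * [1 - H(X_1..X_n) / sum_i H(X_i)], which reduces to the usual symmetrical uncertainty for n = 2. This listing does not reproduce the paper's exact formulation, and the function names `entropy` and `msu` are illustrative, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in Counter(rows).values())


def msu(*columns):
    """Multivariate symmetrical uncertainty of n >= 2 discrete variables.

    Assumed definition: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    """
    n = len(columns)
    if n < 2:
        raise ValueError("MSU needs at least two variables")
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        return 0.0  # every variable is constant, so nothing is shared
    return n / (n - 1) * (1.0 - entropy(*columns) / marginal_sum)


# Toy XOR data: neither feature alone is informative about the class,
# but the two features together determine it.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

pairwise = msu(x1, y)        # feature vs. class only
three_way = msu(x1, x2, y)   # both features together with the class
```

On this toy data the pairwise value is 0 while the three-way value is about 0.5, which illustrates why a measure over three or more variables can detect interactions that pairwise grouping misses.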

Suggested Citation

  • Miguel Garcia-Torres, 2025. "Feature selection for high-dimensional data using a multivariate search space reduction strategy based scatter search," Journal of Heuristics, Springer, vol. 31(1), pages 1-33, March.
  • Handle: RePEc:spr:joheur:v:31:y:2025:i:1:d:10.1007_s10732-025-09550-9
    DOI: 10.1007/s10732-025-09550-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10732-025-09550-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Dettling, Marcel & Bühlmann, Peter, 2004. "Finding predictive gene groups from microarray data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 106-131, July.
    2. García Lopez, Felix & García Torres, Miguel & Melian Batista, Belen & Moreno Perez, Jose A. & Moreno-Vega, J. Marcos, 2006. "Solving feature subset selection problem by a Parallel Scatter Search," European Journal of Operational Research, Elsevier, vol. 169(2), pages 477-489, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    2. Muhammad Arif & Ahmed Kattan, 2015. "Physical Activities Monitoring Using Wearable Acceleration Sensors Attached to the Body," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
    3. Zambom, Adriano Zanin & Akritas, Michael G., 2015. "Nonparametric significance testing and group variable selection," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 51-60.
    4. Jessie J Hsu & Dianne M Finkelstein & David A Schoenfeld, 2015. "Outcome-Driven Cluster Analysis with Application to Microarray Data," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-15, November.
    5. Panagopoulos, Orestis P. & Pappu, Vijay & Xanthopoulos, Petros & Pardalos, Panos M., 2016. "Constrained subspace classifier for high dimensional datasets," Omega, Elsevier, vol. 59(PA), pages 40-46.
    6. Kerkhove, L.-P. & Vanhoucke, M., 2017. "A parallel multi-objective scatter search for optimising incentive contract design in projects," European Journal of Operational Research, Elsevier, vol. 261(3), pages 1066-1084.
    7. William S Sanders & C Ian Johnston & Susan M Bridges & Shane C Burgess & Kenneth O Willeford, 2011. "Prediction of Cell Penetrating Peptides by Support Vector Machines," PLOS Computational Biology, Public Library of Science, vol. 7(7), pages 1-12, July.
    8. Cui, Qiurong & Xu, Yuqing & Zhang, Zhengjun & Chan, Vincent, 2021. "Max-linear regression models with regularization," Journal of Econometrics, Elsevier, vol. 222(1), pages 579-600.
    9. Garcia-Magariños Manuel & Antoniadis Anestis & Cao Ricardo & González-Manteiga Wenceslao, 2010. "Lasso Logistic Regression, GSoft and the Cyclic Coordinate Descent Algorithm: Application to Gene Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-30, August.
    10. Howard D. Bondell & Brian J. Reich, 2008. "Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR," Biometrics, The International Biometric Society, vol. 64(1), pages 115-123, March.
    11. M. Marques Alves & Jonathan Eckstein & Marina Geremia & Jefferson G. Melo, 2020. "Relative-error inertial-relaxed inexact versions of Douglas-Rachford and ADMM splitting algorithms," Computational Optimization and Applications, Springer, vol. 75(2), pages 389-422, March.
    12. Özge Sürer & Daniel W. Apley & Edward C. Malthouse, 2024. "Discovering interpretable structure in longitudinal predictors via coefficient trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(4), pages 911-951, December.
    13. Jonathan Eckstein & Wang Yao, 2017. "Approximate ADMM algorithms derived from Lagrangian splitting," Computational Optimization and Applications, Springer, vol. 68(2), pages 363-405, November.
    14. Yao, Xingzhi & Izzeldin, Marwan & Li, Zhenxiong, 2019. "A novel cluster HAR-type model for forecasting realized volatility," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1318-1331.
    15. Pablo Venegas & Francisco Calderon & Daniel Riofrío & Diego Benítez & Giovani Ramón & Diego Cisneros-Heredia & Miguel Coimbra & José Luis Rojo-Álvarez & Noel Pérez, 2021. "Automatic ladybird beetle detection using deep-learning models," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-21, June.
