
Performance-Driven Dimensionality Reduction: A Data-Centric Approach to Feature Engineering in Machine Learning

In: Transactions of ADIA Lab Interdisciplinary Advances in Data and Computational Science

Authors

  • Joshua Chung
  • Marcos Lopez de Prado
  • Horst D. Simon
  • Kesheng Wu

Abstract

In many applications, data may be anonymized, obfuscated, and highly noisy. In such cases, it is difficult to use domain knowledge or low-dimensional visualizations to engineer features for machine learning tasks. In this work, we explore a variety of dimensionality reduction (DR) techniques, in the form of both feature extraction and feature selection, to decrease multicollinearity and improve the predictive power of our models. These techniques include principal component analysis (PCA), locally linear embedding (LLE), Isomap, kernel principal component analysis (KPCA), uniform manifold approximation and projection (UMAP), mean decrease accuracy, Shapley values, and feature clustering. Because our methodology is data-driven, DR algorithm selection, hyperparameter tuning, and model tuning are all based purely on model performance rather than on a priori knowledge. The method identifies which techniques increase the predictive power of our random forest model. Because the method is general, it can be applied to regression or classification with any machine learning model and any unsupervised DR technique.
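
The performance-driven selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, a synthetic dataset from `make_classification` standing in for anonymized data, cross-validated accuracy as the performance measure, and small illustrative hyperparameter grids. UMAP, Shapley values, mean decrease accuracy, and feature clustering are left out to keep the example self-contained. The DR step of a pipeline is treated as a searchable hyperparameter, so the random forest's score alone decides which technique and settings are kept.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for anonymized, noisy data (an assumption, not the chapter's data).
# Redundant features introduce multicollinearity on purpose.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=10, random_state=0
)

# The "dr" step is a placeholder that the grid search swaps out.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("dr", PCA()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# One sub-grid per candidate technique; "passthrough" is the no-reduction baseline.
# All grid values are illustrative assumptions.
param_grid = [
    {"dr": ["passthrough"]},
    {"dr": [PCA(random_state=0)], "dr__n_components": [2, 5, 10]},
    {"dr": [KernelPCA(kernel="rbf")], "dr__n_components": [2, 5, 10]},
    {"dr": [Isomap()], "dr__n_components": [2, 5], "dr__n_neighbors": [5, 15]},
    {
        "dr": [LocallyLinearEmbedding(random_state=0)],
        "dr__n_components": [2, 5],
        "dr__n_neighbors": [10, 15],
    },
]

# Selection is driven purely by cross-validated model performance.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

best_dr = search.best_params_["dr"]
best_name = best_dr if isinstance(best_dr, str) else type(best_dr).__name__
print("Best DR step:", best_name)
print("Cross-validated accuracy: %.3f" % search.best_score_)
```

Because the DR step sits inside the pipeline, each embedding is fitted on the training folds only, which avoids leaking validation data into the reduction. Swapping `RandomForestClassifier` for a regressor and changing `scoring` would adapt the same sketch to regression.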

Suggested Citation

  • Joshua Chung & Marcos Lopez de Prado & Horst D. Simon & Kesheng Wu, 2025. "Performance-Driven Dimensionality Reduction: A Data-Centric Approach to Feature Engineering in Machine Learning," World Scientific Book Chapters, in: Horst Simon (ed.), Transactions of ADIA Lab Interdisciplinary Advances in Data and Computational Science, chapter 9, pages 245-272, World Scientific Publishing Co. Pte. Ltd.
  • Handle: RePEc:wsi:wschap:9789819813049_0009

    Download full text from publisher

    File URL: https://www.worldscientific.com/doi/pdf/10.1142/9789819813049_0009
    Download Restriction: Ebook Access is available upon purchase.

    File URL: https://www.worldscientific.com/doi/abs/10.1142/9789819813049_0009
    Download Restriction: Ebook Access is available upon purchase.

    More about this item

    Keywords

    Computational Science; Data Science; AI Applications; Climate Science; Medical Imaging; Sustainability; Interdisciplinary Research; Mathematical and Quantitative Finance

    JEL classification:

    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
    • G11 - Financial Economics - - General Financial Markets - - - Portfolio Choice; Investment Decisions
    • Q54 - Agricultural and Natural Resource Economics; Environmental and Ecological Economics - - Environmental Economics - - - Climate; Natural Disasters and their Management; Global Warming
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
