
SUBiNN: a stacked uni- and bivariate kNN sparse ensemble

Author

Listed:
  • Tiffany Elsten

    (Leiden University)

  • Mark Rooij

    (Leiden University)

Abstract

Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it does not give information about the importance of single features or pairs of features. In stacking, a set of base-learners is combined into one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers, each of which is easily interpretable on its own. Furthermore, we combine these classifiers with a Lasso method, which results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and on benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as to Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods and is capable of identifying noise features, but that Random Forests is often, though not always, the best classifier.
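
The stacking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how base-learner predictions are produced or which Lasso solver is used, so the sketch assumes a binary outcome, leave-one-out kNN probabilities as inputs to the meta-learner, and an L1-penalised logistic regression fitted by plain proximal gradient descent. All function names, the synthetic data, and the values of k and lam are hypothetical.

```python
import itertools
import math
import random


def knn_loo_prob(X, y, cols, k):
    """Leave-one-out kNN estimate of P(y=1) using only the features in cols."""
    probs = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[c] - xj[c]) ** 2 for c in cols), y[j])
            for j, xj in enumerate(X) if j != i
        )
        probs.append(sum(label for _, label in dists[:k]) / k)
    return probs


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(Z, y, lam, iters=300):
    """L1-penalised logistic regression via proximal gradient descent (assumed solver)."""
    n, m = len(Z), len(Z[0])
    step = 4.0 / (m + 1)  # crude bound; valid because all inputs lie in [0, 1]
    w, b = [0.0] * m, 0.0
    for _ in range(iters):
        # Residuals p_i - y_i under the current weights.
        res = []
        for zi, yi in zip(Z, y):
            eta = b + sum(wj * zj for wj, zj in zip(w, zi))
            res.append(1.0 / (1.0 + math.exp(-eta)) - yi)
        b -= step * sum(res) / n  # the intercept is not penalised
        for j in range(m):
            grad = sum(r * zi[j] for r, zi in zip(res, Z)) / n
            w[j] = soft_threshold(w[j] - step * grad, step * lam)
    return w, b


def make_data(n=80, p=4, seed=0):
    """Synthetic data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(p - 2)]
        X.append(row)
        y.append(label)
    return X, y


def subinn_sketch(X, y, k=5, lam=0.02):
    """One univariate kNN per feature, one bivariate kNN per pair, Lasso on top."""
    p = len(X[0])
    subsets = [(j,) for j in range(p)] + list(itertools.combinations(range(p), 2))
    # Column s of Z holds the out-of-sample predictions of base learner s.
    cols = [knn_loo_prob(X, y, s, k) for s in subsets]
    Z = [list(row) for row in zip(*cols)]
    w, b = lasso_logistic(Z, y, lam)
    return subsets, w, b


if __name__ == "__main__":
    X, y = make_data()
    subsets, w, b = subinn_sketch(X, y)
    for s, wj in zip(subsets, w):
        if wj != 0.0:
            print(s, round(wj, 3))  # base learners kept by the Lasso
```

Base learners whose weight is exactly zero drop out of the ensemble, so the surviving feature subsets can be read as the selected main effects and pairwise interactions. Raising lam drives more weights to zero.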

Suggested Citation

  • Tiffany Elsten & Mark Rooij, 2022. "SUBiNN: a stacked uni- and bivariate kNN sparse ensemble," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 847-874, December.
  • Handle: RePEc:spr:advdac:v:16:y:2022:i:4:d:10.1007_s11634-021-00462-7
    DOI: 10.1007/s11634-021-00462-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00462-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



