Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i6p993-d774885.html

Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works

Author

Listed:
  • Joaquim Fernando Pinto da Costa

    (CMUP, Departamento de Matemática, Faculdade de Ciências, Universidade do Porto, rua do Campo Alegre s/n, 4169-007 Porto, Portugal; these authors contributed equally to this work)

  • Manuel Cabral

    (Departamento de Matemática, Faculdade de Ciências, Universidade do Porto, rua do Campo Alegre s/n, 4169-007 Porto, Portugal; these authors contributed equally to this work)

Abstract

Statistical methods for finding patterns and trends in large, complex, and otherwise unstructured data sets have grown in importance over the past decade: the amount of data produced keeps growing exponentially, and the knowledge gained from understanding data enables quick, informed decisions that save time and provide a competitive advantage. For this reason, statistical methods in data mining have advanced considerably over the past few years. This paper is a comprehensive and systematic review of these recent developments in the area of data mining.

Suggested Citation

  • Joaquim Fernando Pinto da Costa & Manuel Cabral, 2022. "Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works," Mathematics, MDPI, vol. 10(6), pages 1-22, March.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:6:p:993-:d:774885

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/6/993/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/6/993/
    Download Restriction: no

    References listed on IDEAS

    1. Matias D. Cattaneo & Michael Jansson & Xinwei Ma, 2020. "Simple Local Polynomial Density Estimators," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1449-1455, July.
    2. Yaowu Liu & Jun Xie, 2020. "Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 393-402, January.
    3. Gökcen Eraslan & Lukas M. Simon & Maria Mircea & Nikola S. Mueller & Fabian J. Theis, 2019. "Single-cell RNA-seq denoising using a deep count autoencoder," Nature Communications, Nature, vol. 10(1), pages 1-14, December.
    4. Mudong Zeng & Yujie Liao & Runze Li & Agus Sudjianto, 2022. "Local Linear Approximation Algorithm for Neural Network," Mathematics, MDPI, vol. 10(3), pages 1-22, February.
    5. Kwon, Sunghoon & Lee, Sangin & Kim, Yongdai, 2015. "Moderately clipped LASSO," Computational Statistics & Data Analysis, Elsevier, vol. 92(C), pages 53-67.
    6. Andrew Gelman & Ben Goodrich & Jonah Gabry & Aki Vehtari, 2019. "R-squared for Bayesian Regression Models," The American Statistician, Taylor & Francis Journals, vol. 73(3), pages 307-309, July.
    7. Daniel W. Apley & Jingyu Zhu, 2020. "Visualizing the effects of predictor variables in black box supervised learning models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1059-1086, September.
    8. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
    9. Babacar Gaye & Dezheng Zhang & Aziguli Wulamu, 2021. "Improvement of Support Vector Machine Algorithm in Big Data Background," Mathematical Problems in Engineering, Hindawi, vol. 2021, pages 1-9, June.
    10. Tingyou Zhou & Liping Zhu & Chen Xu & Runze Li, 2020. "Model-Free Forward Screening Via Cumulative Divergence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1393-1405, July.
    11. Bethany Lusch & J. Nathan Kutz & Steven L. Brunton, 2018. "Deep learning for universal linear embeddings of nonlinear dynamics," Nature Communications, Nature, vol. 9(1), pages 1-10, December.
    12. Quentin F. Gronau & Alexander Ly & Eric-Jan Wagenmakers, 2020. "Informed Bayesian t-Tests," The American Statistician, Taylor & Francis Journals, vol. 74(2), pages 137-143, April.
    13. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    14. Qiang Sun & Wen-Xin Zhou & Jianqing Fan, 2020. "Adaptive Huber Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 254-265, January.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Adelaida Ojeda-Beltrán & Andrés Solano-Barliza & Wilson Arrubla-Hoyos & Danny Daniel Ortega & Dora Cama-Pinto & Juan Antonio Holgado-Terriza & Miguel Damas & Gilberto Toscano-Vanegas & Alejandro Cama-, 2023. "Characterisation of Youth Entrepreneurship in Medellín-Colombia Using Machine Learning," Sustainability, MDPI, vol. 15(13), pages 1-19, June.
    2. Khishigsuren Davagdorj & Ling Wang & Meijing Li & Van-Huy Pham & Keun Ho Ryu & Nipon Theera-Umpon, 2022. "Discovering Thematically Coherent Biomedical Documents Using Contextualized Bidirectional Encoder Representations from Transformers-Based Clustering," IJERPH, MDPI, vol. 19(10), pages 1-21, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Han, Dongxiao & Huang, Jian & Lin, Yuanyuan & Shen, Guohao, 2022. "Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors," Journal of Econometrics, Elsevier, vol. 230(2), pages 416-431.
    2. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    3. Pan Shang & Lingchen Kong, 2021. "Regularization Parameter Selection for the Low Rank Matrix Recovery," Journal of Optimization Theory and Applications, Springer, vol. 189(3), pages 772-792, June.
    4. Haofeng Wang & Hongxia Jin & Xuejun Jiang & Jingzhi Li, 2022. "Model Selection for High Dimensional Nonparametric Additive Models via Ridge Estimation," Mathematics, MDPI, vol. 10(23), pages 1-22, December.
    5. Andini, Monica & Boldrini, Michela & Ciani, Emanuele & de Blasio, Guido & D'Ignazio, Alessio & Paladini, Andrea, 2022. "Machine learning in the service of policy targeting: The case of public credit guarantees," Journal of Economic Behavior & Organization, Elsevier, vol. 198(C), pages 434-475.
    6. Chen, Huangyue & Kong, Lingchen & Shang, Pan & Pan, Shanshan, 2020. "Safe feature screening rules for the regularized Huber regression," Applied Mathematics and Computation, Elsevier, vol. 386(C).
    7. Francesco Decarolis & Raymond Fisman & Paolo Pinotti & Silvia Vannutelli, 2019. "Rules, Discretion, and Corruption in Procurement: Evidence from Italian Government Contracting," Boston University - Department of Economics - The Institute for Economic Development Working Papers Series dp-344, Boston University - Department of Economics.
    8. Eibich, Peter & Siedler, Thomas, 2020. "Retirement, intergenerational time transfers, and fertility," European Economic Review, Elsevier, vol. 124(C).
    9. Meng An & Haixiang Zhang, 2023. "High-Dimensional Mediation Analysis for Time-to-Event Outcomes with Additive Hazards Model," Mathematics, MDPI, vol. 11(24), pages 1-11, December.
    10. Luis R. Martinez & Jonas Jessen & Guo Xu, 2023. "A Glimpse of Freedom: Allied Occupation and Political Resistance in East Germany," American Economic Journal: Applied Economics, American Economic Association, vol. 15(1), pages 68-106, January.
    11. Tomohiro Ando & Ruey S. Tsay, 2009. "Model selection for generalized linear models with factor‐augmented predictors," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 25(3), pages 207-235, May.
    12. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith & Bernard Chasekw, 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    14. Leandro Andrián & Oscar Mauricio Valencia, 2023. "Past the Tipping Point? Assessing Debt Overhang in Latin America and the Caribbean," IDB Publications (Book Chapters), in: Andrew Powell & Oscar Mauricio Valencia (ed.), Dealing with Debt, edition 1, chapter 8, pages 183-196, Inter-American Development Bank.
    15. Annika Lindskog & Dick Durevall, 2021. "To educate a woman and to educate a man: Gender‐specific sexual behavior and human immunodeficiency virus responses to an education reform in Botswana," Health Economics, John Wiley & Sons, Ltd., vol. 30(3), pages 642-658, March.
    16. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    17. Albanese, Andrea & Picchio, Matteo & Ghirelli, Corinna, 2020. "Timed to Say Goodbye: Does Unemployment Benefit Eligibility Affect Worker Layoffs?," Labour Economics, Elsevier, vol. 65(C).
    18. Sauvenier, Mathieu & Van Bellegem, Sébastien, 2023. "Direction Identification and Minimax Estimation by Generalized Eigenvalue Problem in High Dimensional Sparse Regression," LIDAM Discussion Papers CORE 2023005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    19. Ethan Bahl & Snehajyoti Chatterjee & Utsav Mukherjee & Muhammad Elsadany & Yann Vanrobaeys & Li-Chun Lin & Miriam McDonough & Jon Resch & K. Peter Giese & Ted Abel & Jacob J. Michaelson, 2024. "Using deep learning to quantify neuronal activation from single-cell and spatial transcriptomic data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    20. Canaan, Serena & Mouganie, Pierre & Zhang, Peng, 2022. "The Long-Run Educational Benefits of High-Achieving Classrooms," IZA Discussion Papers 15039, Institute of Labor Economics (IZA).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.