Printed from https://ideas.repec.org/a/hin/complx/6278908.html

Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm

Authors

  • Noemí DeCastro-García
  • Ángel Luis Muñoz Castañeda
  • David Escudero García
  • Miguel V. Carriegos

Abstract

Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of both hyperparameter optimization methods and Machine Learning algorithms. Although several automatic optimization techniques exist, they usually consume significant resources, increasing the dynamic complexity in order to achieve high accuracy. Since the available dataset is one of the most critical factors in this computational cost, in this paper we study the effect of using different partitions of a dataset in the hyperparameter optimization phase on the efficiency of a Machine Learning algorithm. Nonparametric inference has been used to measure how often the accuracy, time, and spatial complexity obtained with the partitions behave differently from those obtained with the whole dataset. Also, a level of gain is assigned to each partition, allowing us to study patterns and identify which samples are more profitable. Since Cybersecurity is a discipline in which the efficiency of Artificial Intelligence techniques is key to extracting actionable knowledge, the statistical analyses have been carried out over five Cybersecurity datasets.

Suggested Citation

  • Noemí DeCastro-García & Ángel Luis Muñoz Castañeda & David Escudero García & Miguel V. Carriegos, 2019. "Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm," Complexity, Hindawi, vol. 2019, pages 1-16, February.
  • Handle: RePEc:hin:complx:6278908
    DOI: 10.1155/2019/6278908

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/8503/2019/6278908.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/8503/2019/6278908.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2019/6278908?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Zhun Cheng & Zhixiong Lu, 2018. "A Novel Efficient Feature Dimensionality Reduction Method and Its Application in Engineering," Complexity, Hindawi, vol. 2018, pages 1-14, October.
    2. Massimiliano Zanin & Miguel Romance & Santiago Moral & Regino Criado, 2018. "Credit Card Fraud Detection through Parenclitic Network Analysis," Complexity, Hindawi, vol. 2018, pages 1-9, May.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Ángel Luis Muñoz Castañeda & Noemí DeCastro-García & David Escudero García, 2021. "RHOASo: An Early Stop Hyper-Parameter Optimization Algorithm," Mathematics, MDPI, vol. 9(18), pages 1-52, September.
    2. Yuchen Wang & Zhengshan Luo & Jihao Luo & Yiqiong Gao & Yulei Kong & Qingqing Wang, 2023. "Investigation of the Solubility of Elemental Sulfur (S) in Sulfur-Containing Natural Gas with Machine Learning Methods," IJERPH, MDPI, vol. 20(6), pages 1-21, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Catayoun Azarm & Erman Acar & Mickey van Zeelt, 2024. "On the Potential of Network-Based Features for Fraud Detection," Papers 2402.09495, arXiv.org, revised Feb 2024.
    2. Iglesias Pérez, Sergio & Moral-Rubio, Santiago & Criado, Regino, 2021. "A new approach to combine multiplex networks and time series attributes: Building intrusion detection systems (IDS) in cybersecurity," Chaos, Solitons & Fractals, Elsevier, vol. 150(C).
    3. Zhun Cheng & Zhixiong Lu, 2022. "Regression-Based Correction and I-PSO-Based Optimization of HMCVT’s Speed Regulating Characteristics for Agricultural Machinery," Agriculture, MDPI, vol. 12(5), pages 1-18, April.
    4. Sergio Iglesias Perez & Regino Criado, 2022. "Increasing the Effectiveness of Network Intrusion Detection Systems (NIDSs) by Using Multiplex Networks and Visibility Graphs," Mathematics, MDPI, vol. 11(1), pages 1-24, December.
    5. Zhun Cheng & Huadong Zhou & Zhixiong Lu, 2022. "A Novel 10-Parameter Motor Efficiency Model Based on I-SA and Its Comparative Application of Energy Utilization Efficiency in Different Driving Modes for Electric Tractor," Agriculture, MDPI, vol. 12(3), pages 1-20, March.
    6. Cheng, Zhun, 2023. "High nonlinearity of BEV's stepped automatic transmission design objectives and its optimal solution by a novel ISA-RSA," Energy, Elsevier, vol. 282(C).
    7. Bofei Xiao & Bo Lei & Wei Lan & Bin Guo, 2022. "A blockwise network autoregressive model with application for fraud detection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(6), pages 1043-1065, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hin:complx:6278908. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Mohamed Abdelhakeem. General contact details of provider: https://www.hindawi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.