IDEAS (RePEc): https://ideas.repec.org/a/spr/aodasc/v6y2019i4d10.1007_s40745-019-00217-4.html

Improving Time Complexity and Accuracy of the Machine Learning Algorithms Through Selection of Highly Weighted Top k Features from Complex Datasets

Author

  • Abdul Majeed (Korea Aerospace University)

Abstract

Machine learning algorithms (MLAs) usually process large and complex datasets containing a substantial number of features to extract meaningful information about the target concept (a.k.a. class). In most cases, MLAs suffer from latency and computational complexity issues when processing such datasets because of the presence of low-weight (i.e., irrelevant or redundant) features. The computing time of MLAs increases rapidly as the number of features, feature dependence, number of records, types of features, and nested feature categories present in such datasets grow. Appropriate feature selection before applying an MLA is a practical way to resolve the trade-off between computing speed and accuracy when processing large and complex datasets. However, selecting features that are sufficient, necessary, and highly correlated with the target concept is very challenging. This paper presents an efficient feature selection algorithm based on random forest that improves the performance of MLAs without sacrificing accuracy guarantees on large and complex datasets. The proposed algorithm yields unique features that are closely related to the target concept (i.e., class). It significantly reduces the computing time of MLAs, with little degradation in accuracy, while learning the target concept from large and complex datasets. Simulation results demonstrate the efficacy and effectiveness of the proposed algorithm.
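The abstract does not spell out the algorithm's details. As a minimal sketch only, assuming every feature has already been given a weight (for example, the impurity-based `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`), selecting the "highly weighted top k features" reduces to ranking features by weight and projecting the dataset onto the chosen columns. The function names `top_k_features` and `project` and the toy data are hypothetical illustrations, not the paper's method.

```python
from typing import List, Sequence, Tuple


def top_k_features(
    weights: Sequence[float], names: Sequence[str], k: int
) -> List[Tuple[str, float]]:
    """Return the k (name, weight) pairs with the largest weights, largest first.

    Assumption: `weights` were computed beforehand, e.g. by a random forest.
    """
    if len(weights) != len(names):
        raise ValueError("weights and names must have the same length")
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of features")
    # sorted() is stable even with reverse=True, so ties keep their original order.
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def project(
    rows: Sequence[Sequence[float]], names: Sequence[str], keep: Sequence[str]
) -> List[List[float]]:
    """Drop every column not listed in `keep`; kept columns follow the order of `keep`."""
    position = {name: i for i, name in enumerate(names)}
    columns = [position[name] for name in keep]
    return [[row[c] for c in columns] for row in rows]


if __name__ == "__main__":
    # Hypothetical toy data: four features, one of which carries little weight.
    names = ["age", "income", "zip", "noise"]
    weights = [0.30, 0.45, 0.05, 0.20]
    rows = [[25, 50000, 12345, 0.1], [40, 80000, 54321, 0.7]]

    top = top_k_features(weights, names, k=2)
    print(top)  # [('income', 0.45), ('age', 0.3)]
    reduced = project(rows, names, [name for name, _ in top])
    print(reduced)  # [[50000, 25], [80000, 40]]
```

Passing only the k highest-weighted columns to the downstream learner shrinks its input, which is the general mechanism by which this kind of selection can cut training time. How k is chosen, and how the weights are computed in the paper, is not stated in the abstract.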

Suggested Citation

  • Abdul Majeed, 2019. "Improving Time Complexity and Accuracy of the Machine Learning Algorithms Through Selection of Highly Weighted Top k Features from Complex Datasets," Annals of Data Science, Springer, vol. 6(4), pages 599-621, December.
  • Handle: RePEc:spr:aodasc:v:6:y:2019:i:4:d:10.1007_s40745-019-00217-4
    DOI: 10.1007/s40745-019-00217-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-019-00217-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Mohammed Amin Belarbi & Saïd Mahmoudi & Ghalem Belalem, 2017. "PCA as Dimensionality Reduction for Large-Scale Image Retrieval Systems," International Journal of Ambient Computing and Intelligence (IJACI), IGI Global, vol. 8(4), pages 45-58, October.
    2. Alfred Maussner, 2005. "Projection Methods (GAUSS)," QM&RBC Codes 135, Quantitative Macroeconomics & Real Business Cycles.
    3. Bogumił Kamiński & Michał Jakubczyk & Przemysław Szufel, 2018. "A framework for sensitivity analysis of decision trees," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 26(1), pages 135-159, March.
    4. Meiri, Ronen & Zahavi, Jacob, 2006. "Using simulated annealing to optimize the feature selection problem in marketing applications," European Journal of Operational Research, Elsevier, vol. 171(3), pages 842-858, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Nikhil J. Rathod & Manoj K. Chopra & Prem Kumar Chaurasiya & Umesh S. Vidhate & Abhishek Dasore, 2023. "Optimization on the Turning Process Parameters of SS 304 Using Taguchi and TOPSIS," Annals of Data Science, Springer, vol. 10(5), pages 1405-1419, October.
    2. Prashant Singh & Prashant Verma & Nikhil Singh, 2022. "Offline Signature Verification: An Application of GLCM Features in Machine Learning," Annals of Data Science, Springer, vol. 9(6), pages 1309-1321, December.
    3. Manoj Verma & Harish Kumar Ghritlahre & Surendra Bajpai, 2023. "A Case Study of Optimization of a Solar Power Plant Sizing and Placement in Madhya Pradesh, India Using Multi-Objective Genetic Algorithm," Annals of Data Science, Springer, vol. 10(4), pages 933-966, August.
    4. Firuz Kamalov & Fadi Thabtah & Ho Hon Leung, 2023. "Feature Selection in Imbalanced Data," Annals of Data Science, Springer, vol. 10(6), pages 1527-1541, December.
    5. Mohamed Ibrahim & Khaoula Aidi & M. Masoom Ali & Haitham M. Yousof, 2023. "A Novel Test Statistic for Right Censored Validity under a new Chen extension with Applications in Reliability and Medicine," Annals of Data Science, Springer, vol. 10(5), pages 1285-1299, October.
    6. Vojo Lakovic, 2020. "Modeling of Entrepreneurship Activity Crisis Management by Support Vector Machine," Annals of Data Science, Springer, vol. 7(4), pages 629-638, December.
    7. Anurag Barthwal & Kritika Sharma, 2022. "Analysis and prediction of urban ambient and surface temperatures using internet of things," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(1), pages 516-532, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Schlereth, Christian & Stepanchuk, Tanja & Skiera, Bernd, 2010. "Optimization and analysis of the profitability of tariff structures with two-part tariffs," European Journal of Operational Research, Elsevier, vol. 206(3), pages 691-701, November.
    2. Deac Dan Stelian & Schebesch Klaus Bruno, 2018. "Market Forecasts and Client Behavioral Data: Towards Finding Adequate Model Complexity," Studia Universitatis „Vasile Goldis” Arad – Economics Series, Sciendo, vol. 28(3), pages 50-75, September.
    3. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, December.
    4. Pacheco, Joaquín & Casado, Silvia & Núñez, Laura, 2009. "A variable selection method based on Tabu search for logistic regression models," European Journal of Operational Research, Elsevier, vol. 199(2), pages 506-511, December.
    5. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    6. Ivan Miguel Pires & Faisal Hussain & Nuno M. Garcia & Petre Lameski & Eftim Zdravevski, 2020. "Homogeneous Data Normalization and Deep Learning: A Case Study in Human Activity Classification," Future Internet, MDPI, vol. 12(11), pages 1-14, November.
    7. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    8. Wang, Xin & Liu, Xiaodong & Pedrycz, Witold & Zhu, Xiaolei & Hu, Guangfei, 2012. "Mining axiomatic fuzzy set association rules for classification problems," European Journal of Operational Research, Elsevier, vol. 218(1), pages 202-210.
    9. Michelle Sapitang & Wanie M. Ridwan & Khairul Faizal Kushiar & Ali Najah Ahmed & Ahmed El-Shafie, 2020. "Machine Learning Application in Reservoir Water Level Forecasting for Sustainable Hydropower Generation Strategy," Sustainability, MDPI, vol. 12(15), pages 1-19, July.
    10. Pinciroli, Luca & Baraldi, Piero & Zio, Enrico, 2023. "Maintenance optimization in industry 4.0," Reliability Engineering and System Safety, Elsevier, vol. 234(C).
    11. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    12. Ding‐Wen Tan & William Yeoh & Yee Ling Boo & Soung‐Yue Liew, 2013. "The Impact Of Feature Selection: A Data‐Mining Application In Direct Marketing," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 20(1), pages 23-38, January.
    13. Pin Wang & Yongming Li & Bohan Chen & Xianling Hu & Jin Yan & Yu Xia & Jie Yang, 2017. "Proportional Hybrid Mechanism for Population Based Feature Selection Algorithm," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(05), pages 1309-1338, September.
    14. Fabrizio De Caro & Amedeo Andreotti & Rodolfo Araneo & Massimo Panella & Antonello Rosato & Alfredo Vaccaro & Domenico Villacci, 2020. "A Review of the Enabling Methodologies for Knowledge Discovery from Smart Grids Data," Energies, MDPI, vol. 13(24), pages 1-25, December.
    15. García-Alonso, Carlos R. & Torres-Jiménez, Mercedes & Hervás-Martínez, César, 2010. "Income prediction in the agrarian sector using product unit neural networks," European Journal of Operational Research, Elsevier, vol. 204(2), pages 355-365, July.
    16. Huaijun Wang & Ruomeng Ke & Junhuai Li & Yang An & Kan Wang & Lei Yu, 2018. "A correlation-based binary particle swarm optimization method for feature selection in human activity recognition," International Journal of Distributed Sensor Networks, vol. 14(4), pages 15501477187, April.
    17. Fouskakis, D., 2012. "Bayesian variable selection in generalized linear models using a combination of stochastic optimization methods," European Journal of Operational Research, Elsevier, vol. 220(2), pages 414-422.
    18. Tymoteusz Miller & Grzegorz Mikiciuk & Anna Kisiel & Małgorzata Mikiciuk & Dominika Paliwoda & Lidia Sas-Paszt & Danuta Cembrowska-Lech & Adrianna Krzemińska & Agnieszka Kozioł & Adam Brysiewicz, 2023. "Machine Learning Approaches for Forecasting the Best Microbial Strains to Alleviate Drought Impact in Agriculture," Agriculture, MDPI, vol. 13(8), pages 1-16, August.
    19. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    20. Dongfen Li & Yundan Zheng & Xiaofang Liu & Jie Zhou & Yuqiao Tan & Xiaolong Yang & Mingzhe Liu, 2022. "Hierarchical Quantum Information Splitting of an Arbitrary Two-Qubit State Based on a Decision Tree," Mathematics, MDPI, vol. 10(23), pages 1-18, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.