IDEAS home Printed from https://ideas.repec.org/a/gam/jftint/v14y2022i7p194-d848579.html

A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language

Author

Listed:
  • Yousif A. Alhaj

    (Sanaa Community College, Sanaa 5695, Yemen)

  • Abdelghani Dahou

    (Mathematics and Computer Science Department, Ahmed Draia University, Adrar 01000, Algeria)

  • Mohammed A. A. Al-qaness

    (State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
    Faculty of Engineering, Sana’a University, Sana’a 12544, Yemen)

  • Laith Abualigah

    (Faculty of Information Technology, Middle East University, Amman 11831, Jordan
    Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953, Jordan)

  • Aaqif Afzaal Abbasi

    (Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan)

  • Nasser Ahmed Obad Almaweri

    (Sanaa Community College, Sanaa 5695, Yemen)

  • Mohamed Abd Elaziz

    (Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt
    Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman P.O. Box 346, United Arab Emirates
    Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt)

  • Robertas Damaševičius

    (Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania)

Abstract

We propose a novel text classification model that aims to improve the performance of Arabic text classification using machine learning techniques. An effective approach to Arabic text classification is to pair the classifier with a suitable feature selection method and an optimal number of features. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method called Optimal Configuration Determination for Arabic text Classification (OCATC), which uses the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method first extracts features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. PSO then selects the best configuration: a classifier, a feature selection method, and the number of features to keep. Extensive experiments were carried out to evaluate the OCATC method on six datasets: five publicly available datasets and our proposed dataset. The results show that OCATC outperforms individual classifiers and other state-of-the-art methods.
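
To make the two steps in the abstract concrete, the sketch below computes TF-IDF weights for tokenized documents and then runs a textbook PSO over a small configuration space. It is an illustration under stated assumptions, not the authors' implementation: the candidate classifiers, feature selection methods, and feature counts are hypothetical placeholders, the PSO is the standard variant rather than the improved one the title refers to, the TF-IDF formula is the plain tf * log(N / df) form, and toy_fitness stands in for the validation accuracy that a real run would obtain by training each candidate configuration.

```python
import math
import random

# Hypothetical search space; the abstract does not list the actual candidates.
CLASSIFIERS = ["svm", "naive_bayes", "logistic_regression"]
SELECTORS = ["chi2", "mutual_info", "anova_f"]
FEATURE_COUNTS = [500, 1000, 2000, 5000, 10000]
SPACE = [CLASSIFIERS, SELECTORS, FEATURE_COUNTS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (tf * log(N / df))."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        row = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            row[term] = tf * math.log(n_docs / df[term])
        weights.append(row)
    return weights


def decode(position):
    """Map a point in [0, 1]^3 to a (classifier, selector, n_features) tuple."""
    config = []
    for x, options in zip(position, SPACE):
        index = min(int(x * len(options)), len(options) - 1)
        config.append(options[index])
    return tuple(config)


def toy_fitness(config):
    """Assumed stand-in for validation accuracy; the numbers are made up."""
    clf, sel, k = config
    score = 0.70
    score += {"svm": 0.10, "naive_bayes": 0.05, "logistic_regression": 0.08}[clf]
    score += {"chi2": 0.05, "mutual_info": 0.03, "anova_f": 0.02}[sel]
    score -= abs(k - 2000) / 100000  # pretend 2000 features is the sweet spot
    return score


def pso_select(fitness, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO (maximization) over continuous positions decoded to configs."""
    rng = random.Random(seed)
    dim = len(SPACE)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the position always decodes to a valid configuration.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            score = fitness(decode(pos[i]))
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return decode(gbest), gbest_score


docs = [["a", "b", "a"], ["b", "c"]]  # placeholder tokens, not real data
print(tf_idf(docs)[0])
best_config, best_score = pso_select(toy_fitness)
print(best_config, best_score)
```

Encoding each particle as a point in the unit cube and decoding it to discrete choices is one common way to apply continuous PSO to a categorical search space; a binary or discrete PSO variant would be an alternative. In practice the fitness function would fit the chosen selector and classifier on the TF-IDF vectors and return a held-out score, which is why each evaluation is expensive and a search heuristic is attractive.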

Suggested Citation

  • Yousif A. Alhaj & Abdelghani Dahou & Mohammed A. A. Al-qaness & Laith Abualigah & Aaqif Afzaal Abbasi & Nasser Ahmed Obad Almaweri & Mohamed Abd Elaziz & Robertas Damaševičius, 2022. "A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language," Future Internet, MDPI, vol. 14(7), pages 1-18, June.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:7:p:194-:d:848579

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/7/194/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/7/194/
    Download Restriction: no

    References listed on IDEAS

    1. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    2. Gerard Salton & Chris Buckley, 1990. "Improving retrieval performance by relevance feedback," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(4), pages 288-297, June.

    Citations


    Cited by:

    1. Xinlu Li & Yuanyuan Lei & Shengwei Ji, 2022. "BERT- and BiLSTM-Based Sentiment Analysis of Online Chinese Buzzwords," Future Internet, MDPI, vol. 14(11), pages 1-15, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yu, Shiwei & Wei, Yi-Ming & Fan, Jingli & Zhang, Xian & Wang, Ke, 2012. "Exploring the regional characteristics of inter-provincial CO2 emissions in China: An improved fuzzy clustering analysis based on particle swarm optimization," Applied Energy, Elsevier, vol. 92(C), pages 552-562.
    2. Moraes, Marcelo Botelho da Costa & Nagano, Marcelo Seido, 2014. "Evolutionary models in cash management policies with multiple assets," Economic Modelling, Elsevier, vol. 39(C), pages 1-7.
    3. Lee, In Gyu & Yoon, Sang Won & Won, Daehan, 2022. "A Mixed Integer Linear Programming Support Vector Machine for Cost-Effective Group Feature Selection: Branch-Cut-and-Price Approach," European Journal of Operational Research, Elsevier, vol. 299(3), pages 1055-1068.
    4. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    5. Mohammad Mahdi Mousavi & Jamal Ouenniche & Kaoru Tone, 2023. "A dynamic performance evaluation of distress prediction models," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(4), pages 756-784, July.
    6. Wang, Xin & Liu, Xiaodong & Pedrycz, Witold & Zhu, Xiaolei & Hu, Guangfei, 2012. "Mining axiomatic fuzzy set association rules for classification problems," European Journal of Operational Research, Elsevier, vol. 218(1), pages 202-210.
    7. Pendharkar, Parag C. & Troutt, Marvin D., 2011. "DEA based dimensionality reduction for classification problems satisfying strict non-satiety assumption," European Journal of Operational Research, Elsevier, vol. 212(1), pages 155-163, July.
    8. Yi, Tao & Cheng, Xiaobin & Peng, Peng, 2022. "Two-stage optimal allocation of charging stations based on spatiotemporal complementarity and demand response: A framework based on MCS and DBPSO," Energy, Elsevier, vol. 239(PC).
    9. Alireza Pourdaryaei & Mohammad Mohammadi & Mazaher Karimi & Hazlie Mokhlis & Hazlee A. Illias & Seyed Hamidreza Aghay Kaboli & Shameem Ahmad, 2021. "Recent Development in Electricity Price Forecasting Based on Computational Intelligence Techniques in Deregulated Power Market," Energies, MDPI, vol. 14(19), pages 1-28, September.
    10. Panagopoulos, Orestis P. & Pappu, Vijay & Xanthopoulos, Petros & Pardalos, Panos M., 2016. "Constrained subspace classifier for high dimensional datasets," Omega, Elsevier, vol. 59(PA), pages 40-46.
    11. Zhixiang Chen & Bin Fu & John Abraham, 2010. "A quadratic lower bound for Rocchio’s similarity-based relevance feedback algorithm with a fixed query updating factor," Journal of Combinatorial Optimization, Springer, vol. 19(2), pages 134-157, February.
    12. Bin, Wei & Qinke, Peng & Jing, Zhao & Xiao, Chen, 2012. "A binary particle swarm optimization algorithm inspired by multi-level organizational learning behavior," European Journal of Operational Research, Elsevier, vol. 219(2), pages 224-233.
    13. Roland Graef & Mathias Klier & Kilian Kluge & Jan Felix Zolitschka, 2021. "Human-machine collaboration in online customer service – a long-term feedback-based approach," Electronic Markets, Springer;IIM University of St. Gallen, vol. 31(2), pages 319-341, June.
    14. Asim Roy & Patrick Mackin & Jyrki Wallenius & James Corner & Mark Keith & Gregory Schymik & Hina Arora, 2008. "An Interactive Search Method Based on User Preferences," Decision Analysis, INFORMS, vol. 5(4), pages 203-229, December.
    15. Zouache, Djaafar & Moussaoui, Abdelouahab & Ben Abdelaziz, Fouad, 2018. "A cooperative swarm intelligence algorithm for multi-objective discrete optimization with application to the knapsack problem," European Journal of Operational Research, Elsevier, vol. 264(1), pages 74-88.
    16. Mariam Daoud & Jimmy Xiangji Huang, 2013. "Modeling geographic, temporal, and proximity contexts for improving geotemporal search," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(1), pages 190-212, January.
    17. Veda C. Storey & Andrew Burton-Jones & Vijayan Sugumaran & Sandeep Purao, 2008. "CONQUER: A Methodology for Context-Aware Query Processing on the World Wide Web," Information Systems Research, INFORMS, vol. 19(1), pages 3-25, March.
    18. Fouskakis, D., 2012. "Bayesian variable selection in generalized linear models using a combination of stochastic optimization methods," European Journal of Operational Research, Elsevier, vol. 220(2), pages 414-422.
    19. Huang, Yuming & Ge, Bingfeng & Hipel, Keith W. & Fang, Liping & Zhao, Bin & Yang, Kewei, 2023. "Solving the inverse graph model for conflict resolution using a hybrid metaheuristic algorithm," European Journal of Operational Research, Elsevier, vol. 305(2), pages 806-819.
    20. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
