Rough set based information theoretic approach for clustering uncertain categorical data

Authors

  • Jamal Uddin
  • Rozaida Ghazali
  • Jemal H. Abawajy
  • Habib Shah
  • Noor Aida Husaini
  • Asim Zeb

Abstract

Motivation: Many real-world applications, in domains such as business and health, generate large categorical datasets with uncertainty. A fundamental task is to efficiently discover hidden, non-trivial patterns in such datasets. Because the exact value of an attribute is often unknown in uncertain categorical datasets, conventional clustering algorithms do not provide a suitable means of dealing with categorical data, uncertainty, and stability.

Problem statement: Rough Set Theory supports decision making in the presence of vagueness and uncertainty in data. Although recent categorical clustering techniques based on Rough Set Theory help, they suffer from low accuracy, high computational complexity, and limited generalizability, especially on datasets where they sometimes fail to select, or can hardly select, their best clustering attribute.

Objectives: The main objective of this research is to propose a new information-theoretic Rough Purity Approach (RPA). A further objective is to address the problems of traditional Rough Set Theory based categorical clustering techniques. The ultimate goal is to cluster uncertain categorical datasets efficiently in terms of performance, generalizability, and computational complexity.

Methods: RPA takes into account the information-theoretic attribute purity of categorical-valued information systems. Extensive experiments evaluate the efficiency of RPA on a real Supplier Base Management (SBM) dataset and six benchmark UCI datasets. RPA is also compared with several recent categorical data clustering techniques.

Results: The experimental results show that RPA outperforms the baseline algorithms. The significant percentage improvements in time (66.70%), iterations (83.13%), purity (10.53%), entropy (14%), and accuracy (12.15%), as well as in the Rough Accuracy of clusters, show that RPA is suitable for practical use.

Conclusion: We conclude that, compared with other techniques, the attribute purity of categorical-valued information systems clusters the data better. Hence, the RPA technique can be recommended for large-scale clustering in multiple domains, and its performance can be enhanced in further research.
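
The abstract reports purity, entropy, and Rough Accuracy without giving their formulas. As a rough illustration only, the sketch below computes the textbook versions of these quantities on a tiny made-up table: Pawlak's accuracy of approximation (size of the lower approximation divided by size of the upper approximation), cluster purity, and size-weighted cluster entropy. It is not the RPA algorithm, and the paper's exact definitions may differ. The attribute names, data, and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def equivalence_classes(rows, attrs):
    """Group object indices that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def rough_accuracy(rows, attrs, target):
    """Pawlak accuracy |lower| / |upper| of the object set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(rows, attrs):
        if cls <= target:      # class lies entirely inside the target
            lower |= cls
        if cls & target:       # class overlaps the target
            upper |= cls
    return len(lower) / len(upper) if upper else 1.0

def purity(clusters, labels):
    """Fraction of objects carrying the majority label of their cluster."""
    n = sum(len(c) for c in clusters)
    hits = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return hits / n

def entropy(clusters, labels):
    """Size-weighted mean Shannon entropy (bits) of labels per cluster."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c).values()
        h = -sum((k / len(c)) * log2(k / len(c)) for k in counts)
        total += len(c) / n * h
    return total

# Hypothetical toy information system: six objects, two categorical attributes.
rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "small"},
]
labels = ["a", "a", "a", "b", "b", "a"]   # assumed ground-truth classes
target = {i for i, y in enumerate(labels) if y == "a"}

# Treat the partition induced by a single attribute as a clustering.
clusters = equivalence_classes(rows, ["color"])
print(purity(clusters, labels), entropy(clusters, labels))
print(rough_accuracy(rows, ["color"], target))
print(rough_accuracy(rows, ["color", "size"], target))
```

In this toy table, partitioning on "color" alone leaves one mixed cluster, so the class "a" is only roughly definable; adding "size" makes every equivalence class pure and the Rough Accuracy rises to 1.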

Suggested Citation

  • Jamal Uddin & Rozaida Ghazali & Jemal H. Abawajy & Habib Shah & Noor Aida Husaini & Asim Zeb, 2022. "Rough set based information theoretic approach for clustering uncertain categorical data," PLOS ONE, Public Library of Science, vol. 17(5), pages 1-23, May.
  • Handle: RePEc:plo:pone00:0265190
    DOI: 10.1371/journal.pone.0265190

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0265190
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0265190&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Olga Porro & Francesc Pardo-Bosch & Núria Agell & Mónica Sánchez, 2020. "Understanding Location Decisions of Energy Multinational Enterprises within the European Smart Cities’ Context: An Integrated AHP and Extended Fuzzy Linguistic TOPSIS Method," Energies, MDPI, vol. 13(10), pages 1-29, May.
    2. Han, Shuihua & Jia, Xinyun & Chen, Xinming & Gupta, Shivam & Kumar, Ajay & Lin, Zhibin, 2022. "Search well and be wise: A machine learning approach to search for a profitable location," Journal of Business Research, Elsevier, vol. 144(C), pages 416-427.
    3. Mahdi Zareei & Cesar Vargas-Rosales & Mohammad Hossein Anisi & Leila Musavian & Rafaela Villalpando-Hernandez & Shidrokh Goudarzi & Ehab Mahmoud Mohamed, 2019. "Enhancing the Performance of Energy Harvesting Sensor Networks for Environmental Monitoring Applications," Energies, MDPI, vol. 12(14), pages 1-14, July.
    4. Blasco-Arcas, Lorena & Kastanakis, Minas N. & Alcañiz, Mariano & Reyes-Menendez, Ana, 2023. "Leveraging user behavior and data science technologies for management: An overview," Journal of Business Research, Elsevier, vol. 154(C).
    5. Huang, Yatao & Jiao, Wenxian & Wang, Kang & Li, Erling & Yan, Yutong & Chen, Jingyang & Guo, Xuanxuan, 2022. "Examining the multidimensional energy poverty trap and its determinants: An empirical analysis at household and community levels in six provinces of China," Energy Policy, Elsevier, vol. 169(C).
    6. Yi-Shian Lee & Lee-Ing Tong, 2012. "Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming," Energies, MDPI, vol. 5(3), pages 1-16, February.
    7. Yu, Lili & Zhao, Duanyang & Xue, Zihao & Gao, Yang, 2020. "Research on the use of digital finance and the adoption of green control techniques by family farms in China," Technology in Society, Elsevier, vol. 62(C).
    8. Yinsheng Yang & Qianwei Zhuang & Guangdong Tian & Silin Wei, 2018. "A Management and Environmental Performance Evaluation of China’s Family Farms Using an Ultimate Comprehensive Cross-Efficiency Model (UCCE)," Sustainability, MDPI, vol. 11(1), pages 1-25, December.
    9. Gao, Yang & Niu, Ziheng & Yang, Haoran & Yu, Lili, 2019. "Impact of green control techniques on family farms' welfare," Ecological Economics, Elsevier, vol. 161(C), pages 91-99.
    10. Pan Wang & Di Liu, 2023. "Why Are Farmers Reluctant to Sell: Evidence from Rural China," Agriculture, MDPI, vol. 13(4), pages 1-12, March.
    11. Chaozhu Li & Xiaoliang Li & Wei Jia, 2022. "Non-Farm Employment Experience, Risk Preferences, and Low-Carbon Agricultural Technology Adoption: Evidence from 1843 Grain Farmers in 14 Provinces in China," Agriculture, MDPI, vol. 13(1), pages 1-16, December.
    12. Chen, Linlin & Han, Shuihua & Ye, Zhen & Xia, Senmao, 2023. "The optimisation of the location of front distribution centre: A spatio-temporal joint perspective," International Journal of Production Economics, Elsevier, vol. 263(C).
    13. Tiyachareonsri, Sirikunya & Chavarnakul, Thira & Chandrachai, Achara & Triukose, Sipat, 2024. "How consumer preference determines site selection in a metropolitan setting: Analysis of retailer perspective to stay ahead of the competition in the aftermath of a large-scale crisis," Journal of Retailing and Consumer Services, Elsevier, vol. 78(C).
    14. Wang, Xiuyuan & Shen, Lei & Liu, Tingting & Wei, Wenwen & Zhang, Shuai & Tuerti, Tayir & Li, Luhua & Zhang, Wei, 2023. "Juvenile plumcot tree can improve fruit quality and economic benefits by intercropping with alfalfa in semi-arid areas," Agricultural Systems, Elsevier, vol. 205(C).
    15. Mao, Hui & Quan, Yurong & Fu, Yong & Chen, Shaojian, 2022. "Risk preferences, productive investment and straw return technology adoption by farmers in China," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322087, Agricultural and Applied Economics Association.
    16. Lei Luo & Dakuan Qiao & Ruixin Zhang & Chenhao Luo & Xinhong Fu & Yuying Liu, 2022. "Research on the Influence of Education of Farmers’ Cooperatives on the Adoption of Green Prevention and Control Technologies by Members: Evidence from Rural China," IJERPH, MDPI, vol. 19(10), pages 1-17, May.
    17. Tooraj Karimi & Arvin Hojati & Jeffrey Yi-Lin Forrest, 2022. "A new methodology for sustainability measurement of banks based on rough set theory," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 30(1), pages 415-431, March.
    18. Li, Qi & Li, Kai, 2020. "Rice farmers’ demands for productive services: evidence from Chinese farmers," International Food and Agribusiness Management Review, International Food and Agribusiness Management Association, vol. 23(3), September.
    19. Barbati, Maria & Corrente, Salvatore & Greco, Salvatore, 2020. "A general space-time model for combinatorial optimization problems (and not only)," Omega, Elsevier, vol. 96(C).
    20. Yu, Lili & Niu, Ziheng & Gao, Yang & Tian, Borui, 2019. "Support policy preferences of grain family farms: evidence from Huang-huai-hai plain of China," International Food and Agribusiness Management Review, International Food and Agribusiness Management Association, vol. 23(5), October.
