Printed from https://ideas.repec.org/a/wsi/jikmxx/v17y2018i02ns0219649218500223.html

New Rough Set-Aided Mechanism for Text Categorisation

Author

Listed:
  • N. Venkata Sailaja

    (Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad 500090, Telangana, India)

  • L. Padma Sree

    (Department of Electronics and Communication Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad 500090, Telangana, India)

  • N. Mangathayaru

    (Department of Information Technology, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad 500090, Telangana, India)

Abstract

With the advent of computers and the information age, statistical and analytical problems have grown in both size and complexity. Challenges in the core domains of data storage, organisation and searching have given rise to a new research field called data mining. Text classification using machine learning (ML) mechanisms is hampered by the high dimensionality of the attribute vector. A feature selection technique is therefore required to discard irrelevant and noisy attributes from the feature vector so that ML algorithms can work efficiently. In this paper, a rough set theory (RST)-based attribute selection methodology is applied to text classification. A hybrid method based on RST is proposed for classifying text documents, and its performance is evaluated on standard datasets.
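
To make the idea of RST-based attribute selection concrete, the sketch below shows the general rough-set notion of a dependency degree and a greedy, QuickReduct-style term selection on a toy document table. It is an illustration only and is not the hybrid method proposed in the paper, whose details are not given in this abstract. The function names, the four example terms, and the binary term-presence encoding are all assumptions made for this example.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # block lies in the positive region
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full-set dependency degree is reached.

    The result is not guaranteed to be a minimal reduct.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    best = dependency(rows, labels, selected)
    while best < target:
        candidates = [a for a in all_attrs if a not in selected]
        # Pick the attribute that yields the highest dependency degree.
        a_best = max(candidates, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(a_best)
        best = dependency(rows, labels, selected)
    return selected


# Hypothetical toy data: 1 means the term occurs in the document.
terms = ["goal", "match", "stock", "the"]
rows = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels = ["sport", "sport", "finance", "finance"]

reduct = quick_reduct(rows, labels)
print([terms[a] for a in reduct])
```

On this toy table a single term is enough to separate the two classes, so uninformative terms such as "the" are discarded. Real text collections would need a discretised term weighting and a far larger vocabulary, which is where the dimensionality problem described in the abstract arises.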

Suggested Citation

  • N. Venkata Sailaja & L. Padma Sree & N. Mangathayaru, 2018. "New Rough Set-Aided Mechanism for Text Categorisation," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 17(02), pages 1-19, June.
  • Handle: RePEc:wsi:jikmxx:v:17:y:2018:i:02:n:s0219649218500223
    DOI: 10.1142/S0219649218500223

    Download full text from publisher

    File URL: http://www.worldscientific.com/doi/abs/10.1142/S0219649218500223
    Download Restriction: Access to full text is restricted to subscribers

    File URL: https://libkey.io/10.1142/S0219649218500223?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bastian Schaefermeier & Gerd Stumme & Tom Hanika, 2021. "Topic space trajectories," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5759-5795, July.
    2. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    3. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    4. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    5. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    6. Juan Shi & Kin Keung Lai & Ping Hu & Gang Chen, 2018. "Factors dominating individual information disseminating behavior on social networking sites," Information Technology and Management, Springer, vol. 19(2), pages 121-139, June.
    7. Ganesh Dash & Chetan Sharma & Shamneesh Sharma, 2023. "Sustainable Marketing and the Role of Social Media: An Experimental Study Using Natural Language Processing (NLP)," Sustainability, MDPI, vol. 15(6), pages 1-16, March.
    8. Jianfei Cao & Han Yang & Jianshu Lv & Quanyuan Wu & Baolei Zhang, 2023. "Estimating Soil Salinity with Different Levels of Vegetation Cover by Using Hyperspectral and Non-Negative Matrix Factorization Algorithm," IJERPH, MDPI, vol. 20(4), pages 1-15, February.
    9. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    10. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
    11. Gissler, Stefan & Oldfather, Jeremy & Ruffino, Doriana, 2016. "Lending on hold: Regulatory uncertainty and bank lending standards," Journal of Monetary Economics, Elsevier, vol. 81(C), pages 89-101.
    12. Alina Evstigneeva & Mark Sidorovskiy, 2021. "Assessment of Clarity of Bank of Russia Monetary Policy Communication by Neural Network Approach," Russian Journal of Money and Finance, Bank of Russia, vol. 80(3), pages 3-33, September.
    13. Hei-Chia Wang & Tzu-Ting Hsu & Yunita Sari, 2019. "Personal research idea recommendation using research trends and a hierarchical topic model," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1385-1406, December.
    14. Takehiro Sano & Tsuyoshi Migita & Norikazu Takahashi, 2022. "A novel update rule of HALS algorithm for nonnegative matrix factorization and Zangwill’s global convergence," Journal of Global Optimization, Springer, vol. 84(3), pages 755-781, November.
    15. Marcin Chlebus & Maciej Stefan Świtała, 2020. "So close and so far. Finding similar tendencies in econometrics and machine learning papers. Topic models comparison," Working Papers 2020-16, Faculty of Economic Sciences, University of Warsaw.
    16. De Caigny, Arno & Coussement, Kristof & De Bock, Koen W. & Lessmann, Stefan, 2020. "Incorporating textual information in customer churn prediction models based on a convolutional neural network," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1563-1578.
    17. Hutchison, Paul D. & Daigle, Ronald J. & George, Benjamin, 2018. "Application of latent semantic analysis in AIS academic research," International Journal of Accounting Information Systems, Elsevier, vol. 31(C), pages 83-96.
    18. Emad Mohamed & Sayed A. Mostafa, 2019. "Computing Happiness from Textual Data," Stats, MDPI, vol. 2(3), pages 1-24, July.
    19. Andrej Čopar & Blaž Zupan & Marinka Zitnik, 2019. "Fast optimization of non-negative matrix tri-factorization," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-15, June.
    20. Jake R. Nelson & Tony H. Grubesic, 2018. "Environmental Justice: A Panoptic Overview Using Scientometrics," Sustainability, MDPI, vol. 10(4), pages 1-18, March.
