
Text Categorization Using Only Fragments Of Documents

Author

Listed:
  • Pilaszy, Istvan
  • Dobrowiecki, Tadeusz

Abstract

In this paper we present a number of experiments that examine how particular parts of documents contribute to the performance of a classifier. We evaluated text classifiers on two very different text corpora. We conclude that some parts of the text are more important than others for text classification performance. Giving higher weights to the more important parts can increase the performance of the classifier. Which parts are more or less important depends on the nature of the documents in the corpora. Some tasks remain to be done:
− More text corpora should be investigated.
− In section 6.4 we optimized the number of features to be kept independently of the section. However, it could be optimized for each section.
− Documents could be split into parts of 50 words, to examine what happens when the parts are of equal size not only within a document but also across documents.
− When splitting documents into k equal parts, we may combine the classifiers resulting from different k values.
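
The two ideas named in the abstract, splitting a document into k equal parts and giving some parts a higher weight, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their classifier or feature representation, so the function names `split_into_parts` and `weighted_term_counts`, the plain term-count features, and the example weights are all assumptions made for illustration.

```python
from collections import Counter


def split_into_parts(text, k):
    """Split text into k contiguous parts whose word counts differ by at most one."""
    words = text.split()
    n = len(words)
    # Part i covers words[bounds[i]:bounds[i + 1]].
    bounds = [(i * n) // k for i in range(k + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(k)]


def weighted_term_counts(text, weights):
    """Count terms, scaling each occurrence by the weight of the part it falls in.

    `weights` holds one (hypothetical) weight per part; its length sets k.
    """
    counts = Counter()
    parts = split_into_parts(text, len(weights))
    for part, weight in zip(parts, weights):
        for word in part:
            counts[word.lower()] += weight
    return counts


# Hypothetical weights: the first third of the document counts double,
# the last third counts half.
doc = "alpha beta gamma delta epsilon zeta"
features = weighted_term_counts(doc, [2.0, 1.0, 0.5])
```

The resulting weighted counts could then be passed to any bag-of-words classifier. Which weights help in practice would, as the abstract notes, depend on the corpus.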

Suggested Citation

  • Pilaszy, Istvan & Dobrowiecki, Tadeusz, 2007. "Text Categorization Using Only Fragments Of Documents," GAZDÁLKODÁS: Scientific Journal on Agricultural Economics, Karoly Robert University College, vol. 51(Special E), pages 1-8.
  • Handle: RePEc:ags:gazdal:58927
    DOI: 10.22004/ag.econ.58927

    Download full text from publisher

    File URL: https://ageconsearch.umn.edu/record/58927/files/Pilaszy_Dobrowiecki_2007_19ksz_214_211.pdf
    Download Restriction: no

