
Preprocessing Techniques for Effective Data Extraction and Computation

Authors

  • M Saraswathi
  • V Balu

Abstract

Information on the World Wide Web is semi-structured because of the nested structure of HTML code; much of it is linked, and much of it is redundant. Web text mining supports the overall knowledge mining process by discovering and extracting valuable information from unstructured text. Unstructured text, which contains a massive amount of information, cannot be used directly for further processing by computers. This paper therefore discusses the importance of standard preprocessing methods and the steps involved in obtaining the required content effectively. It proposes an effective preprocessing and dimensionality reduction technique that helps simplify or speed up computation and can improve text categorization performance.

Suggested Citation

  • M Saraswathi & V Balu, 2013. "Preprocessing Techniques for Effective Data Extraction and Computation," The IUP Journal of Computer Sciences, IUP Publications, vol. 7(3), pages 27-34, July.
  • Handle: RePEc:icf:icfjcs:v:7:y:2013:i:3:p:27-34
