IDEAS home Printed from https://ideas.repec.org/a/igg/jkbo00/v7y2017i4p1-18.html

Identifying Emerging Topics and Content Change from Evolving Document Sets

Author

Listed:
  • Parvathi Chundi

    (University of Nebraska-Omaha, Department of Computer Science, Omaha, NE, USA)

Abstract

Document sets whose content evolves over time are common in organizations. Organizations routinely update policy documents, and a news story evolves over a period of time. When a document set evolves, some of the old content may remain unchanged while new content is added. Depending on the amount of change, users may need to read or analyze the new content again. Evolving content can make it hard for users to track the changes and to form a global view of what changed. In this paper, we consider document sets consisting of documents published at two different points in time and develop a measure that captures the change in content between them. We divide a document set into two subsets: one containing documents published at an earlier date and another containing documents published at a later date. We use Latent Dirichlet Allocation to extract topic and word distributions for each of the two subsets. We then compute the similarity between the sets of topics extracted for the two subsets to measure the amount of change in the content. We study the effectiveness of the method on two data sets, a set of privacy policy documents and a set of Reuters news articles extracted from the TDT-Pilot Corpus, and present the experimental results.
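The comparison step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure the paper uses, so everything here is an assumption made for illustration: topics are taken to be already extracted (for example by an LDA implementation) and are represented as word-to-probability dictionaries, each later topic is matched to its most similar earlier topic by cosine similarity, and the function names `cosine`, `best_matches`, `content_change`, and `emerging_topics`, the 0.5 threshold, and the toy topics are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_matches(earlier_topics, later_topics):
    """For each later topic, its highest similarity to any earlier topic."""
    return [
        max((cosine(topic, old) for old in earlier_topics), default=0.0)
        for topic in later_topics
    ]


def content_change(earlier_topics, later_topics):
    """Assumed change score: 1 minus the mean best-match similarity.

    Close to 0 when every later topic has a near-identical earlier topic,
    close to 1 when none of the later topics resemble earlier ones.
    """
    matches = best_matches(earlier_topics, later_topics)
    if not matches:
        return 0.0
    return 1.0 - sum(matches) / len(matches)


def emerging_topics(earlier_topics, later_topics, threshold=0.5):
    """Indices of later topics with no earlier topic above the threshold."""
    matches = best_matches(earlier_topics, later_topics)
    return [i for i, score in enumerate(matches) if score < threshold]


# Toy topic-word distributions (made up, not taken from the paper's data).
earlier = [
    {"privacy": 0.5, "data": 0.3, "cookies": 0.2},
    {"email": 0.6, "contact": 0.4},
]
later = [
    {"privacy": 0.5, "data": 0.3, "cookies": 0.2},  # carried over unchanged
    {"location": 0.7, "tracking": 0.3},             # shares no words with earlier topics
]

print(content_change(earlier, later))
print(emerging_topics(earlier, later))
```

In this toy example the first later topic matches an earlier topic exactly and the second shares no vocabulary with any earlier topic, so only the second is flagged as emerging. Other choices, such as a divergence between distributions or a one-to-one topic alignment, would fit the same outline.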

Suggested Citation

  • Parvathi Chundi, 2017. "Identifying Emerging Topics and Content Change from Evolving Document Sets," International Journal of Knowledge-Based Organizations (IJKBO), IGI Global, vol. 7(4), pages 1-18, October.
  • Handle: RePEc:igg:jkbo00:v:7:y:2017:i:4:p:1-18

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJKBO.2017100101
    Download Restriction: no
