IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0151885.html
   My bibliography  Save this article

Identifying Topics in Microblogs Using Wikipedia

Author

Listed:
  • Ahmet Yıldırım
  • Suzan Üsküdarlı
  • Arzucan Özgür

Abstract

Twitter is an extremely high volume platform for user generated contributions regarding any topic. The wealth of content created at real-time in massive quantities calls for automated approaches to identify the topics of the contributions. Such topics can be utilized in numerous ways, such as public opinion mining, marketing, entertainment, and disaster management. Towards this end, approaches to relate single or partial posts to knowledge base items have been proposed. However, in microblogging systems like Twitter, topics emerge from the culmination of a large number of contributions. Therefore, identifying topics based on collections of posts, where individual posts contribute to some aspect of the greater topic is necessary. Models, such as Latent Dirichlet Allocation (LDA), propose algorithms for relating collections of posts to sets of keywords that represent underlying topics. In these approaches, figuring out what the specific topic(s) the keyword sets represent remains as a separate task. Another issue in topic detection is the scope, which is often limited to specific domain, such as health. This work proposes an approach for identifying domain-independent specific topics related to sets of posts. In this approach, individual posts are processed and then aggregated to identify key tokens, which are then mapped to specific topics. Wikipedia article titles are selected to represent topics, since they are up to date, user-generated, sophisticated articles that span topics of human interest. This paper describes the proposed approach, a prototype implementation, and a case study based on data gathered during the heavily contributed periods corresponding to the four US election debates in 2012. The manually evaluated results (0.96 precision) and other observations from the study are discussed in detail.

Suggested Citation

  • Ahmet Yıldırım & Suzan Üsküdarlı & Arzucan Özgür, 2016. "Identifying Topics in Microblogs Using Wikipedia," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-20, March.
  • Handle: RePEc:plo:pone00:0151885
    DOI: 10.1371/journal.pone.0151885
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151885
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0151885&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0151885?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0151885. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.