
The ARCOMEM Architecture for Social- and Semantic-Driven Web Archiving

Authors
  • Thomas Risse

    (L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany)

  • Elena Demidova

    (L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany)

  • Stefan Dietze

    (L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany)

  • Wim Peters

    (NLP Group, Department of Computer Science, University of Sheffield, S1 4DP Sheffield, UK)

  • Nikolaos Papailiou

    (ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece)

  • Katerina Doka

    (ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece)

  • Yannis Stavrakas

    (ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece)

  • Vassilis Plachouras

    (ATHENA - Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Maroussi, Athens, Greece)

  • Pierre Senellart

    (CNRS LTCIT, Institut Mines-Télécom, Télécom ParisTech, 75634 Paris Cedex 13, France)

  • Florent Carpentier

    (Internet Memory Foundation, 45 ter rue de la Révolution, 93100 Montreuil, France)

  • Amin Mantrach

    (Yahoo Research, 08018 Barcelona, Spain)

  • Bogdan Cautis

    (CNRS LTCIT, Institut Mines-Télécom, Télécom ParisTech, 75634 Paris Cedex 13, France)

  • Patrick Siehndel

    (L3S Research Center, Leibniz Universität Hannover, Hannover 30167, Germany)

  • Dimitris Spiliotopoulos

    (Athens Technology Center (ATC), 15233 Halandri Athens, Greece)

Abstract

The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Collect-All Archives to Community Memories) architecture and implementation, which uses semantic information, such as entities, topics and events, complemented with information from the Social Web, to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease access and allow retrieval based on conditions that involve high-level concepts.

Suggested Citation

  • Thomas Risse & Elena Demidova & Stefan Dietze & Wim Peters & Nikolaos Papailiou & Katerina Doka & Yannis Stavrakas & Vassilis Plachouras & Pierre Senellart & Florent Carpentier & Amin Mantrach & Bogda, 2014. "The ARCOMEM Architecture for Social- and Semantic-Driven Web Archiving," Future Internet, MDPI, vol. 6(4), pages 1-29, November.
  • Handle: RePEc:gam:jftint:v:6:y:2014:i:4:p:688-716:d:41976

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/6/4/688/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/6/4/688/
    Download Restriction: no

