
An Unsupervised Entity Resolution Framework for English and Arabic Datasets

Author

Listed:
  • Abdelkrim OUHAB

    (EEDIS Laboratory, Djillali Liabes University, Sidi Bel Abbes, Algeria)

  • Mimoun MALKI

    (LabRI-SBA Laboratory, Ecole Supérieure en Informatique de Sidi Bel Abbes, Sidi Bel Abbes, Algeria)

  • Djamel BERRABAH

    (EEDIS Laboratory, Djillali Liabes University, Sidi Bel Abbes, Algeria)

  • Faouzi BOUFARES

    (LIPN Laboratory, Paris 13 University, Villetaneuse, France)

Abstract

Entity resolution (ER) is an important step in data integration and in many data mining projects; its goal is to identify records that refer to the same real-world entity. Most existing ER frameworks focus on datasets in Latin-based languages and do not support Arabic. In this article, the authors present an unsupervised ER framework that supports both English and Arabic datasets. Rather than relying on matching rules developed by an expert or on manually labeled training examples, the proposed framework automatically generates its own training set. The generated training set is then used to train a classifier, and the learned classification model is used to perform ER. The proposed framework was implemented and tested on three Arabic datasets and four English datasets. Experimental results show that the proposed framework is competitive with supervised approaches and outperforms recently proposed unsupervised approaches in terms of F-measure.
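
The three stages named in the abstract (generate a training set automatically, train a classifier, apply it to all record pairs) can be illustrated with a minimal Python sketch. The abstract does not say how the training set is generated or which classifier is used, so everything below is an assumption made for illustration: seed labels come from hypothetical similarity thresholds (`hi`, `lo`), the classifier is a simple nearest-centroid stand-in, and all function names are invented here. It is not the authors' implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_vector(a, b, fields):
    """One string-similarity score in [0, 1] per field (works on any Unicode text)."""
    return [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]


def generate_training_set(vectors, hi=0.9, lo=0.3):
    """Assumed seeding heuristic: label only very similar and very dissimilar pairs."""
    seeds = {}
    for pair, vec in vectors.items():
        mean = sum(vec) / len(vec)
        if mean >= hi:
            seeds[pair] = 1  # confident match
        elif mean <= lo:
            seeds[pair] = 0  # confident non-match
    return seeds


def train_centroids(vectors, seeds):
    """Stand-in classifier: one mean similarity vector (centroid) per class."""
    centroids = {}
    for label in (0, 1):
        rows = [vectors[p] for p, y in seeds.items() if y == label]
        if not rows:
            raise ValueError(f"no seed examples for class {label}")
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def classify(vec, centroids):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(vec, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def resolve(records, fields, hi=0.9, lo=0.3):
    """Compare all pairs, self-label seeds, train, then classify every pair."""
    vectors = {
        (i, j): similarity_vector(records[i], records[j], fields)
        for i, j in combinations(range(len(records)), 2)
    }
    seeds = generate_training_set(vectors, hi, lo)
    centroids = train_centroids(vectors, seeds)
    return sorted(p for p, v in vectors.items() if classify(v, centroids) == 1)


# Toy data, invented for this example.
records = [
    {"name": "John Smith", "city": "London"},
    {"name": "John Smith", "city": "London"},
    {"name": "J. Smith", "city": "Londn"},
    {"name": "Maria Garcia", "city": "Madrid"},
]
print(resolve(records, ["name", "city"]))  # [(0, 1), (0, 2), (1, 2)]
```

In this toy run the exact duplicate pair seeds the match class and the pairs involving the fourth record seed the non-match class. The noisy third record receives no seed label and is matched only by the trained model, which is the point of learning a classifier rather than applying the thresholds directly. A real system would also avoid comparing all pairs and would need language-specific normalization for Arabic text, neither of which is shown here.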

Suggested Citation

  • Abdelkrim OUHAB & Mimoun MALKI & Djamel BERRABAH & Faouzi BOUFARES, 2017. "An Unsupervised Entity Resolution Framework for English and Arabic Datasets," International Journal of Strategic Information Technology and Applications (IJSITA), IGI Global, vol. 8(4), pages 16-29, October.
  • Handle: RePEc:igg:jsita0:v:8:y:2017:i:4:p:16-29

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJSITA.2017100102
    Download Restriction: no