Record Linkage using Probabilistic Methods and Data Mining Techniques

Author

Listed:
  • Elezaj Ogerta
  • Tuxhari Gloria

    (Faculty of Economy, University of Tirana, Tirana, Albania)

Abstract

Corporations and organizations now acquire large amounts of information daily, which is stored in many large databases (DBs). These databases are mostly heterogeneous, and the data in them are represented differently. Data in these DBs may also simply be inaccurate, so the DBs need to be cleaned. Record linkage is considered part of the data cleaning phase when working with large-scale surveys, and as such is a data mining step. It is an important process in data integration that consists of finding duplicate records as well as matching records. The process can be divided into two main steps: Exact Record Linkage, which finds all exact matches between records, and Probabilistic Record Linkage, which matches records that are not exactly equal but have a high probability of being a match. In recent years, record linkage has become an important process in data mining tasks. As databases become more and more complex, finding matching records is a crucial task. Comparing every possible pair of records in large DBs is infeasible, whether by manual or automatic procedures. Therefore, special algorithms (blocking methods) have to be used to reduce the computational complexity of the comparison space among records. The paper discusses the deterministic and probabilistic methods used for record linkage, as well as different supervised and unsupervised techniques. Results of linking two real-world datasets (the Albanian Population and Housing Census 2011 and the list of farmers registered by the Food Safety and Veterinary Institute) are presented.
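
The two steps and the role of blocking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the field names, the choice of `district` as the blocking key, the string-similarity cutoff, the m/u probabilities, and the decision threshold are all hypothetical values chosen for illustration, and the log-ratio agreement weights follow the common Fellegi-Sunter style of scoring, which the abstract does not name.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy records; names and values are invented for illustration.
LEFT = [
    {"id": "c1", "name": "Arben Hoxha", "birth_year": 1970, "district": "Tirana"},
    {"id": "c2", "name": "Mira Kola", "birth_year": 1985, "district": "Durres"},
    {"id": "c3", "name": "Ilir Dervishi", "birth_year": 1962, "district": "Tirana"},
]
RIGHT = [
    {"id": "f1", "name": "Arben Hoxha", "birth_year": 1970, "district": "Tirana"},
    {"id": "f2", "name": "Mira Kolla", "birth_year": 1985, "district": "Durres"},
    {"id": "f3", "name": "Besnik Marku", "birth_year": 1962, "district": "Tirana"},
]

FIELDS = ("name", "birth_year", "district")

# Assumed per-field probabilities (m, u):
#   m = P(fields agree | records are a true match)
#   u = P(fields agree | records are not a match)
M_U = {"name": (0.95, 0.01), "birth_year": (0.98, 0.05)}


def exact_matches(left, right):
    """Exact linkage: pair records whose compared fields are all identical."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[f] for f in FIELDS)].append(r["id"])
    return [
        (l["id"], rid)
        for l in left
        for rid in index.get(tuple(l[f] for f in FIELDS), [])
    ]


def candidate_pairs(left, right, block_key="district"):
    """Blocking: only compare records that share the same blocking key."""
    blocks = defaultdict(list)
    for r in right:
        blocks[r[block_key]].append(r)
    for l in left:
        for r in blocks.get(l[block_key], []):
            yield l, r


def agrees(field, a, b):
    """Field comparison: approximate for names, strict for everything else."""
    if field == "name":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.85
    return a == b


def match_weight(l, r):
    """Sum of log2 agreement/disagreement weights over the scored fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if agrees(field, l[field], r[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_matches(left, right, threshold=5.0):
    """Probabilistic linkage: keep blocked pairs scoring above the threshold."""
    return [
        (l["id"], r["id"])
        for l, r in candidate_pairs(left, right)
        if match_weight(l, r) >= threshold
    ]


if __name__ == "__main__":
    print("exact:", exact_matches(LEFT, RIGHT))
    print("pairs compared after blocking:", len(list(candidate_pairs(LEFT, RIGHT))))
    print("probabilistic:", probabilistic_matches(LEFT, RIGHT))
```

On this toy input, blocking on `district` reduces the comparison space from the 9 pairs of the full cross product to 5, and the probabilistic step additionally links the pair whose names differ by one letter, which the exact step misses. In practice the m/u values would be estimated from the data rather than fixed by hand.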

Suggested Citation

  • Elezaj Ogerta & Tuxhari Gloria, 2017. "Record Linkage using Probabilistic Methods and Data Mining Techniques," Mediterranean Journal of Social Sciences, Sciendo, vol. 8(3), pages 203-207, May.
  • Handle: RePEc:vrs:mjsosc:v:8:y:2017:i:3:p:203-207:n:14
    DOI: 10.5901/mjss.2017.v8n3p203

    Download full text from publisher

    File URL: https://doi.org/10.5901/mjss.2017.v8n3p203
    Download Restriction: no

