
Discovering XML Conditional Dependencies for Data Quality Issues

Author

Listed:
  • Mohammed Ragheb Hakawati

    (CETIC Research Centre, Belgium.)

  • Yasmin Yacob

    (University Malaysia Perlis, Malaysia.)

  • Amiza Amir

    (University Malaysia Perlis, Malaysia.)

  • Jabiry M. Mohammed

    (Management & Science University (MSU), Shah Alam, Malaysia.)

  • Khalid Jamal Jadaa

    (University Malaysia Perlis, Malaysia.)

Abstract

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data. With more than 60% of the total, XML is considered the most dominant document type on the web; nevertheless, the quality of XML data is not as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping an XML dataset as consistent as possible, but their ability to solve data quality issues remains unproven. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies are new rules designed mainly for improving data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. A set of minimal approximate conditional dependencies (XCFDs and XCINDs) is then discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies that deviate from the majority of the dataset.
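
To make the idea of a pattern tableau concrete, the following is a minimal sketch of how constant patterns for one conditional functional dependency might be mined from an XML tree and then used to flag deviating records. It is not the mining algorithm from the paper: the element names (`customer`, `country`, `area`, `city`), the sample data, the function names, and the support and confidence thresholds are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document; element names and values are illustrative only.
SAMPLE = """
<customers>
  <customer><country>UK</country><area>131</area><city>Edinburgh</city></customer>
  <customer><country>UK</country><area>131</area><city>Edinburgh</city></customer>
  <customer><country>UK</country><area>131</area><city>Edinburgh</city></customer>
  <customer><country>UK</country><area>131</area><city>Glasgow</city></customer>
  <customer><country>US</country><area>212</area><city>New York</city></customer>
  <customer><country>US</country><area>212</area><city>New York</city></customer>
  <customer><country>US</country><area>215</area><city>Philadelphia</city></customer>
</customers>
"""


def discover_patterns(root, tuple_path, lhs, rhs, min_support=2, min_confidence=0.75):
    """Mine constant patterns lhs-values -> rhs-value that hold approximately.

    Returns a dict mapping each tuple of lhs constants to (rhs constant, confidence).
    The thresholds are assumptions made for this sketch.
    """
    groups = defaultdict(Counter)
    for node in root.findall(tuple_path):
        key = tuple(node.findtext(tag) for tag in lhs)
        value = node.findtext(rhs)
        if None in key or value is None:
            continue  # skip records with missing elements
        groups[key][value] += 1

    tableau = {}
    for key, counts in groups.items():
        total = sum(counts.values())
        value, hits = counts.most_common(1)[0]  # majority rhs value for this pattern
        if total >= min_support and hits / total >= min_confidence:
            tableau[key] = (value, hits / total)
    return tableau


def find_violations(root, tuple_path, lhs, rhs, tableau):
    """Return (lhs constants, observed rhs value) for records that break a pattern."""
    violations = []
    for node in root.findall(tuple_path):
        key = tuple(node.findtext(tag) for tag in lhs)
        observed = node.findtext(rhs)
        if key in tableau and observed != tableau[key][0]:
            violations.append((key, observed))
    return violations


root = ET.fromstring(SAMPLE.strip())
tableau = discover_patterns(root, ".//customer", ("country", "area"), "city")
violations = find_violations(root, ".//customer", ("country", "area"), "city", tableau)
```

In this sketch the dependency is approximate in the sense that a pattern is kept when a majority of matching records agree on the right-hand value, so a minority value such as the single `Glasgow` record is reported as an inconsistency rather than invalidating the pattern. Patterns with too few supporting records (here the single `215` record) are dropped.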

Suggested Citation

  • Mohammed Ragheb Hakawati & Yasmin Yacob & Amiza Amir & Jabiry M. Mohammed & Khalid Jamal Jadaa, 2020. "Discovering XML Conditional Dependencies for Data Quality Issues," European Journal of Electrical Engineering and Computer Science, European Open Science, vol. 4(1), January.
  • Handle: RePEc:epw:ejece0:v:4:y:2020:i:1:id:19156
    DOI: 10.24018/ejece.2020.4.1.156

    Download full text from publisher

    File URL: https://eu-opensci.org/index.php/ejece/article/view/19156
    File Function: Abstract page
    Download Restriction: no

    File URL: https://eu-opensci.org/index.php/ejece/article/download/19156/11084
    File Function: Full text
    Download Restriction: no


