Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage
Data quality is crucial for operational efficiency and sound decision-making. This paper focuses on believability, a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where mashups facilitate the combination of data from different sources. Our approach to assessing data believability is based on provenance and lineage, i.e., the origin and subsequent processing history of data. We present the main concepts of our model for representing and storing data provenance, and an ontology of the sub-dimensions of data believability. We then use aggregation operators to compute believability across these sub-dimensions and across the provenance of data. We illustrate our approach with a scenario based on Internet data. Our contribution lies in three main design artifacts: (1) the provenance model, (2) the ontology of believability sub-dimensions, and (3) the method for computing and aggregating data believability. To our knowledge, this is the first work to operationalize provenance-based assessment of data believability.
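The two-level aggregation described in the abstract (first across sub-dimensions, then across lineage) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `DataValue` class, the sub-dimension names, the weights, and the choice of operators (a weighted average across sub-dimensions, a minimum along the lineage).

```python
from dataclasses import dataclass, field


@dataclass
class DataValue:
    """A data value with per-sub-dimension scores in [0, 1] and its lineage.

    Hypothetical structure for illustration; not the paper's provenance model.
    """
    name: str
    scores: dict[str, float]
    sources: list["DataValue"] = field(default_factory=list)


def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate sub-dimension scores into one score (assumed operator)."""
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


def believability(value: DataValue, weights: dict[str, float]) -> float:
    """Aggregate across lineage: a value is no more believable than its
    least believable source (assumed 'weakest link' operator)."""
    own = weighted_average(value.scores, weights)
    if not value.sources:
        return own
    upstream = min(believability(src, weights) for src in value.sources)
    return min(own, upstream)


# Illustrative sub-dimension names and weights (assumptions).
weights = {"trustworthiness": 0.5, "reasonableness": 0.3, "temporality": 0.2}

source_a = DataValue("source_a", {"trustworthiness": 0.9, "reasonableness": 0.8, "temporality": 0.7})
source_b = DataValue("source_b", {"trustworthiness": 0.6, "reasonableness": 0.9, "temporality": 0.5})
mashup = DataValue(
    "mashup",
    {"trustworthiness": 1.0, "reasonableness": 1.0, "temporality": 1.0},
    sources=[source_a, source_b],
)

print(round(believability(mashup, weights), 2))
```

Other operators (for example, an average over sources instead of a minimum) would fit the same structure; the sketch only shows where each aggregation step sits.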
|Date of creation:||11 Jan 2008|
|Contact details of provider:|| Postal: MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT), SLOAN SCHOOL OF MANAGEMENT, 50 MEMORIAL DRIVE CAMBRIDGE MASSACHUSETTS 02142 USA|
Web page: http://mitsloan.mit.edu/
|Handle:||RePEc:mit:sloanp:40085|