Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage

Authors

  • Prat, Nicolas
  • Madnick, Stuart E.

Abstract

Data quality is crucial for operational efficiency and sound decision making. This paper focuses on believability, a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where mashups facilitate the combination of data from different sources. Our approach for assessing data believability is based on provenance and lineage, i.e., the origin and subsequent processing history of data. We present the main concepts of our model for representing and storing data provenance, and an ontology of the sub-dimensions of data believability. We then use aggregation operators to compute believability across the sub-dimensions of data believability and the provenance of data. We illustrate our approach with a scenario based on Internet data. Our contribution lies in three main design artifacts: (1) the provenance model, (2) the ontology of believability sub-dimensions, and (3) the method for computing and aggregating data believability. To our knowledge, this is the first work to operationalize provenance-based assessment of data believability.
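The abstract does not spell out which aggregation operators the paper uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, hypothetically, that each data value carries scores in [0, 1] for a few placeholder sub-dimensions (`source_trust`, `reasonableness`, `temporality`), combines them with a weighted arithmetic mean, and then propagates believability along the lineage with the minimum operator, a deliberately conservative choice. All names, weights, and scores are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DataValue:
    """A value with hypothetical per-sub-dimension believability scores in [0, 1]."""
    name: str
    scores: dict[str, float]
    # Values this one was derived from (its lineage); empty for source data.
    inputs: list["DataValue"] = field(default_factory=list)


def weighted_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate sub-dimension scores with a weighted arithmetic mean."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


def believability(value: DataValue, weights: dict[str, float]) -> float:
    """Combine a value's own score with the believability of its lineage.

    Assumption (not from the paper): a derived value is no more believable
    than its least believable input, so the minimum is taken along the lineage.
    """
    own = weighted_mean(value.scores, weights)
    if not value.inputs:
        return own
    lineage = min(believability(v, weights) for v in value.inputs)
    return min(own, lineage)


# Illustrative usage: a mashup value derived from two hypothetical sources.
weights = {"source_trust": 0.5, "reasonableness": 0.3, "temporality": 0.2}
a = DataValue("a", {"source_trust": 0.9, "reasonableness": 0.8, "temporality": 1.0})
b = DataValue("b", {"source_trust": 0.6, "reasonableness": 0.7, "temporality": 0.5})
mashup = DataValue(
    "mashup",
    {"source_trust": 1.0, "reasonableness": 0.9, "temporality": 0.9},
    inputs=[a, b],
)
print(round(believability(mashup, weights), 2))  # prints 0.61
```

Here the mashup scores 0.95 on its own, but its believability is capped at 0.61 by source `b`. Swapping `min` for a product or a weighted mean would make the lineage step less conservative.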

Suggested Citation

  • Prat, Nicolas & Madnick, Stuart E., 2008. "Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage," Working papers 40085, Massachusetts Institute of Technology (MIT), Sloan School of Management.
  • Handle: RePEc:mit:sloanp:40085

    Download full text from publisher

    File URL: http://hdl.handle.net/1721.1/40085
    Download Restriction: no

    References listed on IDEAS

    1. Donald P. Ballou & Harold L. Pazer, 1985. "Modeling Data and Process Quality in Multi-Input, Multi-Output Information Systems," Management Science, INFORMS, vol. 31(2), pages 150-162, February.
    2. Prat, Nicolas & Madnick, Stuart E., 2008. "Measuring Data Believability: A Provenance Approach," Working papers 40086, Massachusetts Institute of Technology (MIT), Sloan School of Management.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Risto Silvola & Janne Harkonen & Olli Vilppola & Hanna Kropsu-Vehkapera & Harri Haapasalo, 2016. "Data quality assessment and improvement," International Journal of Business Information Systems, Inderscience Enterprises Ltd, vol. 22(1), pages 62-81.
    2. repec:jtr:journl:v:4:y:2012:i:1:p:12-37 is not listed on IDEAS
    3. Sabrina Sicari & Cinzia Cappiello & Francesco Pellegrini & Daniele Miorandi & Alberto Coen-Porisini, 2016. "A security-and quality-aware system architecture for Internet of Things," Information Systems Frontiers, Springer, vol. 18(4), pages 665-677, August.
    4. Michnik, Jerzy & Lo, Mei-Chen, 2009. "The assessment of the information quality with the aid of multiple criteria analysis," European Journal of Operational Research, Elsevier, vol. 195(3), pages 850-856, June.
    5. Donald Ballou & Richard Wang & Harold Pazer & Giri Kumar Tayi, 1998. "Modeling Information Manufacturing Systems to Determine Information Product Quality," Management Science, INFORMS, vol. 44(4), pages 462-484, April.
    6. Klein, B. D. & Rossin, D. F., 1999. "Data quality in neural network models: effect of error rate and magnitude of error on predictive accuracy," Omega, Elsevier, vol. 27(5), pages 569-582, October.
    7. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    8. Bernd Heinrich & Marcus Hopf & Daniel Lohninger & Alexander Schiller & Michael Szubartowicz, 2021. "Data quality in recommender systems: the impact of completeness of item content data on prediction accuracy of recommender systems," Electronic Markets, Springer;IIM University of St. Gallen, vol. 31(2), pages 389-409, June.
    9. Xue Bai & Manuel Nunez & Jayant R. Kalagnanam, 2012. "Managing Data Quality Risk in Accounting Information Systems," Information Systems Research, INFORMS, vol. 23(2), pages 453-473, June.
    10. Vliegen, Lea & Moroff, Nikolas Ulrich & Riehl, Katharina, 2020. "Evaluation of data quality in dimensioning capacity," Chapters from the Proceedings of the Hamburg International Conference of Logistics (HICL), in: Kersten, Wolfgang & Blecker, Thorsten & Ringle, Christian M. (ed.), Data Science and Innovation in Supply Chain Management: How Data Transforms the Value Chain. Proceedings of the Hamburg International Conference of Lo, volume 29, pages 355-394, Hamburg University of Technology (TUHH), Institute of Business Logistics and General Management.
    11. Dongpu Fu & Yili Hong & Kanliang Wang & Weiguo Fan, 2018. "Effects of membership tier on user content generation behaviors: evidence from online reviews," Electronic Commerce Research, Springer, vol. 18(3), pages 457-483, September.
    12. Kon, Henry B. & Siegel, Michael D., 2003. "Error browsing and mediation : interoperability regarding data error," Working papers #94-15, Massachusetts Institute of Technology (MIT), Sloan School of Management.
    13. Jalil, M.N. & Zuidwijk, R.A. & Fleischmann, M. & van Nunen, J.A.E.E., 2009. "Spare Parts Logistics and Installed Base Information," ERIM Report Series Research in Management ERS-2009-002-LIS, Erasmus Research Institute of Management (ERIM), ERIM is the joint research institute of the Rotterdam School of Management, Erasmus University and the Erasmus School of Economics (ESE) at Erasmus University Rotterdam.
    14. Benita M. Gullkvist, 2013. "Drivers of change in management accounting practices in an ERP environment," International Journal of Business and Economic Sciences Applied Research (IJBESAR), Democritus University of Thrace (DUTH), Kavala Campus, Greece, vol. 6(2), pages 149-174, September.
    15. Tavakkoli, Sakineh & Macknick, Jordan & Heath, Garvin A. & Jordaan, Sarah M., 2021. "Spatiotemporal energy infrastructure datasets for the United States: A review," Renewable and Sustainable Energy Reviews, Elsevier, vol. 152(C).
    16. Hazen, Benjamin T. & Weigel, Fred K. & Ezell, Jeremy D. & Boehmke, Bradley C. & Bradley, Randy V., 2017. "Toward understanding outcomes associated with data quality improvement," International Journal of Production Economics, Elsevier, vol. 193(C), pages 737-747.
    17. Koziel, Sylvie & Hilber, Patrik & Westerlund, Per & Shayesteh, Ebrahim, 2021. "Investments in data quality: Evaluating impacts of faulty data on asset management in power systems," Applied Energy, Elsevier, vol. 281(C).
    18. Jingran Wang & Yi Liu & Peigong Li & Zhenxing Lin & Stavros Sindakis & Sakshi Aggarwal, 2024. "Overview of Data Quality: Examining the Dimensions, Antecedents, and Impacts of Data Quality," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 15(1), pages 1159-1178, March.
    19. Choo Yeon Kim & Seong Soo Cha, 2023. "Effect of SNS Characteristics for Dining Out on Customer Satisfaction and Online Word of Mouth," SAGE Open, vol. 13(3), pages 21582440231, September.
    20. Xiangyu Chang & Yinghui Huang & Mei Li & Xin Bo & Subodha Kumar, 2021. "Efficient Detection of Environmental Violators: A Big Data Approach," Production and Operations Management, Production and Operations Management Society, vol. 30(5), pages 1246-1270, May.
    21. Xiao, Yu & Lu, Louis Y.Y. & Liu, John S. & Zhou, Zhili, 2014. "Knowledge diffusion path analysis of data quality literature: A main path analysis," Journal of Informetrics, Elsevier, vol. 8(3), pages 594-605.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.