Printed from https://ideas.repec.org/a/inm/orisre/v20y2009i1p99-120.html

Impact of the Union and Difference Operations on the Quality of Information Products

Authors

  • Amir Parssian

    (Department of Information Systems, Instituto de Empresa Business School, Madrid 28006, Spain)

  • Sumit Sarkar

    (School of Management, University of Texas at Dallas, Richardson, Texas 75080)

  • Varghese S. Jacob

    (School of Management, University of Texas at Dallas, Richardson, Texas 75080)

Abstract

Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. To assess quality, one needs a framework that defines the relevant metrics constituting the quality profile of a relation and provides mechanisms for evaluating them. We build on a quality framework proposed in prior work and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the sizes of the different classes difficult, and very different from the corresponding tasks for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result do not (such tuples are called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hypergeometric distribution. Finally, we discuss how the resulting quality profiles would influence decisions.
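
As a rough illustration of the kind of drawing process the abstract mentions, the sketch below computes a hypergeometric probability mass function and its mean for draws without replacement. It is not the authors' estimation procedure: the function names (hypergeom_pmf, hypergeom_mean) and the numbers (a relation of 100 tuples, 80 of them accurate, 10 drawn) are hypothetical and chosen only to show how an expected class size could be obtained from such a distribution.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k successes in `draws` draws without replacement."""
    failures = population - successes
    # Outside the feasible range the probability is zero.
    if k < max(0, draws - failures) or k > min(draws, successes):
        return 0.0
    return comb(successes, k) * comb(failures, draws - k) / comb(population, draws)


def hypergeom_mean(population, successes, draws):
    """Expected number of successes: draws * successes / population."""
    return draws * successes / population


# Hypothetical example (an assumption, not data from the article):
# a relation of N tuples, K of which are accurate; n tuples are drawn.
N, K, n = 100, 80, 10
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
expected_accurate = sum(k * p for k, p in enumerate(pmf))
```

Under these assumed numbers, expected_accurate matches the closed-form mean n * K / N, so the expected number of accurate tuples in the draw can be read off without enumerating the distribution.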

Suggested Citation

  • Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2009. "Impact of the Union and Difference Operations on the Quality of Information Products," Information Systems Research, INFORMS, vol. 20(1), pages 99-120, March.
  • Handle: RePEc:inm:orisre:v:20:y:2009:i:1:p:99-120
    DOI: 10.1287/isre.1070.0161

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1070.0161
    Download Restriction: no


    References listed on IDEAS

    1. Kon, Henry B. & Madnick, Stuart E. & Siegel, Michael D., 1995. "Good answers from bad data : a data management strategy," Working papers 3868-95., Massachusetts Institute of Technology (MIT), Sloan School of Management.
    2. Craig W. Fisher & InduShobha Chengalur-Smith & Donald P. Ballou, 2003. "The Impact of Experience and Time on the Use of Data Quality Information in Decision Making," Information Systems Research, INFORMS, vol. 14(2), pages 170-188, June.
    3. Donald Ballou & Richard Wang & Harold Pazer & Giri Kumar Tayi, 1998. "Modeling Information Manufacturing Systems to Determine Information Product Quality," Management Science, INFORMS, vol. 44(4), pages 462-484, April.
    4. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2004. "Assessing Data Quality for Information Products: Impact of Selection, Projection, and Cartesian Product," Management Science, INFORMS, vol. 50(7), pages 967-982, July.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    2. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    2. Dominikus Kleindienst, 2017. "The data quality improvement plan: deciding on choice and sequence of data quality improvements," Electronic Markets, Springer;IIM University of St. Gallen, vol. 27(4), pages 387-398, November.
    3. Debabrata Dey & Subodha Kumar, 2010. "Reassessing Data Quality for Information Products," Management Science, INFORMS, vol. 56(12), pages 2316-2322, December.
    4. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2004. "Assessing Data Quality for Information Products: Impact of Selection, Projection, and Cartesian Product," Management Science, INFORMS, vol. 50(7), pages 967-982, July.
    5. Hazen, Benjamin T. & Weigel, Fred K. & Ezell, Jeremy D. & Boehmke, Bradley C. & Bradley, Randy V., 2017. "Toward understanding outcomes associated with data quality improvement," International Journal of Production Economics, Elsevier, vol. 193(C), pages 737-747.
    6. Hazen, Benjamin T. & Boone, Christopher A. & Ezell, Jeremy D. & Jones-Farmer, L. Allison, 2014. "Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications," International Journal of Production Economics, Elsevier, vol. 154(C), pages 72-80.
    7. Xitong Li & Hongwei Zhu & Luo Zuo, 2021. "Reporting Technologies and Textual Readability: Evidence from the XBRL Mandate," Information Systems Research, INFORMS, vol. 32(3), pages 1025-1042, September.
    8. Juha-Miikka Nurmilaakso, 2014. "Coordination costs and ICT investments: an economic analysis," Netnomics, Springer, vol. 15(2), pages 57-67, September.
    9. Xiao, Yu & Lu, Louis Y.Y. & Liu, John S. & Zhou, Zhili, 2014. "Knowledge diffusion path analysis of data quality literature: A main path analysis," Journal of Informetrics, Elsevier, vol. 8(3), pages 594-605.
    10. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    11. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    12. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    13. Even, Adir & Shankaranarayanan, G. & Berger, Paul D., 2010. "Managing the Quality of Marketing Data: Cost/benefit Tradeoffs and Optimal Configuration," Journal of Interactive Marketing, Elsevier, vol. 24(3), pages 209-221.
    14. Paul Glowalla & Ali Sunyaev, 2013. "Process-Driven Data Quality Management Through Integration of Data Quality into Existing Process Models," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 5(6), pages 433-448, December.
    15. Dunk, Alan S., 2007. "Innovation budget pressure, quality of IS information, and departmental performance," The British Accounting Review, Elsevier, vol. 39(2), pages 115-124.
    16. Ann-Frances Cameron & Jane Webster, 2013. "Multicommunicating: Juggling Multiple Conversations in the Workplace," Information Systems Research, INFORMS, vol. 24(2), pages 352-371, June.
    17. Klein, B. D. & Rossin, D. F., 1999. "Data quality in neural network models: effect of error rate and magnitude of error on predictive accuracy," Omega, Elsevier, vol. 27(5), pages 569-582, October.
    18. Bonney, Maurice & Jaber, Mohamad Y., 2013. "Developing an input–output activity matrix (IOAM) for environmental and economic analysis of manufacturing systems and logistics chains," International Journal of Production Economics, Elsevier, vol. 143(2), pages 589-597.
    19. Rajiv D. Banker & Robert J. Kauffman, 2004. "50th Anniversary Article: The Evolution of Research on Information Systems: A Fiftieth-Year Survey of the Literature in Management Science," Management Science, INFORMS, vol. 50(3), pages 281-298, March.
    20. Maria Grazia Fugini & Barbara Pernici & Filippo Ramoni, 2009. "Quality analysis of composed services through fault injection," Information Systems Frontiers, Springer, vol. 11(3), pages 227-239, July.
