
A Framework for Reconciling Attribute Values from Multiple Data Sources

Authors

  • Zhengrui Jiang (College of Business, University of North Alabama, Florence, Alabama 35632)
  • Sumit Sarkar (School of Management, University of Texas at Dallas, Richardson, Texas 75083)
  • Prabuddha De (Krannert School of Management, Purdue University, West Lafayette, Indiana 47907)
  • Debabrata Dey (Michael G. Foster School of Business, University of Washington, Seattle, Washington 98195)

Abstract

Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different databases, how should the "best" value be chosen from the set of possible values? This paper provides an answer to this question. We first show how a probability distribution over a set of possible values can be derived. We then demonstrate how these probabilities can be used to solve a given decision problem by minimizing the total cost of type I, type II, and misrepresentation errors. Finally, we propose a framework for integrating multiple data sources when a single "best" value has to be chosen and stored for every attribute of an entity.
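
The abstract does not give the paper's formulas, so the following is only a minimal illustrative sketch of the general idea (derive a probability for each candidate value, then store the value with the lowest expected cost), not the authors' model. Everything in it is an assumption made for illustration: the function names `posterior` and `best_value`, the per-source reliability figures, the independence of sources, the uniform prior, the uniform spread of errors over the other candidates, and the generic `cost(stored, true)` function, which stands in for the paper's type I, type II, and misrepresentation costs without reproducing them.

```python
from math import prod


def posterior(reports, reliability, candidates):
    """Probability that each candidate is the true value.

    reports:     dict mapping source name -> value that source recorded
    reliability: dict mapping source name -> assumed P(source records the true value)
    candidates:  list of at least two possible values
    Assumes independent sources, a uniform prior, and errors spread
    uniformly over the remaining candidates (illustrative assumptions).
    """
    k = len(candidates)
    weights = {}
    for v in candidates:
        # Likelihood of observing all reports if v were the true value.
        weights[v] = prod(
            reliability[s] if r == v else (1 - reliability[s]) / (k - 1)
            for s, r in reports.items()
        )
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def best_value(probs, cost):
    """Return the value with the lowest expected cost, plus all expected costs.

    cost(stored, true) is a hypothetical penalty for storing `stored`
    when the true value is `true`.
    """
    expected = {
        stored: sum(p * cost(stored, true) for true, p in probs.items())
        for stored in probs
    }
    return min(expected, key=expected.get), expected


# Hypothetical example: three databases disagree on a customer's city.
reports = {"A": "Dallas", "B": "Richardson", "C": "Richardson"}
reliability = {"A": 0.9, "B": 0.6, "C": 0.6}
candidates = ["Dallas", "Richardson", "Plano"]

probs = posterior(reports, reliability, candidates)
choice, expected = best_value(probs, lambda stored, true: 0.0 if stored == true else 1.0)
print(probs)
print(choice, expected)
```

With a 0/1 cost the rule reduces to picking the most probable value; an asymmetric cost function can shift the choice toward a less probable value whose errors are cheaper, which is the kind of trade-off a cost-minimizing formulation is meant to capture.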

Suggested Citation

  • Zhengrui Jiang & Sumit Sarkar & Prabuddha De & Debabrata Dey, 2007. "A Framework for Reconciling Attribute Values from Multiple Data Sources," Management Science, INFORMS, vol. 53(12), pages 1946-1963, December.
  • Handle: RePEc:inm:ormnsc:v:53:y:2007:i:12:p:1946-1963
    DOI: 10.1287/mnsc.1070.0745

    Citations

    Cited by:

    1. Yiting Xing & Ling Li & Zhuming Bi & Marzena Wilamowska‐Korsak & Li Zhang, 2013. "Operations Research (OR) in Service Industries: A Comprehensive Review," Systems Research and Behavioral Science, Wiley Blackwell, vol. 30(3), pages 300-353, May.
    2. Dominikus Kleindienst, 2017. "The data quality improvement plan: deciding on choice and sequence of data quality improvements," Electronic Markets, Springer;IIM University of St. Gallen, vol. 27(4), pages 387-398, November.
    3. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    4. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xitong Li & Hongwei Zhu & Luo Zuo, 2021. "Reporting Technologies and Textual Readability: Evidence from the XBRL Mandate," Information Systems Research, INFORMS, vol. 32(3), pages 1025-1042, September.
    2. Kocsis, David, 2019. "A conceptual foundation of design and implementation research in accounting information systems," International Journal of Accounting Information Systems, Elsevier, vol. 34(C), pages 1-1.
    3. Xiao, Yu & Lu, Louis Y.Y. & Liu, John S. & Zhou, Zhili, 2014. "Knowledge diffusion path analysis of data quality literature: A main path analysis," Journal of Informetrics, Elsevier, vol. 8(3), pages 594-605.
    4. Kartik Hosanagar, 2011. "Usercentric Operational Decision Making in Distributed Information Retrieval," Information Systems Research, INFORMS, vol. 22(4), pages 739-755, December.
    5. Debabrata Dey, 2003. "Record Matching in Data Warehouses: A Decision Model for Data Consolidation," Operations Research, INFORMS, vol. 51(2), pages 240-254, April.
    6. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    7. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    8. Jiexun Li & G. Alan Wang & Hsinchun Chen, 2011. "Identity matching using personal and social identity features," Information Systems Frontiers, Springer, vol. 13(1), pages 101-113, March.
    9. Xue Bai & Manuel Nunez & Jayant R. Kalagnanam, 2012. "Managing Data Quality Risk in Accounting Information Systems," Information Systems Research, INFORMS, vol. 23(2), pages 453-473, June.
    10. Stoel, M. Dale & Muhanna, Waleed A., 2011. "IT internal control weaknesses and firm performance: An organizational liability lens," International Journal of Accounting Information Systems, Elsevier, vol. 12(4), pages 280-304.
    11. Markus Schäfermeyer & Christoph Rosenkranz & Roland Holten, 2012. "The Impact of Business Process Complexity on Business Process Standardization," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 4(5), pages 261-270, October.
    12. Xue Bai & Ramayya Krishnan & Rema Padman & Harry Jiannan Wang, 2013. "On Risk Management with Information Flows in Business Processes," Information Systems Research, INFORMS, vol. 24(3), pages 731-749, September.
    13. Caroline Lancelot Miltgen, 2009. "Propension à fournir des données personnelles mensongères sur Internet : une étude exploratoire," Post-Print hal-01117029, HAL.
    14. Xue Bai, 2012. "A Mathematical Framework for Data Quality Management in Enterprise Systems," INFORMS Journal on Computing, INFORMS, vol. 24(4), pages 648-664, November.
    15. Kim, Rosemary & Gangolly, Jagdish & Elsas, Philip, 2017. "A framework for analytics and simulation of accounting information systems: A Petri net modeling primer," International Journal of Accounting Information Systems, Elsevier, vol. 27(C), pages 30-54.
    16. Guan, Jian & Levitan, Alan S. & Kuhn, John R., 2013. "How AIS can progress along with ontology research in IS," International Journal of Accounting Information Systems, Elsevier, vol. 14(1), pages 21-38.
