Printed from https://ideas.repec.org/a/spr/infosf/v23y2021i2d10.1007_s10796-019-09963-5.html

Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach

Author

Listed:
  • Qi Liu

    (Xi’an JiaoTong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

  • Gengzhong Feng

    (Xi’an JiaoTong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

  • Giri Kumar Tayi

    (SUNY at Albany)

  • Jun Tian

    (Xi’an JiaoTong University
    The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering)

Abstract

To make informed decisions, managers establish data warehouses that integrate multiple data sources. However, the outcomes of data warehouse-based decisions are not always satisfactory because of low data quality. Although many studies have focused on data quality management, little effort has been made to explore effective data quality control strategies for the data warehouse. In this study, we propose a chance-constrained programming model that determines the optimal strategy for allocating control resources to mitigate the data quality problems of the data warehouse. We develop a modified Artificial Bee Colony algorithm to solve the model. Our work contributes to the literature on evaluating how data quality problems propagate through the data integration process and on data quality control for the data sources that make up the data warehouse. We use a data warehouse in a healthcare organization to illustrate the model and the effectiveness of the algorithm.
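
For readers unfamiliar with the term, a chance constraint requires that a condition hold with at least a stated probability rather than with certainty. The sketch below is only an illustration of that idea and does not reproduce the paper's model or its Artificial Bee Colony solver: the function names, the normal error distributions, the diminishing-returns control effect, and all numbers are assumptions made for this example. It estimates by Monte Carlo sampling the probability that the total residual error across data sources exceeds a threshold, then checks that probability against a tolerance alpha.

```python
import random


def violation_probability(allocation, error_params, effectiveness,
                          threshold, n_samples=5000, seed=0):
    """Monte Carlo estimate of P(total residual error > threshold).

    All inputs are hypothetical illustration values, not taken from the paper.
    allocation[i]    : control resource assigned to data source i
    error_params[i]  : (mean, std) of source i's uncontrolled error rate
    effectiveness[i] : how strongly one unit of resource reduces that error
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    violations = 0
    for _ in range(n_samples):
        total = 0.0
        for x, (mean, std), k in zip(allocation, error_params, effectiveness):
            # Sample an uncontrolled error rate; clip at zero since rates cannot be negative.
            raw_error = max(0.0, rng.gauss(mean, std))
            # Assumed diminishing-returns effect of control resources on the error.
            total += raw_error / (1.0 + k * x)
        if total > threshold:
            violations += 1
    return violations / n_samples


def satisfies_chance_constraint(allocation, error_params, effectiveness,
                                threshold, alpha, **kwargs):
    """True if the estimated violation probability is at most alpha."""
    p = violation_probability(allocation, error_params, effectiveness,
                              threshold, **kwargs)
    return p <= alpha


if __name__ == "__main__":
    params = [(0.10, 0.03), (0.06, 0.02), (0.08, 0.04)]  # hypothetical sources
    eff = [0.5, 0.2, 0.8]                                # hypothetical effectiveness
    for alloc in ([0, 0, 0], [2, 1, 3]):
        p = violation_probability(alloc, params, eff, threshold=0.15)
        print(alloc, round(p, 3))
```

An optimizer, whether a metaheuristic or another method, could call `satisfies_chance_constraint` as a feasibility test while searching over candidate allocations; that search step is deliberately left out here.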

Suggested Citation

  • Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
  • Handle: RePEc:spr:infosf:v:23:y:2021:i:2:d:10.1007_s10796-019-09963-5
    DOI: 10.1007/s10796-019-09963-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-019-09963-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    References listed on IDEAS

    1. Roman Lukyanenko & Andrea Wiggins & Holly K. Rosser, 0. "Citizen Science: An Information Quality Research Frontier," Information Systems Frontiers, Springer, vol. 0, pages 1-23.
    2. Jizhou Lu & Gengzhong Feng & Kin Keung Lai & Nengmin Wang, 2017. "The bullwhip effect on inventory: a perspective on information quality," Applied Economics, Taylor & Francis Journals, vol. 49(24), pages 2322-2338, May.
    3. Rajiv Arora & Payal Pahwa & Daya Gupta, 2017. "Data quality improvement in data warehouse: a framework," International Journal of Data Analysis Techniques and Strategies, Inderscience Enterprises Ltd, vol. 9(1), pages 17-33.
    4. Yingcheng Xu & Li Wang & Bo Xu & Wei Jiang & Chaoqun Deng & Fang Ji & Xiaobo Xu, 2019. "An information integration and transmission model of multi-source data for product quality and safety," Information Systems Frontiers, Springer, vol. 21(1), pages 191-212, February.
    5. Yingcheng Xu & Li Wang & Bo Xu & Wei Jiang & Chaoqun Deng & Fang Ji & Xiaobo Xu, 0. "An information integration and transmission model of multi-source data for product quality and safety," Information Systems Frontiers, Springer, vol. 0, pages 1-22.
    6. Zhengrui Jiang & Sumit Sarkar & Prabuddha De & Debabrata Dey, 2007. "A Framework for Reconciling Attribute Values from Multiple Data Sources," Management Science, INFORMS, vol. 53(12), pages 1946-1963, December.
    7. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    8. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    9. X. Qin & G. Huang, 2009. "An Inexact Chance-constrained Quadratic Programming Model for Stream Water Quality Management," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 23(4), pages 661-695, March.
    10. Debabrata Dey & Subodha Kumar, 2010. "Reassessing Data Quality for Information Products," Management Science, INFORMS, vol. 56(12), pages 2316-2322, December.
    11. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    12. A. Charnes & W. W. Cooper, 1959. "Chance-Constrained Programming," Management Science, INFORMS, vol. 6(1), pages 73-79, October.
    13. Agung Wahyudi & George Kuk & Marijn Janssen, 2018. "A Process Pattern Model for Tackling and Improving Big Data Quality," Information Systems Frontiers, Springer, vol. 20(3), pages 457-469, June.
    14. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2004. "Assessing Data Quality for Information Products: Impact of Selection, Projection, and Cartesian Product," Management Science, INFORMS, vol. 50(7), pages 967-982, July.
    15. Cannella, Salvatore & Framinan, Jose M. & Bruccoleri, Manfredi & Barbosa-Póvoa, Ana Paula & Relvas, Susana, 2015. "The effect of Inventory Record Inaccuracy in Information Exchange Supply Chains," European Journal of Operational Research, Elsevier, vol. 243(1), pages 120-129.
    16. Poojari, Chandra A. & Varghese, Boby, 2008. "Genetic Algorithm based technique for solving Chance Constrained Problems," European Journal of Operational Research, Elsevier, vol. 185(3), pages 1128-1154, March.
    17. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2009. "Impact of the Union and Difference Operations on the Quality of Information Products," Information Systems Research, INFORMS, vol. 20(1), pages 99-120, March.
    18. Xue Bai & Ramayya Krishnan & Rema Padman & Harry Jiannan Wang, 2013. "On Risk Management with Information Flows in Business Processes," Information Systems Research, INFORMS, vol. 24(3), pages 731-749, September.
    19. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 0. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 0, pages 1-16.
    20. Szeto, W.Y. & Wu, Yongzhong & Ho, Sin C., 2011. "An artificial bee colony algorithm for the capacitated vehicle routing problem," European Journal of Operational Research, Elsevier, vol. 215(1), pages 126-135, November.

    Citations


    Cited by:

    1. Anders Haug & Aleksandra Magdalena Staskiewicz & Lars Hvam, 2023. "Strategies for Master Data Management: A Case Study of an International Hearing Healthcare Company," Information Systems Frontiers, Springer, vol. 25(5), pages 1903-1923, October.
    2. Yuan Li & William J. Kettinger, 2022. "Testing the Relationship Between Information and Knowledge in Computer-Aided Decision-Making," Information Systems Frontiers, Springer, vol. 24(6), pages 1827-1843, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Debabrata Dey & Subodha Kumar, 2013. "Data Quality of Query Results with Generalized Selection Conditions," Operations Research, INFORMS, vol. 61(1), pages 17-31, February.
    2. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    3. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 0. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 0, pages 1-16.
    4. Xu, M. & Zhuan, X., 2013. "Optimal planning for wind power capacity in an electric power system," Renewable Energy, Elsevier, vol. 53(C), pages 280-286.
    5. Xiangyu Chang & Yinghui Huang & Mei Li & Xin Bo & Subodha Kumar, 2021. "Efficient Detection of Environmental Violators: A Big Data Approach," Production and Operations Management, Production and Operations Management Society, vol. 30(5), pages 1246-1270, May.
    6. Feifei Dong & Yong Liu & Han Su & Zhongyao Liang & Rui Zou & Huaicheng Guo, 2016. "Uncertainty-Based Multi-Objective Decision Making with Hierarchical Reliability Analysis Under Water Resources and Environmental Constraints," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 30(2), pages 805-822, January.
    7. Sun, Wei & Huang, Guo H. & Lv, Ying & Li, Gongchen, 2013. "Inexact joint-probabilistic chance-constrained programming with left-hand-side randomness: An application to solid waste management," European Journal of Operational Research, Elsevier, vol. 228(1), pages 217-225.
    8. Lu, Jizhou & Feng, Gengzhong & Shum, Stephen & Lai, Kin Keung, 2021. "On the value of information sharing in the presence of information errors," European Journal of Operational Research, Elsevier, vol. 294(3), pages 1139-1152.
    9. Yalcin, Ahmet Selcuk & Kilic, Huseyin Selcuk & Delen, Dursun, 2022. "The use of multi-criteria decision-making methods in business analytics: A comprehensive literature review," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    10. Manuela Svoboda, 2022. "Evaluation of Motivation, Expectation, and Present Situation in 3rd Year Undergraduate Students of German Language and Literature at the University of Rijeka, Croatia," European Journal of Education Articles, Revistia Research and Publishing, vol. 5, July -Dec.
    11. Udhayakumar, A. & Charles, V. & Kumar, Mukesh, 2011. "Stochastic simulation based genetic algorithm for chance constrained data envelopment analysis problems," Omega, Elsevier, vol. 39(4), pages 387-397, August.
    12. Qiushi Chen & Lei Zhao & Jan C. Fransoo & Zhe Li, 2019. "Dual-mode inventory management under a chance credit constraint," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 41(1), pages 147-178, March.
    13. Hazen, Benjamin T. & Boone, Christopher A. & Ezell, Jeremy D. & Jones-Farmer, L. Allison, 2014. "Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications," International Journal of Production Economics, Elsevier, vol. 154(C), pages 72-80.
    14. Dominikus Kleindienst, 2017. "The data quality improvement plan: deciding on choice and sequence of data quality improvements," Electronic Markets, Springer;IIM University of St. Gallen, vol. 27(4), pages 387-398, November.
    15. Azarnoosh Kafi & Behrouz Daneshian & Mohsen Rostamy-Malkhalifeh, 2021. "Forecasting the confidence interval of efficiency in fuzzy DEA," Operations Research and Decisions, Wroclaw University of Science and Technology, Faculty of Management, vol. 31(1), pages 41-59.
    16. Sander Claeys & Marta Vanin & Frederik Geth & Geert Deconinck, 2021. "Applications of optimization models for electricity distribution networks," Wiley Interdisciplinary Reviews: Energy and Environment, Wiley Blackwell, vol. 10(5), September.
    17. Scott, James & Ho, William & Dey, Prasanta K. & Talluri, Srinivas, 2015. "A decision support system for supplier selection and order allocation in stochastic, multi-stakeholder and multi-criteria environments," International Journal of Production Economics, Elsevier, vol. 166(C), pages 226-237.
    18. Minjiao Zhang & Simge Küçükyavuz & Saumya Goel, 2014. "A Branch-and-Cut Method for Dynamic Decision Making Under Joint Chance Constraints," Management Science, INFORMS, vol. 60(5), pages 1317-1333, May.
    19. Hermann Held, 2019. "Cost Risk Analysis: Dynamically Consistent Decision-Making under Climate Targets," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 72(1), pages 247-261, January.
