
Data preparation using data quality matrices for classification mining

Authors
  • Davidson, Ian
  • Tayi, Giri

Abstract

Data mining aims to find patterns in organizational databases. However, most mining techniques do not take into account knowledge of the quality of the database. In this work, we show how to incorporate into classification mining recent advances in the data quality field that view a database as the product of an imprecise manufacturing process, where the flaws/defects are captured in quality matrices. We develop a general-purpose method of incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques: whereas other approaches detect and fix errors to ensure consistency with the entire data set, our work makes use of a priori knowledge of how the data is produced/manufactured.
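The abstract describes quality matrices only at a high level. To illustrate the general idea, the following minimal Python sketch assumes one possible reading: a matrix `Q` that gives, for a single categorical attribute, the probability that a record with a given true value is stored as each recorded value. Bayes' rule then turns every recorded value into weighted "soft" records over the plausible true values, and class-conditional frequencies are estimated from those weights. The matrix entries, the prior over true values, and the function names `posterior_true`, `weighted_records`, and `class_conditional` are all hypothetical. This is not the authors' published method.

```python
from collections import defaultdict

# Hypothetical quality matrix for one categorical attribute (an assumption,
# not taken from the paper): Q[true][recorded] is the probability that a
# record whose true value is `true` is stored in the database as `recorded`.
Q = {
    "low":  {"low": 0.9, "high": 0.1},
    "high": {"low": 0.2, "high": 0.8},
}


def posterior_true(recorded, prior, Q):
    """P(true value | recorded value) via Bayes' rule: prior(t) * Q[t][recorded], normalised."""
    joint = {t: prior[t] * Q[t][recorded] for t in prior}
    total = sum(joint.values())
    return {t: w / total for t, w in joint.items()}


def weighted_records(records, prior, Q):
    """Replace each (recorded_value, label) pair with fractional records
    (true_value, label, weight), one per possible true value."""
    out = []
    for recorded, label in records:
        for t, w in posterior_true(recorded, prior, Q).items():
            out.append((t, label, w))
    return out


def class_conditional(weighted):
    """Estimate P(true attribute value | class label) from weighted records."""
    counts = defaultdict(lambda: defaultdict(float))
    for value, label, w in weighted:
        counts[label][value] += w
    result = {}
    for label, vals in counts.items():
        total = sum(vals.values())
        result[label] = {v: c / total for v, c in vals.items()}
    return result


if __name__ == "__main__":
    prior = {"low": 0.5, "high": 0.5}  # assumed uniform prior over true values
    records = [("low", "no"), ("low", "no"), ("high", "yes"), ("high", "no")]
    print(posterior_true("low", prior, Q))
    print(class_conditional(weighted_records(records, prior, Q)))
```

With the uniform prior above, a recorded "low" counts as roughly 0.82 of a true "low" and 0.18 of a true "high" (0.45/0.55 and 0.10/0.55). The design choice here is to keep the stored data unchanged and push the known error process into the training counts, rather than "fixing" individual values.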

Suggested Citation

  • Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
  • Handle: RePEc:eee:ejores:v:197:y:2009:i:2:p:764-772

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377-2217(08)00560-2
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    2. Donald Ballou & Richard Wang & Harold Pazer & Giri Kumar Tayi, 1998. "Modeling Information Manufacturing Systems to Determine Information Product Quality," Management Science, INFORMS, vol. 44(4), pages 462-484, April.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    2. Perko, Igor, 2017. "Behaviour-based short-term invoice probability of default evaluation," European Journal of Operational Research, Elsevier, vol. 257(3), pages 1045-1054.
    3. Farnè, Matteo & Vouldis, Angelos T., 2018. "A methodology for automised outlier detection in high-dimensional datasets: an application to euro area banks' supervisory data," Working Paper Series 2171, European Central Bank.
    4. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    2. Zhang, Zhiwang & Gao, Guangxia & Shi, Yong, 2014. "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, Elsevier, vol. 237(1), pages 335-348.
    3. Maysam Eftekhary & Peyman Gholami & Saeed Safari & Mohammad Shojaee, 2012. "Ranking Normalization Methods for Improving the Accuracy of SVM Algorithm by DEA Method," Modern Applied Science, Canadian Center of Science and Education, vol. 6(10), pages 1-26, October.
    4. Xitong Li & Hongwei Zhu & Luo Zuo, 2021. "Reporting Technologies and Textual Readability: Evidence from the XBRL Mandate," Information Systems Research, INFORMS, vol. 32(3), pages 1025-1042, September.
    5. Ramli, Azizul Azhar & Watada, Junzo & Pedrycz, Witold, 2011. "Real-time fuzzy regression analysis: A convex hull approach," European Journal of Operational Research, Elsevier, vol. 210(3), pages 606-617, May.
    6. Necula, Sabina-Cristiana & Radu, Laura-Diana, 2011. "Decision Support Systems Usefulness and A Practical Solution Based on Semantic Web Technologies," MPRA Paper 51547, University Library of Munich, Germany.
    7. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    8. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    9. Blanquero, Rafael & Carrizosa, Emilio & Molero-Río, Cristina & Romero Morales, Dolores, 2020. "Sparsity in optimal randomized classification trees," European Journal of Operational Research, Elsevier, vol. 284(1), pages 255-272.
    10. Juha-Miikka Nurmilaakso, 2014. "Coordination costs and ICT investments: an economic analysis," Netnomics, Springer, vol. 15(2), pages 57-67, September.
    11. Xiao, Yu & Lu, Louis Y.Y. & Liu, John S. & Zhou, Zhili, 2014. "Knowledge diffusion path analysis of data quality literature: A main path analysis," Journal of Informetrics, Elsevier, vol. 8(3), pages 594-605.
    12. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2009. "Impact of the Union and Difference Operations on the Quality of Information Products," Information Systems Research, INFORMS, vol. 20(1), pages 99-120, March.
    13. R Fildes & K Nikolopoulos & S F Crone & A A Syntetos, 2008. "Forecasting and operational research: a review," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 59(9), pages 1150-1172, September.
    14. Caballini, Claudia & Gracia, Maria D. & Mar-Ortiz, Julio & Sacone, Simona, 2020. "A combined data mining – optimization approach to manage trucks operations in container terminals with the use of a TAS: Application to an Italian and a Mexican port," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 142(C).
    15. Besseris, George J., 2012. "Profiling effects in industrial data mining by non-parametric DOE methods: An application on screening checkweighing systems in packaging operations," European Journal of Operational Research, Elsevier, vol. 220(1), pages 147-161.
    16. Even, Adir & Shankaranarayanan, G. & Berger, Paul D., 2010. "Managing the Quality of Marketing Data: Cost/benefit Tradeoffs and Optimal Configuration," Journal of Interactive Marketing, Elsevier, vol. 24(3), pages 209-221.
    17. Du, Yu & Lin, Xiaodong & Pham, Minh & Ruszczyński, Andrzej, 2021. "Selective linearization for multi-block statistical learning," European Journal of Operational Research, Elsevier, vol. 293(1), pages 219-228.
    18. Paul Glowalla & Ali Sunyaev, 2013. "Process-Driven Data Quality Management Through Integration of Data Quality into Existing Process Models," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 5(6), pages 433-448, December.
    19. Daniel Gartner & Yiye Zhang & Rema Padman, 2018. "Cognitive workload reduction in hospital information systems," Health Care Management Science, Springer, vol. 21(2), pages 224-243, June.
    20. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:197:y:2009:i:2:p:764-772. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.