A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge

Author

Listed:
  • Qi Liu

    (Xi’an JiaoTong University
    The key lab of the ministry of education for process control and efficiency engineering)

  • Gengzhong Feng

    (Xi’an JiaoTong University
    The key lab of the ministry of education for process control and efficiency engineering)

  • Nengmin Wang

    (Xi’an JiaoTong University
    The key lab of the ministry of education for process control and efficiency engineering)

  • Giri Kumar Tayi

    (SUNY at Albany)

Abstract

Discovering knowledge from data means finding useful patterns in it, a process that presents both opportunities and challenges for businesses in the big data era. Meanwhile, improving the quality of the discovered knowledge is important for making correct decisions in an unpredictable environment. Various models have been developed in the past; however, few have used both data quality and prior knowledge to control the quality of the discovery process and its results. In this paper, a multi-objective model of knowledge discovery in databases is developed, which aids the discovery process by utilizing prior process knowledge and different measures of data quality. To illustrate the model, association rule mining is formulated not as a single-objective problem but as a multi-objective problem that takes into account data quality measures and prior process knowledge. Measures such as confidence, support, comprehensibility and interestingness are used. A Pareto-based integrated multi-objective Artificial Bee Colony (IMOABC) algorithm is developed to solve the problem. Using well-known and publicly available databases, experiments are carried out to compare the performance of IMOABC with that of the NSGA-II, MOPSO and Apriori algorithms. The computational results show that IMOABC outperforms NSGA-II, MOPSO and Apriori on different measures, and that it can be easily customized to user requirements while still generating high-quality association rules.
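
The multi-objective view of association rule mining described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' IMOABC algorithm: it only scores a few hand-written rules on two of the named measures, support and confidence (using their standard textbook definitions), and keeps the Pareto-optimal ones. Comprehensibility and interestingness are left out because the abstract does not define them. The transactions, the rules, and all function names are hypothetical and chosen purely for illustration.

```python
# Toy market-basket data (hypothetical, for illustration only).
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread"}),
    frozenset({"butter", "jam"}),
]

# Candidate rules as (antecedent, consequent) pairs (also hypothetical).
RULES = [
    (frozenset({"bread"}), frozenset({"milk"})),
    (frozenset({"jam"}), frozenset({"butter"})),
    (frozenset({"butter"}), frozenset({"bread"})),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent); 0.0 if undefined."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / sup_a


def objectives(transactions, rule):
    """Objective vector (support, confidence) of a rule; both are maximized."""
    antecedent, consequent = rule
    return (
        support(transactions, antecedent | consequent),
        confidence(transactions, antecedent, consequent),
    )


def dominates(a, b):
    """True if vector `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(transactions, rules):
    """Rules whose objective vectors are not dominated by any other rule."""
    scored = [(rule, objectives(transactions, rule)) for rule in rules]
    return [
        rule
        for rule, score in scored
        if not any(dominates(other, score) for _, other in scored)
    ]


if __name__ == "__main__":
    for antecedent, consequent in pareto_front(TRANSACTIONS, RULES):
        print(sorted(antecedent), "->", sorted(consequent))
```

On this toy data, the rules bread -> milk (higher support) and jam -> butter (higher confidence) trade off against each other and both stay on the front, while butter -> bread is dominated and dropped. A Pareto-based search such as the one the abstract describes would explore the rule space rather than enumerate a fixed list, but the dominance test it relies on is of this kind.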

Suggested Citation

  • Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 0. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 0, pages 1-16.
  • Handle: RePEc:spr:infosf:v::y::i::d:10.1007_s10796-016-9690-6
    DOI: 10.1007/s10796-016-9690-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-016-9690-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Citations

    Cited by:

    1. Babak Daneshvar Rouyendegh & Kazim Topuz & Ali Dag & Asil Oztekin, 2019. "An AHP-IFT Integrated Model for Performance Evaluation of E-Commerce Web Sites," Information Systems Frontiers, Springer, vol. 21(6), pages 1345-1355, December.
    2. Manuela Svoboda, 2022. "Evaluation of Motivation, Expectation, and Present Situation in 3rd Year Undergraduate Students of German Language and Literature at the University of Rijeka, Croatia," European Journal of Education Articles, Revistia Research and Publishing, vol. 5, July-Dec.
    3. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    4. Yalcin, Ahmet Selcuk & Kilic, Huseyin Selcuk & Delen, Dursun, 2022. "The use of multi-criteria decision-making methods in business analytics: A comprehensive literature review," Technological Forecasting and Social Change, Elsevier, vol. 174(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Liu & Gengzhong Feng & Nengmin Wang & Giri Kumar Tayi, 2018. "A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge," Information Systems Frontiers, Springer, vol. 20(2), pages 401-416, April.
    2. Qi Liu & Gengzhong Feng & Giri Kumar Tayi & Jun Tian, 2021. "Managing Data Quality of the Data Warehouse: A Chance-Constrained Programming Approach," Information Systems Frontiers, Springer, vol. 23(2), pages 375-389, April.
    3. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    4. Clarisse Dhaenens & Laetitia Jourdan, 2022. "Metaheuristics for data mining: survey and opportunities for big data," Annals of Operations Research, Springer, vol. 314(1), pages 117-140, July.
    5. Van Nguyen, Truong & Zhang, Jie & Zhou, Li & Meng, Meng & He, Yong, 2020. "A data-driven optimization of large-scale dry port location using the hybrid approach of data mining and complex network theory," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 134(C).
    6. James Fan & Christopher Griffin, 2014. "Optimal Digital Product Maintenance with a Continuous Revenue Stream," Papers 1412.8624, arXiv.org, revised Feb 2017.
    7. Zhang, Zhiwang & Gao, Guangxia & Shi, Yong, 2014. "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, Elsevier, vol. 237(1), pages 335-348.
    8. Wang, Zutong & Guo, Jiansheng & Zheng, Mingfa & Wang, Ying, 2015. "Uncertain multiobjective traveling salesman problem," European Journal of Operational Research, Elsevier, vol. 241(2), pages 478-489.
    9. Jinwook Lee & András Prékopa, 2015. "Decision-making from a risk assessment perspective for Corporate Mergers and Acquisitions," Computational Management Science, Springer, vol. 12(2), pages 243-266, April.
    10. Wagner, Laura & Gürbüz, Mustafa Ҫagri & Parlar, Mahmut, 2019. "Is it fake? Using potentially low quality suppliers as back-up when genuine suppliers are unavailable," International Journal of Production Economics, Elsevier, vol. 213(C), pages 185-200.
    11. Can Sun & Yonghua Ji & Xianjun Geng, 2023. "Which Enemy to Dance with? A New Role of Software Piracy in Influencing Antipiracy Strategies," Information Systems Research, INFORMS, vol. 34(4), pages 1711-1727, December.
    12. DE CNUDDE, Sofie & MARTENS, David & EVGENIOU, Theodoros & PROVOST, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Business and Economics.
    13. Raeesi, Ramin & Sahebjamnia, Navid & Mansouri, S. Afshin, 2023. "The synergistic effect of operational research and big data analytics in greening container terminal operations: A review and future directions," European Journal of Operational Research, Elsevier, vol. 310(3), pages 943-973.
    14. Zhan, Xingbin & Szeto, W.Y. & Shui, C.S. & Chen, Xiqun (Michael), 2021. "A modified artificial bee colony algorithm for the dynamic ride-hailing sharing problem," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 150(C).
    15. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    16. Dan Wu & Guofang Nan & Minqiang Li, 2020. "Optimal Piracy Control: Should a Firm Implement Digital Rights Management?," Information Systems Frontiers, Springer, vol. 22(4), pages 947-960, August.
    17. Amir Parssian & Sumit Sarkar & Varghese S. Jacob, 2009. "Impact of the Union and Difference Operations on the Quality of Information Products," Information Systems Research, INFORMS, vol. 20(1), pages 99-120, March.
    18. Harshita Patel & Dharmendra Singh Rajput & G Thippa Reddy & Celestine Iwendi & Ali Kashif Bashir & Ohyun Jo, 2020. "A review on classification of imbalanced data for wireless sensor networks," International Journal of Distributed Sensor Networks, vol. 16(4), pages 15501477209, April.
    19. Caballini, Claudia & Gracia, Maria D. & Mar-Ortiz, Julio & Sacone, Simona, 2020. "A combined data mining – optimization approach to manage trucks operations in container terminals with the use of a TAS: Application to an Italian and a Mexican port," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 142(C).
    20. Nie, Jiajia & Zhong, Ling & Li, Gendao & Cao, Kuo, 2022. "Piracy as an entry deterrence strategy in software market," European Journal of Operational Research, Elsevier, vol. 298(2), pages 560-572.
