Printed from https://ideas.repec.org/h/zbw/hiclch/267183.html

Outlier detection in data mining: Exclusion of errors or loss of information?

In: Changing Tides: The New Role of Resilience and Sustainability in Logistics and Supply Chain Management – Innovative Approaches for the Shift to a New Era. Proceedings of the Hamburg International Conference of Logistics (HICL), Vol. 33

Author

Listed:
  • Hochkamp, Florian
  • Rabe, Markus

Abstract

Purpose: Our research emphasizes the importance of considering outliers in production logistics tasks. As the amount of data grows, data mining is required to cope with these tasks. We underline that the widespread exclusion of outliers during data pre-processing for data mining leads to a loss of information, and that outlier interpretation can be used to address this issue.

Methodology: The paper discusses data pre-processing for data mining in production logistics problems. Methods of outlier interpretation are collected through a literature review. In addition to the literature-based investigation, the work relies on a case study that illustrates the individual evaluation of outliers.

Findings: This work shows that outliers deserve special attention in information generation. Within data pre-processing, a distinction must be made between an outlier as a defect and an outlier as a special datum. This distinction can be made using methods presented in the literature.

Originality: This paper adds to the existing literature on outlier interpretation, a research field that has been insufficiently analyzed, and shows a need for research on data pre-processing in data mining.

Suggested Citation

  • Hochkamp, Florian & Rabe, Markus, 2022. "Outlier detection in data mining: Exclusion of errors or loss of information?," Chapters from the Proceedings of the Hamburg International Conference of Logistics (HICL), in: Kersten, Wolfgang & Jahn, Carlos & Blecker, Thorsten & Ringle, Christian M. (ed.), Changing Tides: The New Role of Resilience and Sustainability in Logistics and Supply Chain Management – Innovative Approaches for the Shift to a New , volume 33, pages 91-117, Hamburg University of Technology (TUHH), Institute of Business Logistics and General Management.
  • Handle: RePEc:zbw:hiclch:267183
    DOI: 10.15480/882.4689

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/267183/1/hicl-2021-33-091.pdf
    Download Restriction: no


    References listed on IDEAS

    1. Vic Barnett, 1978. "The Study of Outliers: Purpose and Model," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 27(3), pages 242-250, November.
    2. Douglas M. Hawkins, 1980. "Critical Values for Identifying Outliers," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(1), pages 95-96, March.
    3. Bugra Alkan & Daniel A. Vera & Mussawar Ahmad & Bilal Ahmad & Robert Harrison, 2018. "Complexity in manufacturing systems and its measures: a literature review," European Journal of Industrial Engineering, Inderscience Enterprises Ltd, vol. 12(1), pages 116-150.
    4. Hunker, Joachim & Scheidler, Anne Antonia & Rabe, Markus, 2020. "A systematic classification of database solutions for data mining to support tasks in supply chains," Chapters from the Proceedings of the Hamburg International Conference of Logistics (HICL), in: Kersten, Wolfgang & Blecker, Thorsten & Ringle, Christian M. (ed.), Data Science and Innovation in Supply Chain Management: How Data Transforms the Value Chain. Proceedings of the Hamburg International Conference of Lo, volume 29, pages 395-425, Hamburg University of Technology (TUHH), Institute of Business Logistics and General Management.
    5. Besiki Stvilia & Les Gasser & Michael B. Twidale & Linda C. Smith, 2007. "A framework for information quality assessment," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(12), pages 1720-1733, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shuchih Ernest Chang & Hueimin Louis Luo & YiChian Chen, 2019. "Blockchain-Enabled Trade Finance Innovation: A Potential Paradigm Shift on Using Letter of Credit," Sustainability, MDPI, vol. 12(1), pages 1-16, December.
    2. Nicolas Jullien, 2012. "What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s)," Post-Print hal-00857208, HAL.
    3. Damian Przekop, 2020. "Feature Engineering for Anti-Fraud Models Based on Anomaly Detection," Central European Journal of Economic Modelling and Econometrics, Central European Journal of Economic Modelling and Econometrics, vol. 12(3), pages 301-316, September.
    4. Francesca Ieva & Anna Maria Paganoni, 2020. "Component-wise outlier detection methods for robustifying multivariate functional samples," Statistical Papers, Springer, vol. 61(2), pages 595-614, April.
    5. Gaucher, Solenne & Klopp, Olga & Robin, Geneviève, 2021. "Outlier detection in networks with missing links," Computational Statistics & Data Analysis, Elsevier, vol. 164(C).
    6. Andrzej Chmielowiec, 2021. "Algorithm for error-free determination of the variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data," Computational Statistics, Springer, vol. 36(4), pages 2813-2840, December.
    7. Marc Chataigner & Stéphane Crépey & Jiang Pu, 2020. "Nowcasting Networks," Post-Print hal-03910123, HAL.
    8. Greco, Salvatore & Ishizaka, Alessio & Tasiou, Menelaos & Torrisi, Gianpiero, 2019. "Sigma-Mu efficiency analysis: A methodology for evaluating units through composite indicators," European Journal of Operational Research, Elsevier, vol. 278(3), pages 942-960.
    9. David Juárez-Varón & Victoria Tur-Viñes & Alejandro Rabasa-Dolado & Kristina Polotskaya, 2020. "An Adaptive Machine Learning Methodology Applied to Neuromarketing Analysis: Prediction of Consumer Behaviour Regarding the Key Elements of the Packaging Design of an Educational Toy," Social Sciences, MDPI, vol. 9(9), pages 1-23, September.
    10. Stéphane Crépey & Lehdili Noureddine & Nisrine Madhar & Maud Thomas, 2022. "Anomaly Detection on Financial Time Series by Principal Component Analysis and Neural Networks," Working Papers hal-03777995, HAL.
    11. Zhongqiu Wang & Guan Yuan & Haoran Pei & Yanmei Zhang & Xiao Liu, 2020. "Unsupervised learning trajectory anomaly detection algorithm based on deep representation," International Journal of Distributed Sensor Networks, , vol. 16(12), pages 15501477209, December.
    12. Maria Richert & Marek Dudek, 2023. "Risk Mapping: Ranking and Analysis of Selected, Key Risk in Supply Chains," JRFM, MDPI, vol. 16(2), pages 1-30, January.
    13. Carling, Kenneth, 1998. "Resistant outlier rules and the non-Gaussian case," Working Paper Series 2001:7, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    14. Arata, Linda & Fabrizi, Enrico & Sckokai, Paolo, 2020. "A worldwide analysis of trend in crop yields and yield variability: Evidence from FAO data," Economic Modelling, Elsevier, vol. 90(C), pages 190-208.
    15. Wentao Yang & Huaxi He & Dongsheng Wei & Hao Chen, 2022. "Generating pseudo-absence samples of invasive species based on outlier detection in the geographical characteristic space," Journal of Geographical Systems, Springer, vol. 24(2), pages 261-279, April.
    16. Liu, Ling & Henley, John & Mousavi, Mohammad Mahdi, 2021. "Foreign interfirm networks and internationalization: Evidence from sub-Saharan Africa," Journal of International Management, Elsevier, vol. 27(1).
    17. Nir Kshetri, 2023. "Blockchain’s Role in Enhancing Quality and Safety and Promoting Sustainability in the Food and Beverage Industry," Sustainability, MDPI, vol. 15(23), pages 1-23, November.
    18. Fournier, Nicholas PhD & Farid, Yashar Zeinali PhD & Patire, Anthony David PhD, 2021. "Potential Erroneous Degradation of High Occupancy Vehicle (HOV) Facilities," Institute of Transportation Studies, Research Reports, Working Papers, Proceedings qt3z76r7tj, Institute of Transportation Studies, UC Berkeley.
    19. Richter, Lucas & Lehna, Malte & Marchand, Sophie & Scholz, Christoph & Dreher, Alexander & Klaiber, Stefan & Lenk, Steve, 2022. "Artificial Intelligence for Electricity Supply Chain automation," Renewable and Sustainable Energy Reviews, Elsevier, vol. 163(C).
    20. Tommaso Barbariol & Enrico Feltresi & Gian Antonio Susto, 2020. "Self-Diagnosis of Multiphase Flow Meters through Machine Learning-Based Anomaly Detection," Energies, MDPI, vol. 13(12), pages 1-24, June.

    More about this item

    Keywords

    Advanced Manufacturing; Industry 4.0;

