
Automatic classification of data-warehouse-data for information lifecycle management using machine learning techniques

Author

Listed:
  • Sebastian Büsch

    (Ilmenau University of Technology)

  • Volker Nissen

    (Ilmenau University of Technology)

  • Arndt Wünscher

    (Ilmenau University of Technology)

Abstract

The aim of Information Lifecycle Management (ILM) is to govern data throughout its lifecycle as efficiently and effectively as possible from a technical point of view. A core question is where the data should be stored, since different storage locations entail different costs and access times. For this purpose, data have to be classified, which at present is done either manually in an elaborate way or with recourse to only a few data attributes, in particular access frequency. In the context of data warehouse systems, this article introduces an automated, and therefore fast and cost-effective, data classification for ILM. Machine learning techniques, in particular an artificial neural network (multilayer perceptron), a support vector machine, and a decision tree approach, are compared on an SAP-based real-world data set from the automotive industry. This data classification considers a large number of data attributes and thus attains results similar to those of human experts. Besides classification accuracy, the comparison also includes the types of misclassification that occur, since these are important in ILM.

Suggested Citation

  • Sebastian Büsch & Volker Nissen & Arndt Wünscher, 0. "Automatic classification of data-warehouse-data for information lifecycle management using machine learning techniques," Information Systems Frontiers, Springer, vol. 0, pages 1-15.
  • Handle: RePEc:spr:infosf:v::y::i::d:10.1007_s10796-016-9680-8
    DOI: 10.1007/s10796-016-9680-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-016-9680-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10796-016-9680-8?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. David L. Olson & Dursun Delen, 2008. "Advanced Data Mining Techniques," Springer Books, Springer, number 978-3-540-76917-0, January.
    2. Hasso Plattner & Alexander Zeier, 2011. "In-Memory Data Management," Springer Books, Springer, number 978-3-642-19363-7, January.
    3. Markus Lilienthal, 2013. "A Decision Support Model for Cloud Bursting," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 5(2), pages 71-81, April.

    Citations

    Cited by:

    1. Vijayan Sugumaran & T. V. Geetha & D. Manjula & Hema Gopal, 2017. "Guest Editorial: Computational Intelligence and Applications," Information Systems Frontiers, Springer, vol. 19(5), pages 969-974, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sebastian Büsch & Volker Nissen & Arndt Wünscher, 2017. "Automatic classification of data-warehouse-data for information lifecycle management using machine learning techniques," Information Systems Frontiers, Springer, vol. 19(5), pages 1085-1099, October.
    2. Tobias Knabke & Sebastian Olbrich, 2018. "Building novel capabilities to enable business intelligence agility: results from a quantitative study," Information Systems and e-Business Management, Springer, vol. 16(3), pages 493-546, August.
    3. Vangelis Marinakis & Themistoklis Koutsellis & Alexandros Nikas & Haris Doukas, 2021. "AI and Data Democratisation for Intelligent Energy Management," Energies, MDPI, vol. 14(14), pages 1-14, July.
    4. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    5. HimaJyothi Kasaraneni & Salini Rosaline, 2024. "Automatic Merging of Scopus and Web of Science Data for Simplified and Effective Bibliometric Analysis," Annals of Data Science, Springer, vol. 11(3), pages 785-802, June.
    6. Emrouznejad, Ali & De Witte, Kristof, 2010. "COOPER-framework: A unified process for non-parametric projects," European Journal of Operational Research, Elsevier, vol. 207(3), pages 1573-1586, December.
    7. Marina Johnson & Abdullah Albizri & Serhat Simsek, 2022. "Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis," Annals of Operations Research, Springer, vol. 308(1), pages 275-305, January.
    8. Mehri, Ali & Darooneh, Amir H. & Shariati, Ashrafalsadat, 2012. "The complex networks approach for authorship attribution of books," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(7), pages 2429-2437.
    9. Javier Gomez & Cesar Alfaro & Felipe Ortega & Javier M. Moguerza & Maria Jesus Algar & Raul Moreno, 2024. "Adapting support vector optimisation algorithms to textual gender classification," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(3), pages 463-488, October.
    10. Michał Jasiński & Tomasz Sikorski & Zbigniew Leonowicz & Klaudiusz Borkowski & Elżbieta Jasińska, 2020. "The Application of Hierarchical Clustering to Power Quality Measurements in an Electrical Power Network with Distributed Generation," Energies, MDPI, vol. 13(9), pages 1-19, May.
    11. Radka Nacheva & Maciej Czaplewski & Pavel Petrov, 2024. "Data mining model for scientific research classification: the case of digital workplace accessibility," DECISION: Official Journal of the Indian Institute of Management Calcutta, Springer;Indian Institute of Management Calcutta, vol. 51(1), pages 3-16, March.
    12. Zhongxing Peng & Wei Huang & Yinghui Zhu, 2025. "Feedforward Factorial Hidden Markov Model," Mathematics, MDPI, vol. 13(7), pages 1-20, April.
    13. Beni Rohrbach & Sharolyn Anderson & Patrick Laube, 2016. "The effects of sample size on data quality in participatory mapping of past land use," Environment and Planning B, , vol. 43(4), pages 681-697, July.
    14. César Alfaro & Javier Cano-Montero & Javier Gómez & Javier M. Moguerza & Felipe Ortega, 2016. "A multi-stage method for content classification and opinion mining on weblog comments," Annals of Operations Research, Springer, vol. 236(1), pages 197-213, January.
    15. M. J. Diamantopoulou & A. Georgakis & M. Progios, 2025. "Optimizing pine tree stem volume models using artificial neural networks with minimal input variables," Operational Research, Springer, vol. 25(2), pages 1-23, June.
    16. Derya Ozturk & Nergiz Uzel-Gunini, 2022. "Investigation of the effects of hybrid modeling approaches, factor standardization, and categorical mapping on the performance of landslide susceptibility mapping in Van, Turkey," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 114(3), pages 2571-2604, December.
    17. Shaheen, Muhammad & Khan, Muhammad Zeb, 2016. "A method of data mining for selection of site for wind turbines," Renewable and Sustainable Energy Reviews, Elsevier, vol. 55(C), pages 1225-1233.
    18. Robert Keller & Lukas Häfner & Thomas Sachs & Gilbert Fridgen, 2020. "Scheduling Flexible Demand in Cloud Computing Spot Markets," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 62(1), pages 25-39, February.
    19. Peter Loos & Jens Lechtenbörger & Gottfried Vossen & Alexander Zeier & Jens Krüger & Jürgen Müller & Wolfgang Lehner & Donald Kossmann & Benjamin Fabian & Oliver Günther & Robert Winter, 2011. "In-memory Databases in Business Information Systems," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 3(6), pages 389-395, December.
    20. Simsek, Serhat & Dag, Ali & Tiahrt, Thomas & Oztekin, Asil, 2021. "A Bayesian Belief Network-based probabilistic mechanism to determine patient no-show risk categories," Omega, Elsevier, vol. 100(C).
