Printed from https://ideas.repec.org/p/iab/iabfob/202513.html

Text Mining mit der "Temi-Box" [Text Mining with the "Temi-Box"]

Author

Listed:
  • Hirmer, Christine

    (Institute for Employment Research (IAB), Nuremberg, Germany)

  • Metzger, Lina-Jeanette

    (Institute for Employment Research (IAB), Nuremberg, Germany)

Abstract

"The constantly growing amount of digitally accessible text data and advances in natural language processing (NLP) have made text mining a key technology. The 'Temi-Box' is a modular construction kit designed for text mining applications. It enables automated text classification, topic categorization, and clustering without necessitating extensive programming expertise. Developed on the basis of the keywording and topic assignment of publications for the IAB Info Platform and financed by EU funds, it is available as an open source project. This research report documents the development and application of the Temi-Box and illustrates its use and the interpretation of the results obtained.
Text mining extracts knowledge from unstructured texts using methods such as classification and clustering. The modular Temi-Box provides users with established methods in a user-friendly way and supports users with a pipeline architecture that simplifies standardised processes such as data preparation and model training. It incorporates both current and traditional approaches to text representation, such as BERT and TF-IDF, and offers a variety of algorithms for text classification and clustering, including K-Nearest Neighbors (KNN), binary and multinomial classifiers as layers in neural networks and K-Means. Various evaluation metrics facilitate the assessment of model performance and the comparison of different approaches.
Experiments on automated topic assignment and the identification of key topics illustrate the use of the Temi-Box and the interpretation of the results. Based on a dataset with 1,932 IAB publications and 105 topics, the results show that BERT-based models, such as GermanBERT, consistently achieve the best results. Binary classifiers prove to be particularly flexible and accurate, while TF-IDF-based approaches offer robust alternatives with less complexity. Clustering remains a challenge, especially when content overlaps.
The Temi-Box is a highly versatile instrument. In addition to the application for the IAB Info Platform described in this research report, it can be used in numerous areas, such as the analysis of job advertisements, job and company descriptions, keywording of publications or for sentiment analysis. It can also be extended for use in question-and-answer systems or for named entity recognition. The Temi-Box facilitates the application of text mining methods for a broad user base and offers numerous customization options. It reduces the effort involved in developing and comparing models. Its open source availability promotes the further development and integration of the Temi-Box into various research projects. This enables users to adapt the platform to specific needs and integrate new functions. The report shows the potential of the Temi-Box to advance the digitization and automation of text data analysis. At the same time, challenges such as ensuring data quality and the interpretability of the models remain. These aspects require continuous validation and further development in order to further improve the effectiveness and reliability of text mining methods." (Author's abstract, IAB-Doku)
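
As an illustration of one method combination named in the abstract (TF-IDF text representation with a K-Nearest Neighbors classifier), the following is a minimal Python sketch that uses only the standard library. It is not the Temi-Box code or its API: all function names, the toy corpus and the topic labels are assumptions made for illustration, and the sketch assigns a single topic per text, whereas the report deals with 105 topics.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalised sparse TF-IDF vector; unknown terms are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    """Assign the majority topic among the k most similar training texts."""
    query = tfidf_vector(text, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; the texts and topic labels are invented.
train_docs = [
    "minimum wage effects on employment",
    "wage inequality and minimum wage reform",
    "minimum wage and low wage workers",
    "vocational training for young apprentices",
    "apprentices and vocational training firms",
    "further training and vocational schools",
]
train_labels = ["wages", "wages", "wages", "training", "training", "training"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

prediction = knn_predict("minimum wage increase", train_vecs, train_labels, idf, k=3)
print(prediction)
```

A BERT-based representation, which the abstract reports as performing best, would replace `tfidf_vector` with dense sentence embeddings while leaving the nearest-neighbour step unchanged.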

Suggested Citation

  • Hirmer, Christine & Metzger, Lina-Jeanette, 2025. "Text Mining mit der "Temi-Box"," IAB-Forschungsbericht 202513, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
  • Handle: RePEc:iab:iabfob:202513
    DOI: 10.48720/IAB.FB.2513

    Download full text from publisher

    File URL: https://doi.org/10.48720/IAB.FB.2513
    Download Restriction: no

