Printed from https://ideas.repec.org/p/hhs/haechi/2026_002.html

Relational Databases and Machine Learning for Qualitative Big Data

Authors

  • Lakomaa, Erik

    (Institute for Economic and Business History Research)

  • Friedl, Christoffer

    (Stockholm School of Economics)

Abstract

Recent advances in large-scale digitisation have created new opportunities for economic and business historians who work with substantial bodies of qualitative archival material. Although historical datasets of around 10,000 to 100,000 observations are modest compared to conventional big data, they present similar information-processing challenges and make it possible to apply a wide range of machine learning techniques. In this paper, we show how relational databases can provide the infrastructure needed to prepare, structure, and analyse large qualitative historical datasets, and how they support the effective use of machine learning tools, including Large Language Models (LLMs). We draw on a research program that has collected more than 114,000 digitised documents from over 30 archives. Our relational database design enables us to structure unstructured sources, standardise metadata, link documents to events and actors, and create longitudinal datasets that can be used for supervised learning, topic modelling, document classification, and embedding-based similarity searches. We also assess the value and limitations of LLMs in historical research. LLMs can accelerate tasks such as document triage, entity recognition, thematic grouping, and preliminary coding. At the same time, they introduce risks related to hallucinations, opaque reasoning processes, and difficulties in tracing the evidentiary basis of outputs. We argue that relational databases reduce these risks in three ways: they retain document-level traceability, they make the full set of consulted sources transparent, and they preserve the epistemological chain from tentative AI suggestions to subsequent researcher validation, which allows researchers to verify and reinterpret AI-assisted results.
Our contribution is an empirically grounded demonstration of how qualitative big data, relational databases, and machine learning methods can be combined to advance economic history, along with a discussion of the safeguards needed to ensure these tools are used responsibly.
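The abstract describes the database design only in general terms. As a rough illustration, here is a minimal sketch under our own assumptions (it is not the authors' schema) using Python's built-in sqlite3 module. It shows how documents can be tied to an archive and a shelfmark, linked to actors and events, and how LLM suggestions can be stored apart from researcher validations so that every accepted label traces back to an archival reference. All table names, column names, the model name `llm-x` and the toy rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: every table and column name is an illustrative
# assumption, not the design used in the paper.
SCHEMA = """
CREATE TABLE archive (
    archive_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    archive_id  INTEGER NOT NULL REFERENCES archive(archive_id),
    shelfmark   TEXT NOT NULL,  -- archival reference kept for traceability
    doc_date    TEXT,           -- standardised ISO 8601 date
    UNIQUE (archive_id, shelfmark)
);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE event (event_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
-- Many-to-many links between documents and the actors/events they concern.
CREATE TABLE document_actor (
    document_id INTEGER NOT NULL REFERENCES document(document_id),
    actor_id    INTEGER NOT NULL REFERENCES actor(actor_id),
    PRIMARY KEY (document_id, actor_id)
);
CREATE TABLE document_event (
    document_id INTEGER NOT NULL REFERENCES document(document_id),
    event_id    INTEGER NOT NULL REFERENCES event(event_id),
    PRIMARY KEY (document_id, event_id)
);
-- Tentative AI output is stored apart from the researcher's decision on it.
CREATE TABLE ai_suggestion (
    suggestion_id INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES document(document_id),
    model         TEXT NOT NULL,
    label         TEXT NOT NULL
);
CREATE TABLE validation (
    suggestion_id INTEGER PRIMARY KEY REFERENCES ai_suggestion(suggestion_id),
    researcher    TEXT NOT NULL,
    accepted      INTEGER NOT NULL CHECK (accepted IN (0, 1)),
    note          TEXT
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO archive VALUES (1, 'Example Archive')")
    conn.executemany(
        "INSERT INTO document VALUES (?, 1, ?, ?)",
        [(1, "A-001", "1950-01-01"), (2, "A-002", "1950-02-01"), (3, "A-003", "1950-03-01")],
    )
    conn.executemany(
        "INSERT INTO ai_suggestion VALUES (?, ?, 'llm-x', 'labour dispute')",
        [(1, 1), (2, 2), (3, 3)],
    )
    # Suggestion 1 is accepted, 2 is rejected, 3 has not been reviewed yet.
    conn.executemany(
        "INSERT INTO validation VALUES (?, 'researcher A', ?, ?)",
        [(1, 1, None), (2, 0, "misclassified")],
    )
    conn.commit()
    return conn


def validated_sources(conn: sqlite3.Connection, label: str) -> list[tuple[str, str, str]]:
    """Return (archive, shelfmark, model) for every researcher-accepted AI label."""
    return conn.execute(
        """
        SELECT a.name, d.shelfmark, s.model
        FROM ai_suggestion AS s
        JOIN validation AS v ON v.suggestion_id = s.suggestion_id
        JOIN document   AS d ON d.document_id = s.document_id
        JOIN archive    AS a ON a.archive_id = d.archive_id
        WHERE s.label = ? AND v.accepted = 1
        ORDER BY d.doc_date
        """,
        (label,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(validated_sources(conn, "labour dispute"))
    # -> [('Example Archive', 'A-001', 'llm-x')]
```

Keeping suggestions and validations in separate tables means that a rejected or not-yet-reviewed AI label never reaches the analysis query. The record of what the model proposed is still retained, so it can be re-examined or reinterpreted later.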

Suggested Citation

  • Lakomaa, Erik & Friedl, Christoffer, 2026. "Relational Databases and Machine Learning for Qualitative Big Data," SSE Working Paper Series in Economic History 2026:2, Stockholm School of Economics.
  • Handle: RePEc:hhs:haechi:2026_002

    Download full text from publisher

    To our knowledge, this item is not available for download. To find out whether it is available, there are three options:
    1. Check below whether another version of this item is available online.
    2. Check on the provider's web page whether it is in fact available.
    3. Perform a search for a similarly titled item that would be available.

    More about this item

    JEL classification:

    • A00 - General Economics and Teaching - - General - - - General
    • N00 - Economic History - - General - - - General
    • N01 - Economic History - - General - - - Development of the Discipline: Historiographical; Sources and Methods

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hhs:haechi:2026_002. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Erik Lakomaa. General contact details of provider: https://edirc.repec.org/data/dehhsse.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.