Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence

Author

  • Andres Karjus (Tallinn University; Estonian Business School)

Abstract

The increasing capacities of large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, by automating complex qualitative tasks otherwise typically carried out by human researchers. While numerous benchmarking studies have assessed the analytic prowess of LLMs, there is less focus on operationalizing this capacity for inference and hypothesis testing. Addressing this challenge, a systematic framework is argued for here, building on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability. Replicability and statistical robustness are discussed, including how to incorporate machine annotator error rates in subsequent inference. The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering nine diverse languages, multiple disciplines and tasks, including analysis of themes, stances, ideas, and genre compositions; linguistic and semantic annotation, interviews, text mining and event cause inference in noisy historical data, literary social network construction, metadata imputation, and multimodal visual cultural analytics. Using hypothesis-driven topic classification instead of “distant reading” is discussed. The replications among the experiments also illustrate how tasks previously requiring protracted team effort or complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, the approach is not intended to replace, but to augment and scale researcher expertise and analytic practices. With these opportunities in sight, qualitative skills and the ability to pose insightful questions have arguably never been more critical.

Suggested Citation

  • Andres Karjus, 2025. "Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence," Palgrave Communications, Palgrave Macmillan, vol. 12(1), pages 1-18, December.
  • Handle: RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-04503-w
    DOI: 10.1057/s41599-025-04503-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/s41599-025-04503-w
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1057/s41599-025-04503-w?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Brady D. Lund & Ting Wang & Nishith Reddy Mannuru & Bing Nie & Somipam Shimray & Ziang Wang, 2023. "ChatGPT and a new academic reality: Artificial Intelligence‐written research papers and the ethics of the large language models in scholarly publishing," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(5), pages 570-581, May.
    2. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    3. Elliott Ash & Germain Gauthier & Philine Widmer, 2024. "Relatio: Text Semantics Capture Political and Economic Narratives," Political Analysis, Cambridge University Press, vol. 32(1), pages 115-132, January.
    4. Ekaterina Novozhilova & Kate Mays & James E. Katz, 2024. "Looking towards an automated future: U.S. attitudes towards future artificial intelligence instantiations and their effect," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-11, December.
    5. Robert Lew, 2023. "ChatGPT as a COBUILD lexicographer," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-10, December.
    6. Søren Wichmann & Eric W. Holman & Dik Bakker & Cecil H. Brown, 2010. "Evaluating linguistic distance measures," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(17), pages 3632-3639.
    7. Mikaela Irene Fudolig & Thayer Alshaabi & Kathryn Cramer & Christopher M. Danforth & Peter Sheridan Dodds, 2023. "A decomposition of book structure through ousiometric fluctuations in cumulative word-time," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-12, December.
    8. Lisa Messeri & M. J. Crockett, 2024. "Artificial intelligence and illusions of understanding in scientific research," Nature, Nature, vol. 627(8002), pages 49-58, March.
    9. Francisco Banha & Adão Flores & Luís Serra Coelho, 2022. "Quantitizing Qualitative Data from Semi-Structured Interviews: A Methodological Contribution in the Context of Public Policy Decision-Making," Mathematics, MDPI, vol. 10(19), pages 1-30, October.
    10. Yan Asadchy & Andres Karjus & Ksenia Mukhina & Maximilian Schich, 2024. "Perceived gendered self-representation on Tinder using machine learning," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-11, December.
    11. Mila Oiva & Ksenia Mukhina & Vejune Zemaityte & Andres Karjus & Mikhail Tamm & Tillmann Ohm & Mark Mets & Daniel Chávez Heras & Mar Canet Sola & Helena Hanna Juht & Maximilian Schich, 2024. "A framework for the analysis of historical newsreels," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-15, December.
    12. Andres Karjus & Christine Cuskley, 2024. "Evolving linguistic divergence on polarizing social media," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-14, December.
    13. Indira Sen & Daniele Quercia & Licia Capra & Matteo Montecchi & Sanja Šćepanović, 2023. "Insider stories: analyzing internal sustainability efforts of major US companies from online reviews," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-9, December.
    14. Taylor Webb & Keith J. Holyoak & Hongjing Lu, 2023. "Emergent analogical reasoning in large language models," Nature Human Behaviour, Nature, vol. 7(9), pages 1526-1541, September.
    15. Claire Norman & Josephine M. Wildman & Sarah Sowden, 2021. "COVID-19 at the Deep End: A Qualitative Interview Study of Primary Care Staff Working in the Most Deprived Areas of England during the COVID-19 Pandemic," IJERPH, MDPI, vol. 18(16), pages 1-12, August.
    16. Robert Lew, 2024. "Dictionaries and lexicography in the AI era," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-8, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christian Rauh & Michal Parizek, 2024. "Converging on Europe? The European Union in mediatised debates during the COVID-19 and Ukraine shocks," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 31(10), pages 3036-3065.
    2. Felix Drinkall & Stefan Zohren & Michael McMahon & Janet B. Pierrehumbert, 2025. "Stories that (are) Move(d by) Markets: A Causal Exploration of Market Shocks and Semantic Shifts across Different Partisan Groups," Papers 2502.14497, arXiv.org.
    3. Łukasz Baszczak. "Ekonomia narracji – początki nowego nurtu" [Narrative economics: the beginnings of a new trend], Gospodarka Narodowa-The Polish Journal of Economics, Szkoła Główna Handlowa w Warszawie / SGH Warsaw School of Economics, vol. 2023(1).
    4. Łukasz Baszczak, 2023. "Ekonomia narracji – początki nowego nurtu" [Narrative economics: the beginnings of a new trend], Gospodarka Narodowa. The Polish Journal of Economics, Warsaw School of Economics, issue 1, pages 66-81.
    5. Robert Lew & Bartosz Ptasznik & Sascha Wolfer, 2024. "The effectiveness of ChatGPT as a lexical tool for English, compared with a bilingual dictionary and a monolingual learner’s dictionary," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.
    6. Fabio Y.S. Motoki & Valdemar Pinho Neto & Victor Rangel, 2025. "Assessing political bias and value misalignment in generative artificial intelligence," Journal of Economic Behavior & Organization, Elsevier, vol. 234(C).
    7. Kai Gehring & Matteo Grigoletto, 2025. "Virality: What Makes Narratives Go Viral, and Does it Matter," CESifo Working Paper Series 12064, CESifo.
    8. Francesco Bilotta & Alberto Binetti & Giacomo Manferdini, 2025. "Blameocracy: Causal Attribution in Political Communication," Papers 2504.06550, arXiv.org, revised Jun 2025.
    9. Jian-Qiao Zhu & Haijiang Yan & Thomas L. Griffiths, 2024. "Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice," Papers 2405.19313, arXiv.org, revised May 2025.
    10. Miriam Aparicio, 2021. "Resiliency and Cooperation or Regarding Social and Collective Competencies for University Achievement. An Analysis from a Systemic Perspective," European Journal of Social Sciences Education and Research Articles, Revistia Research and Publishing, vol. 8, ejser_v8_.
    11. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    12. Han-Ming Wu & Yin-Jing Tien & Chun-houh Chen, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    13. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    14. F. Marta L. Di Lascio & Andrea Menapace & Roberta Pappadà, 2024. "A spatially‐weighted AMH copula‐based dissimilarity measure for clustering variables: An application to urban thermal efficiency," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    15. Yifan Zhu & Chongzhi Di & Ying Qing Chen, 2019. "Clustering Functional Data with Application to Electronic Medication Adherence Monitoring in HIV Prevention Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 11(2), pages 238-261, July.
    16. Irene Vrbik & Paul McNicholas, 2015. "Fractionally-Supervised Classification," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 359-381, October.
    17. Maurizio Vichi & Carlo Cavicchia & Patrick J. F. Groenen, 2022. "Hierarchical Means Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 553-577, November.
    18. Fatima Batool & Christian Hennig, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    19. Patrick D. Shay & Stephen S. Farnsworth Mick, 2017. "Clustered and distinct: a taxonomy of local multihospital systems," Health Care Management Science, Springer, vol. 20(3), pages 303-315, September.
    20. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-04503-w. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: https://www.nature.com/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.