IDEAS home Printed from https://ideas.repec.org/a/pal/palcom/v12y2025i1d10.1057_s41599-025-04503-w.html

Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence

Author

Listed:
  • Andres Karjus (Tallinn University; Estonian Business School)

Abstract

The increasing capacities of large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, by automating complex qualitative tasks otherwise typically carried out by human researchers. While numerous benchmarking studies have assessed the analytic prowess of LLMs, there is less focus on operationalizing this capacity for inference and hypothesis testing. Addressing this challenge, a systematic framework is argued for here, building on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability. Replicability and statistical robustness are discussed, including how to incorporate machine annotator error rates in subsequent inference. The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering nine diverse languages, multiple disciplines and tasks, including analysis of themes, stances, ideas, and genre compositions; linguistic and semantic annotation, interviews, text mining and event cause inference in noisy historical data, literary social network construction, metadata imputation, and multimodal visual cultural analytics. Using hypothesis-driven topic classification instead of “distant reading” is discussed. The replications among the experiments also illustrate how tasks previously requiring protracted team effort or complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, the approach is not intended to replace, but to augment and scale researcher expertise and analytic practices. With these opportunities in sight, qualitative skills and the ability to pose insightful questions have arguably never been more critical.

Suggested Citation

  • Andres Karjus, 2025. "Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence," Palgrave Communications, Palgrave Macmillan, vol. 12(1), pages 1-18, December.
  • Handle: RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-04503-w
    DOI: 10.1057/s41599-025-04503-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/s41599-025-04503-w
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1057/s41599-025-04503-w?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Felix Drinkall & Stefan Zohren & Michael McMahon & Janet B. Pierrehumbert, 2025. "Stories that (are) Move(d by) Markets: A Causal Exploration of Market Shocks and Semantic Shifts across Different Partisan Groups," Papers 2502.14497, arXiv.org.
    2. Robert Lew & Bartosz Ptasznik & Sascha Wolfer, 2024. "The effectiveness of ChatGPT as a lexical tool for English, compared with a bilingual dictionary and a monolingual learner’s dictionary," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.
    3. Jian-Qiao Zhu & Haijiang Yan & Thomas L. Griffiths, 2024. "Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice," Papers 2405.19313, arXiv.org.
    4. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    5. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    6. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    7. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    8. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    9. Amiri, Babak & Karimianghadim, Ramin, 2024. "A novel text clustering model based on topic modelling and social network analysis," Chaos, Solitons & Fractals, Elsevier, vol. 181(C).
    10. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    11. A van Giessen & K G M Moons & G A de Wit & W M M Verschuren & J M A Boer & H Koffijberg, 2015. "Tailoring the Implementation of New Biomarkers Based on Their Added Predictive Value in Subgroups of Individuals," PLOS ONE, Public Library of Science, vol. 10(1), pages 1-14, January.
    12. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    13. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    14. Elvira Pelle & Roberta Pappadà, 2021. "A clustering procedure for mixed-type data to explore ego network typologies: an application to elderly people living alone in Italy," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(5), pages 1507-1533, December.
    15. Renato Cordeiro Amorim, 2016. "A Survey on Feature Weighting Based K-Means Algorithms," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 210-242, July.
    16. Tom Wilderjans & Eva Ceulemans & Iven Mechelen, 2008. "The CHIC Model: A Global Model for Coupled Binary Data," Psychometrika, Springer;The Psychometric Society, vol. 73(4), pages 729-751, December.
    17. Dong Liu & Changwei Zhao & Yong He & Lei Liu & Ying Guo & Xinsheng Zhang, 2023. "Simultaneous cluster structure learning and estimation of heterogeneous graphs for matrix‐variate fMRI data," Biometrics, The International Biometric Society, vol. 79(3), pages 2246-2259, September.
    18. Yuchen Liang & Guowei Shi & Runlin Cai & Yuchen Yuan & Ziying Xie & Long Yu & Yingjian Huang & Qian Shi & Lizhe Wang & Jun Li & Zhonghui Tang, 2024. "PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    19. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    20. Marek Obrębalski & Marek Walesiak, 2015. "Functional structure of Polish regions in the period 2004-2013 – measurement via HHI Index, Florence’s coefficient of localization and cluster analysis," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 16(2), pages 223-242, June.
