
High Agreement, Different Stories: How LLM Classifiers Reshape Demographic Patterns in Survey Data

Author

Listed:
  • Soria, Chris

Abstract

What we learn from open-ended survey data depends on who—or what—does the coding. Large Language Models (LLMs) promise to democratize qualitative analysis, but do high agreement rates translate into equivalent thematic findings? This study compares eight LLMs to human annotators on a multilabel coding task using 3,200 responses from the UC Berkeley Social Networks Study, comprising over 19,000 coding decisions. Although LLM-human reliability does not match human-human reliability overall, LLMs approach human performance on simpler tasks and can serve as useful additional coders for generating consensus labels. Compared to a gold-standard human consensus, models achieve 82–97% per-category agreement, but macro F1 is lower and response-level similarity is lower still: even the best model reproduces the full human label set for fewer than 60% of responses. Yet high agreement masks thematic divergence. Models systematically over-identify themes, assigning 67% more categories per response, especially for categories requiring greater interpretive judgment. Models also show lower agreement for some demographic groups. These gaps are partly explained by response characteristics such as length, clarity, and atypicality, and some persist after controls, with implications for studies of populations whose response styles diverge from the corpus average. At the sample level, models largely preserve the overall thematic narrative: human and model category rankings correlate strongly (pooled Spearman's ρ=0.75), and top-performing models achieve approximately 80% directional agreement on demographic patterns. Concrete behavioral questions, such as reasons for moving or strategies for making friends, show especially strong alignment. Yet systematic over-classification can still shift narratives about how specific groups behave, leading researchers to report patterns that the human gold standard does not support.
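The gap the abstract describes between per-category agreement, macro F1, and response-level similarity can be seen on a small example. The sketch below is illustrative only: the two label matrices are invented toy data, not values from the study, and the function names (`per_category_agreement`, `macro_f1`, `exact_match_rate`, `labels_per_response`) are hypothetical helpers rather than the author's code. It assumes each response is coded as a 0/1 vector over categories, with the human consensus treated as the gold standard.

```python
# Toy multilabel data: rows are responses, columns are categories.
# All values are invented for illustration; they are NOT from the study.
human = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
model = [
    [1, 0, 1, 0],  # adds a category the humans did not assign
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],  # adds a category
    [0, 1, 1, 0],  # adds a category
]


def per_category_agreement(gold, pred):
    """Share of responses on which gold and pred agree, per category."""
    n, k = len(gold), len(gold[0])
    return [sum(g[j] == p[j] for g, p in zip(gold, pred)) / n for j in range(k)]


def macro_f1(gold, pred):
    """Unweighted mean of per-category F1 scores."""
    k = len(gold[0])
    scores = []
    for j in range(k):
        tp = sum(g[j] == 1 and p[j] == 1 for g, p in zip(gold, pred))
        fp = sum(g[j] == 0 and p[j] == 1 for g, p in zip(gold, pred))
        fn = sum(g[j] == 1 and p[j] == 0 for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


def exact_match_rate(gold, pred):
    """Share of responses whose full label set is reproduced exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def labels_per_response(labels):
    """Average number of categories assigned per response."""
    return sum(sum(row) for row in labels) / len(labels)


if __name__ == "__main__":
    print("per-category agreement:", per_category_agreement(human, model))
    print("macro F1:", round(macro_f1(human, model), 3))
    print("exact-match rate:", exact_match_rate(human, model))
    print("labels per response (human, model):",
          labels_per_response(human), labels_per_response(model))
```

In this toy case the model only ever adds labels, so agreement stays at 60% or above in every category, yet the full label set matches for just two of the five responses. That is the same qualitative pattern the abstract reports, under much simpler assumptions.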

Suggested Citation

  • Soria, Chris, 2026. "High Agreement, Different Stories: How LLM Classifiers Reshape Demographic Patterns in Survey Data," SocArXiv 85kyd_v1, Center for Open Science.
  • Handle: RePEc:osf:socarx:85kyd_v1
    DOI: 10.31219/osf.io/85kyd_v1

    Download full text from publisher

    File URL: https://osf.io/download/6a1f995f75da36c10dffb87e/
    Download Restriction: no

