
Collection and use of unstructured data in the social, behavioural, and economic sciences

Editor

Listed:
  • German Data Forum RatSWD

Abstract

The increasing digital transformation of society in recent decades has resulted in a number of new data sources for the social, behavioural, and economic sciences. Among these are unstructured data, which are not available in a fixed data format and are therefore difficult to process for data analysis (e.g., Facebook posts, Instagram images, YouTube videos, Twitter messages). The use of unstructured data poses specific challenges, which arise precisely because the data are typically not collected as part of a controlled, scientific study but are often created in people’s natural environments. Building on the results of an expert workshop, we describe the specific challenges of generating and using unstructured data and formulate recommendations for their use. Our recommendations are based on the total error framework and take into account data generation (definition of the units of analysis, coverage and sampling error, non-response, and missing data error), post-collection processing (specification error, validity, measurement error, and error in terms of content), and, lastly, data analysis (record linkage and processing errors, modelling errors, analytical errors). Finally, we discuss open questions and challenges for research using unstructured data.

Suggested Citation

  • German Data Forum RatSWD (ed.), 2024. "Collection and use of unstructured data in the social, behavioural, and economic sciences," RatSWD Output Series, German Data Forum (RatSWD), volume 7, number 7-2en.
  • Handle: RePEc:rsw:rswout:7-2en
    DOI: https://doi.org/10.17620/02671.92

    Download full text from publisher

    File URL: https://www.konsortswd.de/wp-content/uploads/RatSWD-Output2.7-Unstructured-Data-2024.pdf
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:osf:osfxxx:sw6kd_v1 is not listed on IDEAS
    2. Eibich, Peter & Goldzahl, Léontine, 2021. "Does retirement affect secondary preventive care use? Evidence from breast cancer screening," Economics & Human Biology, Elsevier, vol. 43(C).
    3. Gao, Mingze & Leung, Henry & Qiu, Buhui, 2021. "Organization capital and executive performance incentives," Journal of Banking & Finance, Elsevier, vol. 123(C).
    4. Christoph Kronenberg, 2021. "New(spaper) evidence of a reduction in suicide mentions during the 19th century US gold rush," Health Economics, John Wiley & Sons, Ltd., vol. 30(10), pages 2582-2594, September.
    5. Rose, Julian & Neubauer, Florian & Ankel-Peters, Jörg, 2024. "Long-Term Effects of the Targeting the Ultra-Poor Program - A Reproducibility and Replicability Assessment of Banerjee et al. (2021)," I4R Discussion Paper Series 142, The Institute for Replication (I4R).
    6. Rubin, Mark, 2023. "Type I error rates are not usually inflated," MetaArXiv 3kv2b, Center for Open Science.
    7. Clist, Paul, 2025. "The Common Problem of Bad Controls in Tests of the Linguistic Savings Hypothesis. A Comment on Ayres et al. (PNAS, 2023) and related literature," Journal of Comments and Replications in Economics (JCRE), ZBW - Leibniz Information Centre for Economics, vol. 4, pages 1-27.
    8. Jemesa Landers & Tom Coupé & Andrea Menclova, 2026. "The Long-Term Impact of War Victimization on Life Satisfaction - Evidence from the Life in Transition Survey," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 182(1), pages 1-20, March.
    9. Rat für Sozial- und Wirtschaftsdaten RatSWD (ed.), 2023. "Erhebung und Nutzung unstrukturierter Daten in den Sozial-, Verhaltens- und Wirtschaftswissenschaften," RatSWD Output Series, German Data Forum (RatSWD), volume 7, number 7-2de.
    10. Haveresch, Nils & Ankel-Peters, Jörg & Bensch, Gunther, 2025. "A Slippery Slope: Topographic Variation as an Instrument," I4R Discussion Paper Series 278, The Institute for Replication (I4R).
    11. Dorison, Charles A & Lerner, Jennifer S & Heller, Blake H & Rothman, Alexander J & Kawachi, Ichiro I & Wang, Ke & Rees, Vaughan W & Gill, Brian P & Gibbs, Nancy & Ebersole, Charles R & Vally, Zahir et al., 2022. "In COVID-19 health messaging, loss framing increases anxiety with little-to-no concomitant benefits: Experimental evidence from 84 countries," Other publications TiSEM 235f67b6-6be5-4061-8693-3, Tilburg University, School of Economics and Management.
    12. Gretton, Jeremy & Roemer, Tobias & Schlüter, Elmar, 2024. "Replication of Hamel & Wilcox-Archuleta (2022): "Black Workers in White Places: Daytime Racial Diversity and White Public Opinion"," I4R Discussion Paper Series 61, The Institute for Replication (I4R), revised 2024.
    13. Mitre-Becerril, David & MacDonald, John M., 2024. "Does urban development influence crime? Evidence from Philadelphia’s new zoning regulations," Journal of Urban Economics, Elsevier, vol. 142(C).
    14. Shunsen Huang & Xiaoxiong Lai & Xinmei Zhao & Xinran Dai & Yuanwei Yao & Cai Zhang & Yun Wang, 2022. "Beyond Screen Time: Exploring the Associations between Types of Smartphone Use Content and Adolescents’ Social Relationships," IJERPH, MDPI, vol. 19(15), pages 1-14, July.
    15. repec:osf:metaar:3kv2b_v1 is not listed on IDEAS
    16. Fieberg, Christian & Günther, Steffen & Poddig, Thorsten & Zaremba, Adam, 2024. "Non-standard errors in the cryptocurrency world," International Review of Financial Analysis, Elsevier, vol. 92(C).
    17. Helmers, Viola & van der Werf, Edwin, 2022. "Did the German Aviation Tax Affect Passenger Numbers? New Evidence Employing Difference-in-differences," VfS Annual Conference 2022 (Basel): Big Data in Economics 264118, Verein für Socialpolitik / German Economic Association.
    18. Claire Duin & Philipp E. Sischka & Andreas Heinz & Helmut Willems, 2025. "The relationship between problematic social media use and health behavior: An exploratory specification curve analysis of large-scale survey data," Applied Research in Quality of Life, Springer;International Society for Quality-of-Life Studies, vol. 20(1), pages 319-345, February.
    19. Tran, Nhan, 2024. "Parents' legal status and children's health insurance: Evidence from DACA," MPRA Paper 120173, University Library of Munich, Germany.
    20. Slichter, David & Tran, Nhan, 2023. "Do better journals publish better estimates?," MPRA Paper 118433, University Library of Munich, Germany.
    21. repec:osf:socarx:tjkcy_v1 is not listed on IDEAS
    22. Karsten Hansen & Kanishka Misra & Robert Evan Sanders, 2024. "Uninformed Choices in Perishables," Marketing Science, INFORMS, vol. 43(4), pages 751-777, July.
    23. Bachler, Sebastian & Erhart, Andrea & Holzknecht, Armando, 2023. "Replication Report on Altmann et al. (2022)," I4R Discussion Paper Series 43, The Institute for Replication (I4R).
