Printed from https://ideas.repec.org/a/inm/ormksc/v44y2025i6p1446-1455.html

Database Report: Twin-2K-500: A Data Set for Building Digital Twins of over 2,000 People Based on Their Answers to over 500 Questions

Author

Listed:
  • Olivier Toubia

    (Marketing Division, Columbia Business School, Columbia University, New York, New York 10027)

  • George Z. Gui

    (Marketing Division, Columbia Business School, Columbia University, New York, New York 10027)

  • Tianyi Peng

    (Decision, Risk & Operations Division, Columbia Business School, Columbia University, New York, New York 10027)

  • Daniel J. Merlau

    (Marketing Division, Columbia Business School, Columbia University, New York, New York 10027)

  • Ang Li

    (Department of Computer Science, Columbia University, New York, New York 10025)

  • Haozhe Chen

    (Department of Computer Science, Columbia University, New York, New York 10025)

Abstract

Large language model (LLM)-based digital twin simulation, where LLMs are used to emulate individual human behavior, holds great promise for research in business, artificial intelligence, social science, and digital experimentation. However, progress in this area has been hindered by the scarcity of real individual-level data sets that are both large and publicly available. To address this gap, we introduce a large-scale public data set designed to capture a rich and holistic view of individual human behavior. We survey a representative sample of N = 2,058 participants (average 2.42 hours per person) in the United States across four waves with more than 500 questions in total, covering a comprehensive battery of demographic, psychological, economic, personality, and cognitive measures, as well as replications of behavioral economics experiments and a pricing survey. The final wave repeats tasks from earlier waves to establish a test-retest accuracy baseline. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels. Beyond LLM applications, due to its unique breadth and scale, the data set also enables broad social science and business research, including studies of cross-construct correlations and heterogeneous treatment effects.

Suggested Citation

  • Olivier Toubia & George Z. Gui & Tianyi Peng & Daniel J. Merlau & Ang Li & Haozhe Chen, 2025. "Database Report: Twin-2K-500: A Data Set for Building Digital Twins of over 2,000 People Based on Their Answers to over 500 Questions," Marketing Science, INFORMS, vol. 44(6), pages 1446-1455, November.
  • Handle: RePEc:inm:ormksc:v:44:y:2025:i:6:p:1446-1455
    DOI: 10.1287/mksc.2025.0262

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mksc.2025.0262
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mksc.2025.0262?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Gergana Y. Nenkov & Maureen Morrin & Andrew Ward & Barry Schwartz & John Hulland, 2008. "A short form of the Maximization Scale: Factor structure, reliability and validity studies," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 3, pages 371-388, June.
    2. Scott I. Rick & Cynthia E. Cryder & George Loewenstein, 2008. "Tightwads and Spendthrifts," Journal of Consumer Research, Journal of Consumer Research Inc., vol. 34(6), pages 767-782, October.
    3. Argyle, Lisa P. & Busby, Ethan C. & Fulda, Nancy & Gubler, Joshua R. & Rytting, Christopher & Wingate, David, 2023. "Out of One, Many: Using Language Models to Simulate Human Samples," Political Analysis, Cambridge University Press, vol. 31(3), pages 337-351, July.
    4. John J. Horton & Apostolos Filippas & Benjamin S. Manning, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," NBER Working Papers 31122, National Bureau of Economic Research, Inc.
    5. Richard H. Thaler, 2008. "Mental Accounting and Consumer Choice," Marketing Science, INFORMS, vol. 27(1), pages 15-25, 01-02.
    6. Fabio Motoki & Valdemar Pinho Neto & Victor Rodrigues, 2024. "More human than human: measuring ChatGPT political bias," Public Choice, Springer, vol. 198(1), pages 3-23, January.
    7. Nenkov, Gergana Y. & Morrin, Maureen & Ward, Andrew & Schwartz, Barry & Hulland, John, 2008. "A short form of the Maximization Scale: Factor structure, reliability and validity studies," Judgment and Decision Making, Cambridge University Press, vol. 3(5), pages 371-388, June.
    8. Guth, Werner & Schmittberger, Rolf & Schwarze, Bernd, 1982. "An experimental analysis of ultimatum bargaining," Journal of Economic Behavior & Organization, Elsevier, vol. 3(4), pages 367-388, December.
    9. Eric J Johnson & Stephan Meier & Olivier Toubia, 2019. "What’s the Catch? Suspicion of Bank Motives and Sluggish Refinancing," The Review of Financial Studies, Society for Financial Studies, vol. 32(2), pages 467-495.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Olivier Toubia & George Z. Gui & Tianyi Peng & Daniel J. Merlau & Ang Li & Haozhe Chen, 2025. "Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions," Papers 2505.17479, arXiv.org.
    2. George Gui & Seungwoo Kim, 2025. "Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach," Papers 2509.25709, arXiv.org.
    3. Antonina Rafikova & Anatoly Voronin, 2026. "ChatGPT as a research proxy: simulating human attitudes in social science research," Journal of Computational Social Science, Springer, vol. 9(1), pages 1-30, February.
    4. Ferraz, Vinícius & Olah, Tamas & Sazedul, Ratin & Schmidt, Robert & Schwieren, Christiane, 2025. "When Artificial Minds Negotiate: Dark Personality and the Ultimatum Game in Large Language Models," Working Papers 0768, University of Heidelberg, Department of Economics.
    5. Yingnan Yan & Tianming Liu & Yafeng Yin, 2025. "Valuing Time in Silicon: Can Large Language Models Replicate Human Value of Travel Time," Papers 2507.22244, arXiv.org, revised Dec 2025.
    6. Yizhao Jiang, 2022. "The Influence of Payment Method: Do Consumers Pay More with Mobile Payment?," Papers 2210.14631, arXiv.org.
    7. Andreas Leibbrandt, 2016. "Behavioral Constraints on Pricing: Experimental Evidence on Price Discrimination and Customer Antagonism," CESifo Working Paper Series 6214, CESifo.
    8. Felipe A. Csaszar & Harsh Ketkar & Hyunjin Kim, 2024. "Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors," Strategy Science, INFORMS, vol. 9(4), pages 322-345, December.
    9. Thomas Wagner, 1998. "Reciprocity And Efficiency," Rationality and Society, vol. 10(3), pages 347-375, August.
    10. Ho, Cony Ming-Shen & Chin, Shih-Chun (Daniel) & Wang, TzuShuo Ryan, 2025. "Recurring versus one-time donation requests: The toll on attracting donors," Journal of Business Research, Elsevier, vol. 192(C).
    11. Francisco Gomes & Michael Haliassos & Tarun Ramadorai, 2021. "Household Finance," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 919-1000, September.
    12. Hongshen Sun & Juanjuan Zhang, 2025. "From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research," Papers 2512.23184, arXiv.org.
    13. Wen, Tong & Leung, Xi Y. & Li, Bin & Hu, Lingyan, 2021. "Examining framing effect in travel package purchase: An application of double-entry mental accounting theory," Annals of Tourism Research, Elsevier, vol. 90(C).
    14. Mohd Rizuan Abdul Kadir & Abdul rahim saleh Ibrahim AlBalushi & Sarfaraz Javed, 2025. "Institutional Pressure and Business Sustainable Performance: Does Environmental Management Accounting Matter?," IIM Kozhikode Society & Management Review, vol. 14(2), pages 207-219, July.
    15. Koji Takahashi & Joon Suk Park, 2025. "Generative AI for Surveys on Payment Apps: AIs' View on Privacy and Technology," IMES Discussion Paper Series 25-E-13, Institute for Monetary and Economic Studies, Bank of Japan.
    16. Hui Chen & Antoine Didisheim & Mohammad Pourmohammadi & Luciano Somoza & Hanqing Tian, 2025. "A Financial Brain Scan of the LLM," Papers 2508.21285, arXiv.org, revised Feb 2026.
    17. Christopher Funke & Caroline Rothert-Schnell & Gianfranco Walsh & Federico Mangiò & Giuseppe Pedeliento & Ikuo Takahashi, 2026. "The digital stress scale: cross-cultural application, validation, and development of a short scale," Review of Managerial Science, Springer, vol. 20(4), pages 1229-1265, April.
    18. repec:osf:osfxxx:r3qng_v1 is not listed on IDEAS
    19. Dayana Zhappassova & Ben Gilbert & Linda Thunstrom, 2018. "Energy efficiency, green technology and the pain of paying," Working Papers 2018-03, Colorado School of Mines, Division of Economics and Business.
    20. Dewitte, Siegfried, 2013. "From willpower breakdown to the breakdown of the willpower model – The symmetry of self-control and impulsive behavior," Journal of Economic Psychology, Elsevier, vol. 38(C), pages 16-25.
    21. Paul, Maureen, 2006. "A cross-section analysis of the fairness-of-pay perception of UK employees," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 35(2), pages 243-267, April.

