
Measuring the predictability of life outcomes with a scientific mass collaboration

Author

Listed:
  • Matthew J. Salganik

    (Department of Sociology, Princeton University, Princeton, NJ 08544)

  • Ian Lundberg

    (Department of Sociology, Princeton University, Princeton, NJ 08544)

  • Alexander T. Kindel

    (Department of Sociology, Princeton University, Princeton, NJ 08544)

  • Caitlin E. Ahearn

    (Department of Sociology, University of California, Los Angeles, CA 90095)

  • Khaled Al-Ghoneim

    (Hawaz, Riyadh 12363, Saudi Arabia)

  • Abdullah Almaatouq

    (Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02142; Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139)

  • Drew M. Altschul

    (Mental Health Data Science Scotland, Department of Psychology, The University of Edinburgh, Edinburgh EH8 9JZ, United Kingdom)

  • Jennie E. Brand

    (Department of Sociology, University of California, Los Angeles, CA 90095; Department of Statistics, University of California, Los Angeles, CA 90095)

  • Nicole Bohme Carnegie

    (Department of Mathematical Sciences, Montana State University, Bozeman, MT 59717)

  • Ryan James Compton

    (Human Computer Interaction Lab, University of California, Santa Cruz, CA 95064)

  • Debanjan Datta

    (Discovery Analytics Center, Virginia Polytechnic Institute and State University, Arlington, VA 22203)

  • Thomas Davidson

    (Department of Sociology, Cornell University, Ithaca, NY 14853)

  • Anna Filippova

    (GitHub, San Francisco, CA 94107)

  • Connor Gilroy

    (Department of Sociology, University of Washington, Seattle, WA 98105)

  • Brian J. Goode

    (Social and Decision Analytics Laboratory, Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, Arlington, VA 22203)

  • Eaman Jahani

    (Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA 02139)

  • Ridhi Kashyap

    (Department of Sociology, University of Oxford, Oxford OX1 1JD, United Kingdom; Nuffield College, University of Oxford, Oxford OX1 1NF, United Kingdom; School of Anthropology and Museum Ethnography, University of Oxford, Oxford OX2 6PE, United Kingdom)

  • Antje Kirchner

    (Program for Research in Survey Methodology, Survey Research Division, RTI International, Research Triangle Park, NC 27709)

  • Stephen McKay

    (School of Social and Political Sciences, University of Lincoln, Brayford Pool, Lincoln LN6 7TS, United Kingdom)

  • Allison C. Morgan

    (Department of Computer Science, University of Colorado, Boulder, CO 80309)

  • Alex Pentland

    (Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139)

  • Kivan Polimis

    (Center for the Study of Demography and Ecology, University of Washington, Seattle, WA 98105)

  • Louis Raes

    (Department of Economics, Tilburg School of Economics and Management, Tilburg University, 5037 AB Tilburg, The Netherlands)

  • Daniel E. Rigobon

    (Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544)

  • Claudia V. Roberts

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Diana M. Stanescu

    (Department of Politics, Princeton University, Princeton, NJ 08544)

  • Yoshihiko Suhara

    (Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139)

  • Adaner Usmani

    (Department of Sociology, Harvard University, Cambridge, MA 02138)

  • Erik H. Wang

    (Department of Politics, Princeton University, Princeton, NJ 08544)

  • Muna Adem

    (Department of Sociology, Indiana University, Bloomington, IN 47405)

  • Abdulla Alhajri

    (Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139)

  • Bedoor AlShebli

    (Computational Social Science Lab, Social Science Division, New York University Abu Dhabi, 129188 Abu Dhabi, United Arab Emirates)

  • Redwane Amin

    (Bendheim Center for Finance, Princeton University, Princeton, NJ 08544)

  • Ryan B. Amos

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Lisa P. Argyle

    (Department of Political Science, Brigham Young University, Provo, UT 84602)

  • Livia Baer-Bositis

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Moritz Büchi

    (Department of Communication and Media Research, University of Zurich, Zurich, Switzerland, ZH-8050)

  • Bo-Ryehn Chung

    (Center for Statistics & Machine Learning, Princeton University, Princeton, NJ 08544)

  • William Eggert

    (Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ 08544)

  • Gregory Faletto

    (Statistics Group, Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089)

  • Zhilin Fan

    (Department of Statistics, Columbia University, New York, NY 10027)

  • Jeremy Freese

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Tejomay Gadgil

    (Center for Data Science, New York University, New York, NY 10011)

  • Josh Gagné

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Yue Gao

    (Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027)

  • Andrew Halpern-Manners

    (Department of Sociology, Indiana University, Bloomington, IN 47405)

  • Sonia P. Hashim

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Sonia Hausen

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Guanhua He

    (Department of Molecular Biology, Princeton University, Princeton, NJ 08544)

  • Kimberly Higuera

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Bernie Hogan

    (Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, United Kingdom)

  • Ilana M. Horwitz

    (Graduate School of Education, Stanford University, Stanford, CA 94305)

  • Lisa M. Hummel

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Naman Jain

    (Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544)

  • Kun Jin

    (Department of Computer Science, Ohio State University, Columbus, OH 43210)

  • David Jurgens

    (School of Information, University of Michigan, Ann Arbor, MI 48104)

  • Patrick Kaminski

    (Department of Sociology, Indiana University, Bloomington, IN 47405; Center for Complex Networks and Systems Research, Indiana University, Bloomington, IN 47405)

  • Areg Karapetyan

    (Department of Computer Science, Masdar Institute, Khalifa University, 127788 Abu Dhabi, United Arab Emirates; Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606-8502, Japan)

  • E. H. Kim

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Ben Leizman

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Naijia Liu

    (Department of Politics, Princeton University, Princeton, NJ 08544)

  • Malte Möser

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Andrew E. Mack

    (Department of Politics, Princeton University, Princeton, NJ 08544)

  • Mayank Mahajan

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Noah Mandell

    (Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544)

  • Helge Marahrens

    (Department of Sociology, Indiana University, Bloomington, IN 47405)

  • Diana Mercado-Garcia

    (Graduate School of Education, Stanford University, Stanford, CA 94305)

  • Viola Mocz

    (Department of Neuroscience, Princeton University, Princeton, NJ 08544)

  • Katariina Mueller-Gastell

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Ahmed Musse

    (Department of Electrical Engineering, Princeton University, Princeton, NJ 08544)

  • Qiankun Niu

    (Bendheim Center for Finance, Princeton University, Princeton, NJ 08544)

  • William Nowak

    (Dataiku, New York, NY 10010)

  • Hamidreza Omidvar

    (Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ 08544)

  • Andrew Or

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Karen Ouyang

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Katy M. Pinto

    (Department of Sociology, California State University, Dominguez Hills, Carson, CA 90747)

  • Ethan Porter

    (School of Media and Public Affairs, George Washington University, Washington, DC 20052)

  • Kristin E. Porter

    (School of Media and Public Affairs, George Washington University, Washington, DC 20052)

  • Crystal Qian

    (Center for Data Insights, MDRC, Oakland, CA 94612)

  • Tamkinat Rauf

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Anahit Sargsyan

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Thomas Schaffner

    (Social Science Division, New York University Abu Dhabi, 129188 Abu Dhabi, United Arab Emirates)

  • Landon Schnabel

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Bryan Schonfeld

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Ben Sender

    (Department of Politics, Princeton University, Princeton, NJ 08544)

  • Jonathan D. Tang

    (Department of Economics, Princeton University, Princeton, NJ 08544)

  • Emma Tsurkov

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Austin van Loon

    (Department of Sociology, Stanford University, Stanford, CA 94305)

  • Onur Varol

    (Center for Complex Network Research, Northeastern University Networks Science Institute, Boston, MA 02115; Luddy School of Informatics, Computing, & Engineering, Indiana University, Bloomington, IN 47408)

  • Xiafei Wang

    (School of Social Work, David B. Falk College of Sport and Human Dynamics, Syracuse University, NY 13244)

  • Zhi Wang

    (Luddy School of Informatics, Computing, & Engineering, Indiana University, Bloomington, IN 47408; School of Public Health, Indiana University, Bloomington, IN 47408)

  • Julia Wang

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Flora Wang

    (Department of Economics, Princeton University, Princeton, NJ 08544)

  • Samantha Weissman

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Kirstie Whitaker

    (The Alan Turing Institute, London NW1 2DB, United Kingdom)

  • Maria K. Wolters

    (Department of Psychiatry, University of Cambridge, Cambridge CB2 0SZ, United Kingdom)

  • Wei Lee Woon

    (School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom)

  • James Wu

    (Department of Marketplaces & Yield Data Science, Expedia Group, Seattle, WA 98119)

  • Catherine Wu

    (Department of Applied Statistics, Social Science, and Humanities, New York University, New York, NY 10003)

  • Kengran Yang

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Jingwen Yin

    (Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ 08544)

  • Bingyu Zhao

    (Department of Statistics, Columbia University, New York, NY 10027)

  • Chenyun Zhu

    (Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom)

  • Jeanne Brooks-Gunn

    (Department of Statistics, Columbia University, New York, NY 10027)

  • Barbara E. Engelhardt

    (Department of Human Development, Teachers College, Columbia University, New York, NY 10027; Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032)

  • Moritz Hardt

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Dean Knox

    (Center for Statistics & Machine Learning, Princeton University, Princeton, NJ 08544)

  • Karen Levy

    (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720)

  • Arvind Narayanan

    (Department of Computer Science, Princeton University, Princeton, NJ 08544)

  • Brandon M. Stewart

    (Department of Sociology, Princeton University, Princeton, NJ 08544)

  • Duncan J. Watts

    (Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104; Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA 19104; Operations, Information and Decisions Department, University of Pennsylvania, Philadelphia, PA 19104)

  • Sara McLanahan

    (Department of Sociology, Princeton University, Princeton, NJ 08544)

Abstract

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.

Suggested Citation

  • Matthew J. Salganik & Ian Lundberg & Alexander T. Kindel & Caitlin E. Ahearn & Khaled Al-Ghoneim & Abdullah Almaatouq & Drew M. Altschul & Jennie E. Brand & Nicole Bohme Carnegie & Ryan James Compton et al., 2020. "Measuring the predictability of life outcomes with a scientific mass collaboration," Proceedings of the National Academy of Sciences, vol. 117(15), pages 8398-8403, April.
  • Handle: RePEc:nas:journl:v:117:y:2020:p:8398-8403

    Download full text from publisher

    File URL: http://www.pnas.org/content/117/15/8398.full
    Download Restriction: no

    Citations



    Cited by:

    1. Kirsten Martin & Ari Waldman, 2023. "Are Algorithmic Decisions Legitimate? The Effect of Process and Outcomes on Perceptions of Legitimacy of AI Decisions," Journal of Business Ethics, Springer, vol. 183(3), pages 653-670, March.
    2. Verhagen, Mark D., 2021. "Identifying and Improving Functional Form Complexity: A Machine Learning Framework," SocArXiv bka76, Center for Open Science.
