Printed from https://ideas.repec.org/p/osf/socarx/s5zc8.html

Researcher reasoning meets computational capacity: Machine learning for social science

Authors

  • Lundberg, Ian
  • Brand, Jennie E. (UCLA)
  • Jeon, Nanum

Abstract

Computational power and digital data have created new opportunities to explore and understand the social world. A special synergy is possible when social scientists combine human attention to certain aspects of the problem with the power of algorithms to automate other aspects of the problem. We review selected exemplary applications where machine learning amplifies researcher coding, summarizes complex data, relaxes statistical assumptions, and targets researcher attention. We then seek to reduce perceived barriers to machine learning by summarizing several fundamental building blocks and their grounding in classical statistics. We present a few guiding principles and promising approaches where we see particular potential for machine learning to transform social science inquiry. We conclude that machine learning tools are accessible, worthy of attention, and ready to yield new discoveries.

Suggested Citation

  • Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
  • Handle: RePEc:osf:socarx:s5zc8
    DOI: 10.31219/osf.io/s5zc8

    Download full text from publisher

    File URL: https://osf.io/download/628be3e647df857a0c6ef8bb/
    Download Restriction: no


    References listed on IDEAS

    1. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    2. Aaron Chalfin & Oren Danieli & Andrew Hillis & Zubin Jelveh & Michael Luca & Jens Ludwig & Sendhil Mullainathan, 2016. "Productivity and Selection of Human Capital with Machine Learning," American Economic Review, American Economic Association, vol. 106(5), pages 124-127, May.
    3. Hainmueller, Jens, 2012. "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies," Political Analysis, Cambridge University Press, vol. 20(1), pages 25-46, January.
    4. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    5. Benjamin Handel & Jonathan Kolstad, 2017. "Wearable Technologies and Health Behaviors: New Data and New Methods to Understand Population Health," American Economic Review, American Economic Association, vol. 107(5), pages 481-485, May.
    6. Guido W. Imbens, 2015. "Matching Methods in Practice: Three Examples," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 373-419.
    7. Bisbee, James, 2019. "BARP: Improving Mister P Using Bayesian Additive Regression Trees," American Political Science Review, Cambridge University Press, vol. 113(4), pages 1060-1065, November.
    8. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    9. Margaret E. Roberts & Brandon M. Stewart & Dustin Tingley & Christopher Lucas & Jetson Leder‐Luis & Shana Kushner Gadarian & Bethany Albertson & David G. Rand, 2014. "Structural Topic Models for Open‐Ended Survey Responses," American Journal of Political Science, John Wiley & Sons, vol. 58(4), pages 1064-1082, October.
    10. Matthew Gentzkow & Jesse M. Shapiro & Matt Taddy, 2019. "Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech," Econometrica, Econometric Society, vol. 87(4), pages 1307-1340, July.
    11. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    12. Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Ziad Obermeyer, 2015. "Prediction Policy Problems," American Economic Review, American Economic Association, vol. 105(5), pages 491-495, May.
    13. Grimmer, Justin & Stewart, Brandon M., 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts," Political Analysis, Cambridge University Press, vol. 21(3), pages 267-297, July.
    14. Freese, Jeremy & Peterson, David, 2017. "Replication in Social Science," SocArXiv 5bck9, Center for Open Science.
    15. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    16. van der Laan, Mark J. & Rubin, Daniel, 2006. "Targeted Maximum Likelihood Learning," The International Journal of Biostatistics, De Gruyter, vol. 2(1), pages 1-40, December.
    17. Arindrajit Dube & Jeff Jacobs & Suresh Naidu & Siddharth Suri, 2020. "Monopsony in Online Labor Markets," American Economic Review: Insights, American Economic Association, vol. 2(1), pages 33-46, March.
    18. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    19. Kosuke Imai & Marc Ratkovic, 2014. "Covariate balancing propensity score," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 243-263, January.
    20. D’Amour, Alexander & Ding, Peng & Feller, Avi & Lei, Lihua & Sekhon, Jasjeet, 2021. "Overlap in observational studies with high-dimensional covariates," Journal of Econometrics, Elsevier, vol. 221(2), pages 644-654.
    21. Hainmueller, Jens & Hopkins, Daniel J. & Yamamoto, Teppei, 2014. "Causal Inference in Conjoint Analysis: Understanding Multidimensional Choices via Stated Preference Experiments," Political Analysis, Cambridge University Press, vol. 22(1), pages 1-30, January.
    22. Jonathan M.V. Davis & Sara B. Heller, 2017. "Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs," American Economic Review, American Economic Association, vol. 107(5), pages 546-550, May.
    23. Erin Hartman & Richard Grieve & Roland Ramsahai & Jasjeet S. Sekhon, 2015. "From sample average treatment effect to population average treatment effect on the treated: combining experimental with observational studies to estimate population treatment effects," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 178(3), pages 757-778, June.
    24. Cantú, Francisco, 2019. "The Fingerprints of Fraud: Evidence from Mexico’s 1988 Presidential Election," American Political Science Review, Cambridge University Press, vol. 113(3), pages 710-726, August.
    25. King, Gary & Pan, Jennifer & Roberts, Margaret E., 2017. "How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument," American Political Science Review, Cambridge University Press, vol. 111(3), pages 484-501, August.
    26. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    27. Lin, Yi & Jeon, Yongho, 2006. "Random Forests and Adaptive Nearest Neighbors," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 578-590, June.
    28. Incerti, Trevor, 2020. "Corruption Information and Vote Share: A Meta-Analysis and Lessons for Experimental Design," American Political Science Review, Cambridge University Press, vol. 114(3), pages 761-774, August.
    29. Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
    30. Peter M. Aronow & Cyrus Samii, 2016. "Does Regression Produce Representative Estimates of Causal Effects?," American Journal of Political Science, John Wiley & Sons, vol. 60(1), pages 250-267, January.
    31. Iacus, Stefano M. & King, Gary & Porro, Giuseppe, 2012. "Causal Inference without Balance Checking: Coarsened Exact Matching," Political Analysis, Cambridge University Press, vol. 20(1), pages 1-24, January.
    32. Knox, Dean & Lucas, Christopher, 2021. "A Dynamic Model of Speech for the Social Sciences," American Political Science Review, Cambridge University Press, vol. 115(2), pages 649-666, May.
    33. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Małgorzata Skweres-Kuchta & Iwona Czerska & Elżbieta Szaruga, 2023. "Literature Review on Health Emigration in Rare Diseases—A Machine Learning Perspective," IJERPH, MDPI, vol. 20(3), pages 1-31, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    2. Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
    3. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer; Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    4. Zhang, Han, 2021. "How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It," SocArXiv 453jk, Center for Open Science.
    5. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    6. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    7. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    8. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    9. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    10. Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
    11. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    12. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    13. Aysegül Kayaoglu & Ghassan Baliki & Tilman Brück & Melodie Al Daccache & Dorothee Weiffen, 2023. "How to conduct impact evaluations in humanitarian and conflict settings," HiCN Working Papers 387, Households in Conflict Network.
    14. Filmer, Deon P. & Nahata, Vatsal & Sabarwal, Shwetlena, 2021. "Preparation, Practice, and Beliefs: A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
    15. Huber, Martin & Meier, Jonas & Wallimann, Hannes, 2022. "Business analytics meets artificial intelligence: Assessing the demand effects of discounts on Swiss train tickets," Transportation Research Part B: Methodological, Elsevier, vol. 163(C), pages 22-39.
    16. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    17. Ajit Desai, 2023. "Machine Learning for Economics Research: When What and How?," Papers 2304.00086, arXiv.org, revised Apr 2023.
    18. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    19. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Papers 2101.00878, arXiv.org.
    20. Daniel Goller & Tamara Harrer & Michael Lechner & Joachim Wolff, 2021. "Active labour market policies for the long-term unemployed: New evidence from causal machine learning," Papers 2106.10141, arXiv.org, revised May 2023.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.