Printed from https://ideas.repec.org/p/arx/papers/2102.04382.html

Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy

Authors
  • Falco J. Bargagli Stoffi
  • Kenneth De Beckker
  • Joana E. Maldonado
  • Kristof De Witte

Abstract

Despite their popularity, machine learning predictions are sensitive to potential unobserved predictors. This paper proposes a general algorithm that assesses how the omission of an unobserved variable with high explanatory power could affect the predictions of the model. Moreover, the algorithm extends the usage of machine learning from pointwise predictions to inference and sensitivity analysis. In the application, we show how the framework can be applied to data with inherent uncertainty, such as students' scores in a standardized assessment on financial literacy. First, using Bayesian Additive Regression Trees (BART), we predict students' financial literacy scores (FLS) for a subgroup of students with missing FLS. Then, we assess the sensitivity of predictions by comparing the predictions and performance of models with and without a highly explanatory synthetic predictor. We find no significant difference in the predictions and performance of the augmented model (i.e., the model with the synthetic predictor) and the original model. This evidence sheds light on the stability of the predictive model used in the application. The proposed methodology can be used, above and beyond our motivating empirical example, in a wide range of machine learning applications in the social and health sciences.
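The abstract describes the core check: fit the predictive model with and without a synthetic predictor of high explanatory power, then compare predictions and performance. The sketch below illustrates only those mechanics under our own assumptions and is not the authors' toolbox: ordinary least squares stands in for BART, the data are simulated, a random held-out split stands in for the subgroup with missing scores, and the synthetic predictor `U` is a noisy copy of the outcome whose noise variance is chosen so that corr(U, y) is roughly `target_corr`. All function names (`ols_fit`, `sensitivity_check`, and so on) are hypothetical.

```python
import math
import random
import statistics


def ols_fit(X, y):
    """Least squares with intercept: solve (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting. (Stand-in for BART; our assumption.)"""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(p):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * c for a, c in zip(M[k], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(beta, X):
    """Linear predictions b0 + x . b for each row of X."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def sensitivity_check(X, y, target_corr=0.9, train_frac=0.7, seed=0):
    """Compare an original model with one augmented by a synthetic predictor U.

    Hypothetical construction: U = y + noise with
    Var(noise) = Var(y) * (1 / rho**2 - 1), so corr(U, y) is about rho.
    """
    rng = random.Random(seed)
    noise_sd = statistics.pstdev(y) * math.sqrt(1.0 / target_corr ** 2 - 1.0)
    U = [yi + rng.gauss(0.0, noise_sd) for yi in y]

    # Random split; the held-out part plays the role of the units to predict.
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]

    X_aug = [list(x) + [u] for x, u in zip(X, U)]
    y_train = [y[i] for i in train]
    beta_o = ols_fit([X[i] for i in train], y_train)
    beta_a = ols_fit([X_aug[i] for i in train], y_train)

    y_test = [y[i] for i in test]
    pred_o = predict(beta_o, [X[i] for i in test])
    pred_a = predict(beta_a, [X_aug[i] for i in test])
    return {
        "rmse_original": rmse(y_test, pred_o),
        "rmse_augmented": rmse(y_test, pred_a),
        "mean_abs_pred_diff": statistics.fmean(
            [abs(a - b) for a, b in zip(pred_o, pred_a)]),
    }


if __name__ == "__main__":
    gen = random.Random(42)
    X = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(500)]
    y = [2.0 + 1.5 * a - b + gen.gauss(0, 1) for a, b in X]
    print(sensitivity_check(X, y))
```

With a synthetic predictor this closely tied to the outcome, a linear model's held-out error will usually fall; the point of the comparison is to measure how far predictions and performance move, so the sketch should not be read as reproducing the paper's finding of no significant difference.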

Suggested Citation

  • Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
  • Handle: RePEc:arx:papers:2102.04382

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2102.04382
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    2. Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Ziad Obermeyer, 2015. "Prediction Policy Problems," American Economic Review, American Economic Association, vol. 105(5), pages 491-495, May.
    3. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    4. Andrea Ichino & Fabrizia Mealli & Tommaso Nannicini, 2008. "From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 23(3), pages 305-327.
    5. Aaron Chalfin & Oren Danieli & Andrew Hillis & Zubin Jelveh & Michael Luca & Jens Ludwig & Sendhil Mullainathan, 2016. "Productivity and Selection of Human Capital with Machine Learning," American Economic Review, American Economic Association, vol. 106(5), pages 124-127, May.
    6. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    7. Sergio Longobardi & Margherita Maria Pagliuca & Andrea Regoli, 2018. "Can problem-solving attitudes explain the gender gap in financial literacy? Evidence from Italian students’ data," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(4), pages 1677-1705, July.
    8. Chiara Masci & Anna Maria Paganoni & Francesca Ieva, 2019. "Semiparametric mixed effects models for unsupervised classification of Italian schools," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1313-1342, October.
    9. Antonio R. Linero, 2018. "Bayesian Regression Trees for High-Dimensional Prediction and Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 626-636, April.
    10. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey, 2017. "Double/Debiased/Neyman Machine Learning of Treatment Effects," American Economic Review, American Economic Association, vol. 107(5), pages 261-265, May.
    11. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    12. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    13. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2016. "Double/Debiased Machine Learning for Treatment and Causal Parameters," Papers 1608.00060, arXiv.org, revised Dec 2017.
    14. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.
    15. Falco J. Bargagli-Stoffi & Massimo Riccaboni & Armando Rungi, 2020. "Machine Learning for Zombie Hunting. Firms Failures and Financial Constraints," Working Papers 01/2020, IMT School for Advanced Studies Lucca, revised Jun 2020.
    16. Adam Kapelner & Justin Bleich, 2016. "bartMachine: Machine Learning with Bayesian Additive Regression Trees," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i04).
    17. Antonio R. Linero & Yun Yang, 2018. "Bayesian regression tree ensembles that adapt to smoothness and sparsity," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(5), pages 1087-1110, November.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Falco J. Bargagli-Stoffi & Fabio Incerti & Massimo Riccaboni & Armando Rungi, 2023. "Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values," Papers 2306.08165, arXiv.org.
    2. Ginevra Buratti & Alessio D'Ignazio, 2023. "Improving the effectiveness of financial education programs. A targeting approach," Questioni di Economia e Finanza (Occasional Papers) 765, Bank of Italy, Economic Research and International Relations Area.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
    2. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    3. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Papers 2101.00878, arXiv.org.
    4. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Tinbergen Institute Discussion Papers 21-001/V, Tinbergen Institute.
    5. Delprato, Marcos & Frola, Alessia & Antequera, Germán, 2022. "Indigenous and non-Indigenous proficiency gaps for out-of-school and in-school populations: A machine learning approach," International Journal of Educational Development, Elsevier, vol. 93(C).
    6. Monica Andini & Emanuele Ciani & Guido de Blasio & Alessio D'Ignazio & Viola Salvestrini, 2017. "Targeting policy-compliers with machine learning: an application to a tax rebate programme in Italy," Temi di discussione (Economic working papers) 1158, Bank of Italy, Economic Research and International Relations Area.
    7. Hoang, Daniel & Wiegratz, Kevin, 2022. "Machine learning methods in finance: Recent applications and prospects," Working Paper Series in Economics 158, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    8. Carl Bonander & Mikael Svensson, 2021. "Using causal forests to assess heterogeneity in cost‐effectiveness analysis," Health Economics, John Wiley & Sons, Ltd., vol. 30(8), pages 1818-1832, August.
    9. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    10. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    11. Filmer, Deon P. & Nahata, Vatsal & Sabarwal, Shwetlena, 2021. "Preparation, Practice, and Beliefs: A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
    12. Maximilian Maurice Gail & Phil-Adrian Klotz, 2021. "The Impact of the Agency Model on E-book Prices: Evidence from the UK," MAGKS Papers on Economics 202111, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    13. Francesco Decarolis & Cristina Giorgiantonio, 2020. "Corruption red flags in public procurement: new evidence from Italian calls for tenders," Questioni di Economia e Finanza (Occasional Papers) 544, Bank of Italy, Economic Research and International Relations Area.
    14. de Blasio, Guido & D'Ignazio, Alessio & Letta, Marco, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    15. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.
    16. Falco J. Bargagli-Stoffi & Fabio Incerti & Massimo Riccaboni & Armando Rungi, 2023. "Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values," Papers 2306.08165, arXiv.org.
    17. Akash Malhotra, 2021. "A hybrid econometric–machine learning approach for relative importance analysis: prioritizing food policy," Eurasian Economic Review, Springer;Eurasia Business and Economics Society, vol. 11(3), pages 549-581, September.
    18. Guido de Blasio & Alessio D'Ignazio & Marco Letta, 2020. "Predicting Corruption Crimes with Machine Learning. A Study for the Italian Municipalities," Working Papers 16/20, Sapienza University of Rome, DISS.
    19. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    20. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2102.04382. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help via the CitEc correction form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.