
Explaining Machine Learning by Bootstrapping Partial Marginal Effects and Shapley Values

Authors

  • Thomas R. Cook
  • Zach Modig
  • Nathan M. Palmer

Abstract

Machine learning and artificial intelligence are often described as “black boxes.” Traditional linear regression is interpreted through its marginal relationships, as captured by the regression coefficients. We show that the same marginal relationship can be described rigorously for any machine learning model by calculating the slope of its partial dependence functions, which we call the partial marginal effect (PME). We prove that the PME of OLS is analytically equivalent to the OLS regression coefficient. Bootstrapping provides standard errors and confidence intervals around the point estimates of the PMEs. We apply the PME to a hedonic house pricing example and demonstrate that the PMEs of neural networks, support vector machines, random forests, and gradient boosting models reveal the non-linear relationships discovered by the machine learning models and allow direct comparison between those models and a traditional linear regression. Finally, we extend the PME to a Shapley value decomposition and explore how it can be used to further explain model outputs.
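
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the slope of the partial dependence function is approximated by a central finite difference at a chosen evaluation point `v` with step `h`, and that the bootstrap resamples rows with replacement and refits the model on each resample. All function names, the simulated data, and the parameter values are hypothetical. OLS is used as the model so that the PME can be checked against the regression coefficient.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept; returns [intercept, slope_1, ..., slope_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def make_predict(beta):
    """Wrap fitted coefficients in a predict(X) function."""
    return lambda X: beta[0] + X @ beta[1:]

def fit_predict_ols(X, y):
    """Fit the model and return its predict function (any model could be swapped in)."""
    return make_predict(fit_ols(X, y))

def partial_dependence(predict, X, j, v):
    """Average prediction when feature j is set to v for every row."""
    X_mod = X.copy()
    X_mod[:, j] = v
    return predict(X_mod).mean()

def pme(predict, X, j, v, h=1e-2):
    """Assumed estimator: central finite-difference slope of the partial dependence at v."""
    upper = partial_dependence(predict, X, j, v + h)
    lower = partial_dependence(predict, X, j, v - h)
    return (upper - lower) / (2 * h)

def bootstrap_pme(fit_predict, X, y, j, v, n_boot=200, seed=0):
    """Resample rows, refit, recompute the PME; return (std. error, 95% percentile CI)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        predict_b = fit_predict(X[idx], y[idx])   # refit on the bootstrap sample
        draws[b] = pme(predict_b, X[idx], j, v)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws.std(ddof=1), (lo, hi)

# Simulated data (hypothetical): y = 1 + 2*x0 - 0.5*x1 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

beta = fit_ols(X, y)
pme_0 = pme(make_predict(beta), X, j=0, v=0.0)
se_0, ci_0 = bootstrap_pme(fit_predict_ols, X, y, j=0, v=0.0)
print(f"OLS coefficient: {beta[1]:.4f}  PME: {pme_0:.4f}")
print(f"bootstrap SE: {se_0:.4f}  95% CI: ({ci_0[0]:.4f}, {ci_0[1]:.4f})")
```

For a linear model the partial dependence in feature j is a straight line, so the finite-difference slope matches the fitted coefficient up to floating-point error regardless of `v` and `h`. For a non-linear model the same functions would return a slope that varies with `v`, which is what makes the evaluation point a modelling choice in this sketch.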

Suggested Citation

  • Thomas R. Cook & Zach Modig & Nathan M. Palmer, 2024. "Explaining Machine Learning by Bootstrapping Partial Marginal Effects and Shapley Values," Finance and Economics Discussion Series 2024-075, Board of Governors of the Federal Reserve System (U.S.).
  • Handle: RePEc:fip:fedgfe:2024-75
    DOI: 10.17016/FEDS.2024.075

    Download full text from publisher

    File URL: https://www.federalreserve.gov/econres/feds/files/2024075pap.pdf
    Download Restriction: no


    Cited by:

    1. repec:fip:fedkrr:96511 is not listed on IDEAS
    2. Thomas R. Cook & Nathan M. Palmer, 2023. "Understanding Models and Model Bias with Gaussian Processes," Research Working Paper RWP 23-07, Federal Reserve Bank of Kansas City.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jose Torres-Pruñonosa & Pablo García-Estévez & Josep Maria Raya & Camilo Prado-Román, 2022. "How on Earth Did Spanish Banking Sell the Housing Stock?," SAGE Open, vol. 12(1), pages 21582440221, March.
    2. repec:osf:socarx:tjkcy_v1 is not listed on IDEAS
    3. Asproudis, Elias & Gedikli, Cigdem & Talavera, Oleksandr & Yilmaz, Okan, 2024. "Returns to solar panels in the housing market: A meta learner approach," Energy Economics, Elsevier, vol. 137(C).
    4. Julien Chevallier & Dominique Guégan & Stéphane Goutte, 2021. "Is It Possible to Forecast the Price of Bitcoin?," Forecasting, MDPI, vol. 3(2), pages 1-44, May.
    5. Islam, Towhidul & Meade, Nigel & Carson, Richard T. & Louviere, Jordan J. & Wang, Juan, 2022. "The usefulness of socio-demographic variables in predicting purchase decisions: Evidence from machine learning procedures," Journal of Business Research, Elsevier, vol. 151(C), pages 324-338.
    6. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    7. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Labib Shami & Teddy Lazebnik, 2024. "Implementing Machine Learning Methods in Estimating the Size of the Non-observed Economy," Computational Economics, Springer;Society for Computational Economics, vol. 63(4), pages 1459-1476, April.
    9. Ay, Jean-Sauveur & Le Gallo, Julie, 2021. "The Signaling Values of Nested Wine Names," Working Papers 321851, American Association of Wine Economists.
    10. Sakaue, Katsuki, 2018. "Informal fee charge and school choice under a free primary education policy: Panel data evidence from rural Uganda," International Journal of Educational Development, Elsevier, vol. 62(C), pages 112-127.
    11. Byron Botha & Rulof Burger & Kevin Kotzé & Neil Rankin & Daan Steenkamp, 2023. "Big data forecasting of South African inflation," Empirical Economics, Springer, vol. 65(1), pages 149-188, July.
    12. Begley, Jaclene & Chan, Sewin, 2018. "The effect of housing wealth shocks on work and retirement decisions," Regional Science and Urban Economics, Elsevier, vol. 73(C), pages 180-195.
    13. Chen, Ruoyu & Jiang, Hanchen & Quintero, Luis E., 2023. "Measuring the value of rent stabilization and understanding its implications for racial inequality: Evidence from New York City," Regional Science and Urban Economics, Elsevier, vol. 103(C).
    14. Dang, Hai-Anh & Carletto, Calogero & Gourlay, Sydney & Abanokova, Kseniya, 2024. "Addressing Soil Quality Data Gaps with Imputation: Evidence from Ethiopia and Uganda," GLO Discussion Paper Series 1445, Global Labor Organization (GLO).
    15. Dangxing Chen & Luyao Zhang, 2023. "Monotonicity for AI ethics and society: An empirical study of the monotonic neural additive model in criminology, education, health care, and finance," Papers 2301.07060, arXiv.org.
    16. Ono, Arito & Uchida, Hirofumi & Udell, Gregory F. & Uesugi, Iichiro, 2021. "Lending pro-cyclicality and macroprudential policy: Evidence from Japanese LTV ratios," Journal of Financial Stability, Elsevier, vol. 53(C).
    17. Ballestar, María Teresa & Mir, Miguel Cuerdo & Pedrera, Luis Miguel Doncel & Sainz, Jorge, 2024. "Effectiveness of tutoring at school: A machine learning evaluation," Technological Forecasting and Social Change, Elsevier, vol. 199(C).
    18. Daniel Levy & Tamir Mayer & Alon Raviv, 2020. "Academic Scholarship in Light of the 2008 Financial Crisis: Textual Analysis of NBER Working Papers," Working Papers hal-02488796, HAL.
    19. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    20. Barzin, Samira & Avner, Paolo & Maruyama Rentschler, Jun Erik & O’Clery, Neave, 2022. "Where Are All the Jobs? A Machine Learning Approach for High Resolution Urban Employment Prediction in Developing Countries," Policy Research Working Paper Series 9979, The World Bank.
    21. Arenas, Andreu & Calsamiglia, Caterina, 2022. "Gender Differences in High-Stakes Performance and College Admission Policies," IZA Discussion Papers 15550, Institute of Labor Economics (IZA).

    More about this item

    JEL classification:

    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
