IDEAS home Printed from https://ideas.repec.org/p/fip/fedkrw/93596.html

Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values

Abstract

Machine learning and artificial intelligence methods are often referred to as “black boxes” when compared with traditional regression-based approaches. However, both traditional and machine learning methods are concerned with modeling the joint distribution between endogenous (target) and exogenous (input) variables. Where linear models describe the fitted relationship between the target and input variables via the slope of that relationship (coefficient estimates), the same fitted relationship can be described rigorously for any machine learning model by first-differencing the partial dependence functions. Bootstrapping these first-differenced functionals provides standard errors and confidence intervals for the estimated relationships. We show that this approach replicates the point estimates of OLS coefficients and demonstrate how this generalizes to marginal relationships in machine learning and artificial intelligence models. We further discuss the relationship of partial dependence functions to Shapley value decompositions and explore how they can be used to further explain model outputs.
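The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, and the helper names (`partial_dependence`, `fit_ols`, `pd_slope`, `bootstrap_pd_slope`) are hypothetical. The sketch computes the partial dependence function of one input on a grid, first-differences it to obtain a slope, and resamples the data with replacement to obtain a bootstrap standard error and percentile interval. OLS is used as the fitted model so that the slope can be checked against the coefficient estimate; any other model with the same `fit` interface could be substituted.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Average prediction over the sample with column j forced to each grid value."""
    pd_vals = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        pd_vals.append(predict(X_mod).mean())
    return np.array(pd_vals)

def fit_ols(X, y):
    """Fit OLS with an intercept; return (predict function, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    predict = lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta
    return predict, beta

def pd_slope(fit, X, y, j, grid):
    """Mean first difference of the partial dependence function per unit of x_j."""
    predict, _ = fit(X, y)
    pd_vals = partial_dependence(predict, X, j, grid)
    return np.mean(np.diff(pd_vals) / np.diff(grid))

def bootstrap_pd_slope(fit, X, y, j, grid, n_boot=200, seed=0):
    """Refit on resampled rows and recompute the first-differenced slope each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        draws[b] = pd_slope(fit, X[idx], y[idx], j, grid)
    return draws

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

grid = np.linspace(-2.0, 2.0, 9)
_, beta = fit_ols(X, y)
slope = pd_slope(fit_ols, X, y, 0, grid)
draws = bootstrap_pd_slope(fit_ols, X, y, 0, grid)
se = draws.std(ddof=1)
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])

print(f"OLS coefficient on x0:      {beta[1]:.4f}")
print(f"First-differenced PD slope: {slope:.4f}")
print(f"Bootstrap SE: {se:.4f}, 95% interval: [{ci_low:.4f}, {ci_high:.4f}]")
```

For a linear model the partial dependence function is itself linear in the chosen input, so the first-differenced slope matches the OLS coefficient up to floating-point error. For a nonlinear model the differences would vary along the grid, and one might report them point by point rather than averaging.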

Suggested Citation

  • Thomas R. Cook & Greg Gupton & Zach Modig & Nathan M. Palmer, 2021. "Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values," Research Working Paper RWP 21-12, Federal Reserve Bank of Kansas City.
  • Handle: RePEc:fip:fedkrw:93596
    DOI: 10.18651/RWP2021-12

    Download full text from publisher

    File URL: https://www.kansascityfed.org/documents/8518/rwp21-12cookguptonmodigpalmer.pdf
    File Function: Full text
    Download Restriction: no

    File URL: https://libkey.io/10.18651/RWP2021-12?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Limsombunchai, Visit, 2004. "House Price Prediction: Hedonic Price Model vs. Artificial Neural Network," 2004 Conference, June 25-26, 2004, Blenheim, New Zealand 97781, New Zealand Agricultural and Resource Economics Society.
    2. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    3. Marianne Bertrand & Sendhil Mullainathan, 2004. "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," American Economic Review, American Economic Association, vol. 94(4), pages 991-1013, September.
    4. Daniel P. McMillen & Christian L. Redfearn, 2010. "Estimation And Hypothesis Testing For Nonparametric Hedonic House Price Functions," Journal of Regional Science, Wiley Blackwell, vol. 50(3), pages 712-733, August.
    5. Daniel W. Apley & Jingyu Zhu, 2020. "Visualizing the effects of predictor variables in black box supervised learning models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1059-1086, September.
    6. Michael J. Hanmer & Kerem Ozan Kalkan, 2013. "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models," American Journal of Political Science, John Wiley & Sons, vol. 57(1), pages 263-277, January.
    7. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    8. W.J. McCluskey & M. McCord & P.T. Davis & M. Haran & D. McIlhatton, 2013. "Prediction accuracy in mass appraisal: a comparison of modern approaches," Journal of Property Research, Taylor & Francis Journals, vol. 30(4), pages 239-265, December.
    9. Joachim Zietz & Emily Zietz & G. Sirmans, 2008. "Determinants of House Prices: A Quantile Regression Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 37(4), pages 317-333, November.
    10. Qingyuan Zhao & Trevor Hastie, 2021. "Causal Interpretations of Black-Box Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 39(1), pages 272-281, January.

    Citations

    Cited by:

    1. repec:fip:fedkrr:96511 is not listed on IDEAS
    2. Thomas R. Cook & Nathan M. Palmer, 2023. "Understanding Models and Model Bias with Gaussian Processes," Research Working Paper RWP 23-07, Federal Reserve Bank of Kansas City.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jose Torres-Pruñonosa & Pablo García-Estévez & Josep Maria Raya & Camilo Prado-Román, 2022. "How on Earth Did Spanish Banking Sell the Housing Stock?," SAGE Open, vol. 12(1), pages 21582440221, March.
    2. repec:osf:socarx:tjkcy_v1 is not listed on IDEAS
    3. Julien Chevallier & Dominique Guégan & Stéphane Goutte, 2021. "Is It Possible to Forecast the Price of Bitcoin?," Forecasting, MDPI, vol. 3(2), pages 1-44, May.
    4. Asproudis, Elias & Gedikli, Cigdem & Talavera, Oleksandr & Yilmaz, Okan, 2024. "Returns to solar panels in the housing market: A meta learner approach," Energy Economics, Elsevier, vol. 137(C).
    5. Islam, Towhidul & Meade, Nigel & Carson, Richard T. & Louviere, Jordan J. & Wang, Juan, 2022. "The usefulness of socio-demographic variables in predicting purchase decisions: Evidence from machine learning procedures," Journal of Business Research, Elsevier, vol. 151(C), pages 324-338.
    6. Labib Shami & Teddy Lazebnik, 2024. "Implementing Machine Learning Methods in Estimating the Size of the Non-observed Economy," Computational Economics, Springer;Society for Computational Economics, vol. 63(4), pages 1459-1476, April.
    7. Ay, Jean-Sauveur & Le Gallo, Julie, 2021. "The Signaling Values of Nested Wine Names," Working Papers 321851, American Association of Wine Economists.
    8. Chen, Ruoyu & Jiang, Hanchen & Quintero, Luis E., 2023. "Measuring the value of rent stabilization and understanding its implications for racial inequality: Evidence from New York City," Regional Science and Urban Economics, Elsevier, vol. 103(C).
    9. Dang, Hai-Anh & Carletto, Calogero & Gourlay, Sydney & Abanokova, Kseniya, 2024. "Addressing Soil Quality Data Gaps with Imputation: Evidence from Ethiopia and Uganda," GLO Discussion Paper Series 1445, Global Labor Organization (GLO).
    10. Ballestar, María Teresa & Mir, Miguel Cuerdo & Pedrera, Luis Miguel Doncel & Sainz, Jorge, 2024. "Effectiveness of tutoring at school: A machine learning evaluation," Technological Forecasting and Social Change, Elsevier, vol. 199(C).
    11. Tsang, Andrew, 2021. "Uncovering Heterogeneous Regional Impacts of Chinese Monetary Policy," MPRA Paper 110703, University Library of Munich, Germany.
    12. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    14. Tranos, Emmanouil & Incera, Andre Carrascal & Willis, George, 2022. "Using the web to predict regional trade flows: data extraction, modelling, and validation," OSF Preprints 9bu5z, Center for Open Science.
    15. Rodríguez-Vargas, Adolfo, 2020. "Forecasting Costa Rican inflation with machine learning methods," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 1(1).
    16. Jesus Fernandez-Villaverde, 2020. "Simple Rules for a Complex World with Artificial Intelligence," PIER Working Paper Archive 20-010, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    17. Raul-Tomas Mora-Garcia & Maria-Francisca Cespedes-Lopez & V. Raul Perez-Sanchez & Pablo Marti & Juan-Carlos Perez-Sanchez, 2019. "Determinants of the Price of Housing in the Province of Alicante (Spain): Analysis Using Quantile Regression," Sustainability, MDPI, vol. 11(2), pages 1-33, January.
    18. Blankenship, Brian & Aklin, Michaël & Urpelainen, Johannes & Nandan, Vagisha, 2022. "Jobs for a just transition: Evidence on coal job preferences from India," Energy Policy, Elsevier, vol. 165(C).
    19. Andrei Dubovik & Adam Elbourne & Bram Hendriks & Mark Kattenberg, 2022. "Forecasting World Trade Using Big Data and Machine Learning Techniques," CPB Discussion Paper 441, CPB Netherlands Bureau for Economic Policy Analysis.
    20. Askitas, Nikos, 2024. "A Hands-on Machine Learning Primer for Social Scientists: Math, Algorithms and Code," IZA Discussion Papers 17014, Institute of Labor Economics (IZA).
    21. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.

    More about this item

    Keywords

    Machine learning; Artificial intelligence; Explainable machine learning; Shapley values; Model interpretation;

    JEL classification:

    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
