Printed from https://ideas.repec.org/p/arx/papers/2504.16864.html

Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations

Author

Listed:
  • Manuel Quintero
  • William T. Stephenson
  • Advik Shreekumar
  • Tamara Broderick

Abstract

In science and social science, we often wish to explain why an outcome differs between two populations. For instance, if a jobs program benefits members of one city more than another, is that due to differences in program participants (particular covariates) or in the local labor markets (outcomes given covariates)? The Kitagawa-Oaxaca-Blinder (KOB) decomposition is a standard tool in econometrics that explains the difference in the mean outcome across two populations. However, the KOB decomposition assumes a linear relationship between covariates and outcomes, while the true relationship may be meaningfully nonlinear. Modern machine learning offers a variety of nonlinear functional decompositions for the relationship between outcomes and covariates in one population, and it seems natural to extend the KOB decomposition using them. We observe that a successful extension should not attribute the differences to covariates -- or, respectively, to outcomes given covariates -- if those are the same in the two populations. Unfortunately, we demonstrate that, even in simple examples, two common decompositions -- functional ANOVA and Accumulated Local Effects -- can attribute differences to outcomes given covariates even when the outcomes given covariates are identical in the two populations. We provide a characterization of when functional ANOVA misattributes, as well as a general property that any discrete decomposition must satisfy to avoid misattribution. We show that if the decomposition is independent of its input distribution, it does not misattribute. We further conjecture that misattribution arises in any reasonable additive decomposition that depends on the distribution of the covariates.
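
For readers unfamiliar with the linear KOB decomposition mentioned in the abstract, the following is a minimal sketch of the textbook two-fold version, not the authors' code. It fits a separate least-squares line in each population and splits the gap in mean outcomes into a part due to covariates and a part due to outcomes given covariates. The simulated data, the choice of population B as the reference, and every name (`fit_ols`, `kob_decomposition`, `simulate`) are assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def kob_decomposition(X_a, y_a, X_b, y_b):
    """Two-fold KOB split of mean(y_a) - mean(y_b), with population B as reference."""
    beta_a, beta_b = fit_ols(X_a, y_a), fit_ols(X_b, y_b)
    xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
    covariate_part = (xbar_a - xbar_b) @ beta_b   # differences in covariates
    outcome_part = xbar_a @ (beta_a - beta_b)     # differences in outcomes given covariates
    return covariate_part, outcome_part

# Hypothetical simulated populations: one covariate, linear outcome model.
rng = np.random.default_rng(0)
n = 5_000

def simulate(mean_x, slope):
    x = rng.normal(mean_x, 1.0, size=n)
    y = 1.0 + slope * x + rng.normal(0.0, 0.5, size=n)
    X = np.column_stack([np.ones(n), x])          # intercept column plus covariate
    return X, y

X_a, y_a = simulate(mean_x=2.0, slope=1.5)
X_b, y_b = simulate(mean_x=1.0, slope=1.0)

covariate_part, outcome_part = kob_decomposition(X_a, y_a, X_b, y_b)
total_gap = y_a.mean() - y_b.mean()
print(covariate_part, outcome_part, total_gap)
```

Because each regression includes an intercept, the two parts sum to the gap in sample means exactly, up to floating-point error. In this simulation the population values of both parts are 1, so the estimates should land close to that. The sketch covers only the linear case; it does not implement the functional ANOVA or Accumulated Local Effects extensions that the paper analyzes.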

Suggested Citation

  • Manuel Quintero & William T. Stephenson & Advik Shreekumar & Tamara Broderick, 2025. "Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations," Papers 2504.16864, arXiv.org.
  • Handle: RePEc:arx:papers:2504.16864

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2504.16864
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith & Bernard Chasekw, 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    2. Leandro Andrián & Oscar Mauricio Valencia, 2023. "Past the Tipping Point? Assessing Debt Overhang in Latin America and the Caribbean," IDB Publications (Book Chapters), in: Andrew Powell & Oscar Mauricio Valencia (ed.), Dealing with Debt, edition 1, chapter 8, pages 183-196, Inter-American Development Bank.
    3. Lu, Xuefei & Borgonovo, Emanuele, 2023. "Global sensitivity analysis in epidemiological modeling," European Journal of Operational Research, Elsevier, vol. 304(1), pages 9-24.
    4. Kim, Jun Young & Kim, Dongjae & Li, Zezhong John & Dariva, Claudio & Cao, Yankai & Ellis, Naoko, 2023. "Predicting and optimizing syngas production from fluidized bed biomass gasifiers: A machine learning approach," Energy, Elsevier, vol. 263(PC).
    5. Jian Guo & Saizhuo Wang & Lionel M. Ni & Heung-Yeung Shum, 2022. "Quant 4.0: Engineering Quantitative Investment with Automated, Explainable and Knowledge-driven Artificial Intelligence," Papers 2301.04020, arXiv.org.
    6. Cao, Jason & Tao, Tao, 2025. "Can an identified environmental correlate of car ownership serve as a practical planning tool?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 191(C).
    7. Mehdi Dasineh & Amir Ghaderi & Mohammad Bagherzadeh & Mohammad Ahmadi & Alban Kuriqi, 2021. "Prediction of Hydraulic Jumps on a Triangular Bed Roughness Using Numerical Modeling and Soft Computing Methods," Mathematics, MDPI, vol. 9(23), pages 1-24, December.
    8. Hué Sullivan & Hurlin Christophe & Pérignon Christophe & Saurin Sébastien, 2022. "Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring," Papers 2212.05866, arXiv.org, revised Jan 2025.
    9. Zhang, Chanyuan (Abigail) & Cho, Soohyun & Vasarhelyi, Miklos, 2022. "Explainable Artificial Intelligence (XAI) in auditing," International Journal of Accounting Information Systems, Elsevier, vol. 46(C).
    10. Bastos, João A. & Matos, Sara M., 2022. "Explainable models of credit losses," European Journal of Operational Research, Elsevier, vol. 301(1), pages 386-394.
    11. Daniel Borup & Philippe Goulet Coulombe & Erik Christian Montes Schütte & David E. Rapach & Sander Schwenk-Nebbe, 2022. "The Anatomy of Out-of-Sample Forecasting Accuracy," FRB Atlanta Working Paper 2022-16, Federal Reserve Bank of Atlanta.
    12. Jeetesh Sharma & Murari Lal Mittal & Gunjan Soni, 2024. "Condition-based maintenance using machine learning and role of interpretability: a review," International Journal of System Assurance Engineering and Management, Springer; The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 15(4), pages 1345-1360, April.
    13. Torii, André Jacomel & Novotny, Antonio André, 2021. "A priori error estimates for local reliability-based sensitivity analysis with Monte Carlo Simulation," Reliability Engineering and System Safety, Elsevier, vol. 213(C).
    14. Hu, Songhua & Xiong, Chenfeng & Ji, Ya & Wu, Xin & Liu, Kailun & Schonfeld, Paul, 2024. "Understanding factors influencing user engagement in incentive-based travel demand management program," Transportation Research Part A: Policy and Practice, Elsevier, vol. 186(C).
    15. Shao, Qifan & Zhang, Wenjia & Cao, Xinyu (Jason) & Yang, Jiawen, 2023. "Built environment interventions for emission mitigation: A machine learning analysis of travel-related CO2 in a developing city," Journal of Transport Geography, Elsevier, vol. 110(C).
    16. Petros C. Lazaridis & Ioannis E. Kavvadias & Konstantinos Demertzis & Lazaros Iliadis & Lazaros K. Vasiliadis, 2023. "Interpretable Machine Learning for Assessing the Cumulative Damage of a Reinforced Concrete Frame Induced by Seismic Sequences," Sustainability, MDPI, vol. 15(17), pages 1-31, August.
    17. Jung, WoongHee & Taflanidis, Alexandros A., 2023. "Efficient global sensitivity analysis for high-dimensional outputs combining data-driven probability models and dimensionality reduction," Reliability Engineering and System Safety, Elsevier, vol. 231(C).
    18. Chris Reimann, 2024. "Predicting financial crises: an evaluation of machine learning algorithms and model explainability for early warning systems," Review of Evolutionary Political Economy, Springer, vol. 5(1), pages 51-83, June.
    19. Hu, Songhua & Xiong, Chenfeng & Chen, Peng & Schonfeld, Paul, 2023. "Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models," Transportation Research Part A: Policy and Practice, Elsevier, vol. 174(C).
    20. Wang, Ning & Xu, Yan & Wang, Sutong, 2022. "Interpretable boosting tree ensemble method for multisource building fire loss prediction," Reliability Engineering and System Safety, Elsevier, vol. 225(C).
