IDEAS home Printed from https://ideas.repec.org/a/bpj/causin/v10y2022i1p64-89n2.html

A unifying causal framework for analyzing dataset shift-stable learning algorithms

Author

Listed:
  • Subbaswamy Adarsh

    (Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, United States)

  • Chen Bryant

    (Brex Inc, San Francisco, California, United States)

  • Saria Suchi

    (Department of Computer Science, Johns Hopkins University and Bayesian Health, Baltimore, MD 21218, United States)

Abstract

Recent interest in the external validity of prediction models (i.e., the problem of different train and test distributions, known as dataset shift) has produced many methods for finding predictive distributions that are invariant to dataset shifts and can be used for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disparate frameworks, making it difficult to theoretically analyze how solutions differ with respect to stability and accuracy. Taking a causal graphical view, we use a flexible graphical representation to express various types of dataset shifts. Given a known graph of the data generating process, we show that all invariant distributions correspond to a causal hierarchy of graphical operators, which disable the edges in the graph that are responsible for the shifts. The hierarchy provides a common theoretical underpinning for understanding when and how stability to shifts can be achieved, and in what ways stable distributions can differ. We use it to establish conditions for minimax optimal performance across environments, and derive new algorithms that find optimal stable distributions. Using this new perspective, we empirically demonstrate that there is a tradeoff between minimax and average performance.
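
The distinction the abstract draws between minimax and average performance can be illustrated with a minimal sketch. This is not the authors' algorithm: the model names, environment names, and risk values below are all hypothetical numbers invented for illustration, and the code only shows how the two selection criteria can disagree.

```python
from statistics import mean

# Hypothetical per-environment risks (e.g. mean squared error) for two
# candidate predictors. All names and numbers are invented for illustration;
# they are not results from the paper.
risks = {
    "stable_model": {"env_a": 0.30, "env_b": 0.31, "env_c": 0.32},
    "unstable_model": {"env_a": 0.10, "env_b": 0.15, "env_c": 0.60},
}


def worst_case(per_env):
    """Largest risk over environments (the quantity a minimax rule minimizes)."""
    return max(per_env.values())


def average(per_env):
    """Mean risk over environments."""
    return mean(per_env.values())


# Pick the model with the smallest worst-case risk, and separately the
# model with the smallest average risk.
minimax_choice = min(risks, key=lambda m: worst_case(risks[m]))
average_choice = min(risks, key=lambda m: average(risks[m]))

print("minimax choice:", minimax_choice)
print("average choice:", average_choice)
```

With these made-up numbers, the predictor whose risk barely changes across environments wins on worst-case risk, while the predictor that exploits a shifting relationship wins on average risk, so the two criteria select different models.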

Suggested Citation

  • Subbaswamy Adarsh & Chen Bryant & Saria Suchi, 2022. "A unifying causal framework for analyzing dataset shift-stable learning algorithms," Journal of Causal Inference, De Gruyter, vol. 10(1), pages 64-89, January.
  • Handle: RePEc:bpj:causin:v:10:y:2022:i:1:p:64-89:n:2
    DOI: 10.1515/jci-2021-0042

    Download full text from publisher

    File URL: https://doi.org/10.1515/jci-2021-0042
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fernandez-Cornejo, Jorge & Wechsler, Seth James, 2012. "Fifteen Years Later: Examining the Adoption of Bt Corn Varieties by U.S. Farmers," 2012 Annual Meeting, August 12-14, 2012, Seattle, Washington 124257, Agricultural and Applied Economics Association.
    2. Khondoker Abdul Mottaleb & Dil Bahadur Rahut, 2019. "Impacts of Improved Infrastructure on Labor Allocation and Livelihoods: The Case of the Jamuna Multipurpose Bridge, Bangladesh," The European Journal of Development Research, Palgrave Macmillan;European Association of Development Research and Training Institutes (EADI), vol. 31(4), pages 750-778, September.
    3. D'Addio, Anna Cristina & De Greef, Isabelle & Rosholm, Michael, 2002. "Assessing Unemployment Traps in Belgium Using Panel Data Sample Selection Models," IZA Discussion Papers 669, Institute of Labor Economics (IZA).
    4. Fidel Gonzalez & Santosh Kumar, 2018. "Prenatal care and birthweight in Mexico," Applied Economics, Taylor & Francis Journals, vol. 50(10), pages 1156-1170, February.
    5. René Algesheimer & Sharad Borle & Utpal M. Dholakia & Siddharth S. Singh, 2010. "The Impact of Customer Community Participation on Customer Behaviors: An Empirical Investigation," Marketing Science, INFORMS, vol. 29(4), pages 756-769, 07-08.
    6. Myck, Michal & Nicińska, Anna & Morawski, Leszek, 2009. "Count Your Hours: Returns to Education in Poland," IZA Discussion Papers 4332, Institute of Labor Economics (IZA).
    7. Paul Ellickson & Sanjog Misra, 2012. "Enriching interactions: Incorporating outcome data into static discrete games," Quantitative Marketing and Economics (QME), Springer, vol. 10(1), pages 1-26, March.
    8. Hajime Seya & Junyi Zhang & Makoto Chikaraishi & Ying Jiang, 2020. "Decisions on truck parking place and time on expressways: an analysis using digital tachograph data," Transportation, Springer, vol. 47(2), pages 555-583, April.
    9. Eric Chiang & Djeto Assane, 2007. "Determinants of music copyright violations on the university campus," Journal of Cultural Economics, Springer;The Association for Cultural Economics International, vol. 31(3), pages 187-204, September.
    10. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    11. Manuel Arellano & Stéphane Bonhomme, 2017. "Quantile Selection Models With an Application to Understanding Changes in Wage Inequality," Econometrica, Econometric Society, vol. 85, pages 1-28, January.
    12. Larry W. Hunter, 2000. "What Determines Job Quality in Nursing Homes?," ILR Review, Cornell University, ILR School, vol. 53(3), pages 463-481, April.
    13. repec:hal:wpspec:info:hdl:2441/3vl5fe4i569nbr005tctlc8ll5 is not listed on IDEAS
    14. Carlos Medina & Jairo Núñez, 2005. "Repercusiones de la capacitación laboral pública y privada en Colombia," Research Department Publications 3178, Inter-American Development Bank, Research Department.
    15. S. I. Dolgikh & B. S. Potanin, 2023. "The Impact of Public Administration on the Efficiency of Russian Firms," Studies on Russian Economic Development, Springer, vol. 34(1), pages 59-67, February.
    16. Lidia Ceriani & Sergio Olivieri & Marco Ranzani, 2023. "Housing, imputed rent, and household welfare," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 21(1), pages 131-168, March.
    17. Stephan Wachtel & Thomas Otter, 2013. "Successive Sample Selection and Its Relevance for Management Decisions," Marketing Science, INFORMS, vol. 32(1), pages 170-185, September.
    18. Tocco, Barbara & Bailey, Alastair & Davidova, Sophia, 2013. "Determinants to Leave Agriculture and Change Occupational Sector: Evidence from an Enlarged EU," Working papers 155704, Factor Markets, Centre for European Policy Studies.
    19. Benitez-Silva, Hugo & Dwyer, Debra S., 2006. "Expectation formation of older married couples and the rational expectations hypothesis," Labour Economics, Elsevier, vol. 13(2), pages 191-218, April.
    20. Nabanita Datta Gupta & Mona Larsen, 2010. "The impact of health on individual retirement plans: self‐reported versus diagnostic measures," Health Economics, John Wiley & Sons, Ltd., vol. 19(7), pages 792-813, July.
    21. Keisuke Hirano & Guido W. Imbens & Geert Ridder & Donald B. Rubin, 2001. "Combining Panel Data Sets with Attrition and Refreshment Samples," Econometrica, Econometric Society, vol. 69(6), pages 1645-1659, November.
