Printed from https://ideas.repec.org/a/bpj/strimo/v23y2005i4-2005p249-279n1.html

Input-dependent estimation of generalization error under covariate shift

Author

Listed:
  • Sugiyama Masashi
  • Müller Klaus-Robert

Abstract

A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption—known as the covariate shift—causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike's information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable and asymptotically unbiased in general. We also show that, in addition to the unbiasedness, the proposed generalization error estimator can accurately estimate the difference of the generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.
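The bias described in the abstract can be illustrated with a toy simulation. The sketch below is not the estimator proposed in the paper: it swaps in plain importance weighting, a standard covariate-shift correction, and assumes that both input densities are known Gaussians. The sinc target, the noise level, the straight-line model, and all function and variable names are hypothetical choices made for illustration only.

```python
import numpy as np


def gauss_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def simulate(seed=0, n=20000, noise=0.1):
    """Return (naive, importance-weighted, reference) squared-error estimates."""
    rng = np.random.default_rng(seed)
    train = (1.0, 0.5)  # assumed mean/std of the training inputs
    test = (2.0, 0.3)   # assumed mean/std of the test inputs (extrapolation)

    # Labeled data are available only under the training input distribution.
    x = rng.normal(train[0], train[1], n)
    y = np.sinc(x) + rng.normal(0.0, noise, n)  # hypothetical target function

    # Fit a deliberately misspecified model (a straight line) on the first half.
    half = n // 2
    coef = np.polyfit(x[:half], y[:half], deg=1)

    # Squared errors on the held-out half, still drawn from the training inputs.
    x_val, y_val = x[half:], y[half:]
    sq_err = (np.polyval(coef, x_val) - y_val) ** 2

    # Naive hold-out estimate: ignores the shift between training and test inputs.
    naive = sq_err.mean()

    # Importance weights p_test(x) / p_train(x); both densities are assumed known.
    w = gauss_pdf(x_val, *test) / gauss_pdf(x_val, *train)
    weighted = np.mean(w * sq_err)

    # Reference value: error on a large fresh sample from the test distribution.
    x_te = rng.normal(test[0], test[1], 200000)
    y_te = np.sinc(x_te) + rng.normal(0.0, noise, x_te.size)
    reference = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    return naive, weighted, reference


naive_est, iw_est, test_err = simulate()
print(f"naive hold-out estimate      : {naive_est:.3f}")
print(f"importance-weighted estimate : {iw_est:.3f}")
print(f"error under test distribution: {test_err:.3f}")
```

In this setup the naive estimate should fall well below the error measured under the test distribution, while the weighted estimate should land much closer to it. The weighted estimate can still be noisy, because a few large weights dominate the average.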

Suggested Citation

  • Sugiyama Masashi & Müller Klaus-Robert, 2005. "Input-dependent estimation of generalization error under covariate shift," Statistics & Risk Modeling, De Gruyter, vol. 23(4), pages 249-279, April.
  • Handle: RePEc:bpj:strimo:v:23:y:2005:i:4/2005:p:249-279:n:1
    DOI: 10.1524/stnd.2005.23.4.249

    Download full text from publisher

    File URL: https://doi.org/10.1524/stnd.2005.23.4.249
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Hidetoshi Shimodaira, 1997. "Assessing the Error Probability of the Model Selection Test," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 49(3), pages 395-410, September.
    2. Spokoiny, Vladimir, 2002. "Variance Estimation for High-Dimensional Regression Models," Journal of Multivariate Analysis, Elsevier, vol. 82(1), pages 111-133, July.
    3. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    4. Makio Ishiguro & Yosiyuki Sakamoto & Genshiro Kitagawa, 1997. "Bootstrapping Log Likelihood and EIC, an Extension of AIC," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 49(3), pages 411-434, September.
    5. Hidetoshi Shimodaira, 1998. "An Application of Multiple Comparison Techniques to Model Selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 50(1), pages 1-13, March.

    Citations


    Cited by:

    1. Masashi Sugiyama & Taiji Suzuki & Shinichi Nakajima & Hisashi Kashima & Paul Bünau & Motoaki Kawanabe, 2008. "Direct importance estimation for covariate shift adaptation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 60(4), pages 699-746, December.
    2. Masashi Sugiyama & Taiji Suzuki & Takafumi Kanamori, 2012. "Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(5), pages 1009-1044, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5446, Otaru University of Commerce.
    2. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria (2nd version)," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5497, Otaru University of Commerce.
    3. Darima Fotheringham & Michael A. Wiles, 2023. "The effect of implementing chatbot customer service on stock returns: an event study analysis," Journal of the Academy of Marketing Science, Springer, vol. 51(4), pages 802-822, July.
    4. Robert B. Ekelund & John D. Jackson & Robert D. Tollison, 2013. "Are Art Auction Estimates Biased?," Southern Economic Journal, John Wiley & Sons, vol. 80(2), pages 454-465, October.
    5. Song, Wei-Ling & Uzmanoglu, Cihan, 2016. "TARP announcement, bank health, and borrowers’ credit risk," Journal of Financial Stability, Elsevier, vol. 22(C), pages 22-32.
    6. Xu, Shen & Yin, Bichao & Lou, Chunjie, 2022. "Minority shareholder activism and corporate social responsibility," Economic Modelling, Elsevier, vol. 116(C).
    7. Saziye Gazioglu & Aysit Tansel, 2006. "Job satisfaction in Britain: individual and job related factors," Applied Economics, Taylor & Francis Journals, vol. 38(10), pages 1163-1171.
    8. Raymundo M. Campos-Vázquez, 2013. "Efectos de los ingresos no reportados en el nivel y tendencia de la pobreza laboral en México," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 23-54, November.
    9. Ichev, Riste & Valentinčič, Aljoša, 2025. "The effect of impact investing on performance of private firms," Research in International Business and Finance, Elsevier, vol. 73(PA).
    10. Stephen Brown & William Goetzmann & Bing Liang & Christopher Schwarz, 2008. "Mandatory Disclosure and Operational Risk: Evidence from Hedge Fund Registration," Journal of Finance, American Finance Association, vol. 63(6), pages 2785-2815, December.
    11. Fabrizio Rossi & Maretno Agus Harjoto, 2020. "Corporate non-financial disclosure, firm value, risk, and agency costs: evidence from Italian listed companies," Review of Managerial Science, Springer, vol. 14(5), pages 1149-1181, October.
    12. Claudio A. Agostini & Marcela Perticara & Javiera Selman, 2023. "Tackling Vulnerable Households through a Working Tax Credit Scheme: A Feasible Alternative to Cash Transfers," Hacienda Pública Española / Review of Public Economics, IEF, vol. 245(2), pages 119-155, June.
    13. Paul W. Miller & Barry R. Chiswick, 2002. "Immigrant earnings: Language skills, linguistic concentrations and the business cycle," Journal of Population Economics, Springer;European Society for Population Economics, vol. 15(1), pages 31-57.
    14. Chul‐Woo Kwon & Peter F. Orazem & Daniel M. Otto, 2006. "Off‐farm labor supply responses to permanent and transitory farm income," Agricultural Economics, International Association of Agricultural Economists, vol. 34(1), pages 59-67, January.
    15. Castagnetti, Carolina & Rosti, Luisa, 2010. "Gender stereotyping and wage discrimination among Italian graduates," MPRA Paper 26685, University Library of Munich, Germany.
    16. Jonathan Gruber & Aaron Yelowitz, 1999. "Public Health Insurance and Private Savings," Journal of Political Economy, University of Chicago Press, vol. 107(6), pages 1249-1274, December.
    17. Jean-Louis Arcand & Linguère M'Baye, 2013. "Braving the waves: the role of time and risk preferences in illegal migration from Senegal," CERDI Working papers halshs-00855937, HAL.
    18. Chia-Ling Chao & Shwu-Min Horng, 2013. "Does the SEC's Waiver of IFRS to U.S. GAAP Reconciliation Improve the Quality of Financial Reporting?," Accounting and Finance Research, Sciedu Press, vol. 2(3), pages 1-78, August.
    19. Emily Ouma & John Jagwe & Gideon Aiko Obare & Steffen Abele, 2010. "Determinants of smallholder farmers' participation in banana markets in Central Africa: the role of transaction costs," Agricultural Economics, International Association of Agricultural Economists, vol. 41(2), pages 111-122, March.
    20. Boubakri, Narjess & Ghouma, Hatem, 2010. "Control/ownership structure, creditor rights protection, and the cost of debt financing: International evidence," Journal of Banking & Finance, Elsevier, vol. 34(10), pages 2481-2499, October.
