Printed from https://ideas.repec.org/a/bpj/strimo/v23y2005i4-2005p249-279n1.html

Input-dependent estimation of generalization error under covariate shift

Author

Listed:
  • Sugiyama Masashi
  • Müller Klaus-Robert

Abstract

A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption—known as the covariate shift—causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike's information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable and asymptotically unbiased in general. We also show that, in addition to the unbiasedness, the proposed generalization error estimator can accurately estimate the difference of the generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.
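The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper; it only shows, under assumed settings, why an error estimate computed on data drawn from the training input distribution can badly understate the error on shifted test inputs. The target function, the two Gaussian input distributions, the noise level, and all function names are hypothetical choices made for illustration.

```python
import random


def target(x):
    # Hypothetical non-linear target, so the linear model below is misspecified.
    return x * x


def sample(rng, mean, std, n, noise_std=0.1):
    # Draw inputs from N(mean, std^2) and noisy outputs from the same
    # conditional distribution p(y | x) regardless of the input distribution.
    xs = [rng.gauss(mean, std) for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Closed-form ordinary least squares for y = a + b * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_squared_error(model, xs, ys):
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# Training inputs are centred at 0; test inputs are centred at 3 (extrapolation).
train_x, train_y = sample(rng, mean=0.0, std=1.0, n=200)
model = fit_line(train_x, train_y)

# Held-out data from the training input distribution, as a stand-in for what
# a standard validation scheme would measure.
holdout_x, holdout_y = sample(rng, mean=0.0, std=1.0, n=2000)
# Data from the shifted test input distribution.
shifted_x, shifted_y = sample(rng, mean=3.0, std=0.5, n=2000)

err_same = mean_squared_error(model, holdout_x, holdout_y)
err_shift = mean_squared_error(model, shifted_x, shifted_y)

print("squared error on training-distribution inputs:", err_same)
print("squared error on shifted test inputs:", err_shift)
```

In this setup the held-out error from the training distribution is much smaller than the error on the shifted inputs, which is the kind of bias that motivates an estimator designed for the test input distribution.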

Suggested Citation

  • Sugiyama Masashi & Müller Klaus-Robert, 2005. "Input-dependent estimation of generalization error under covariate shift," Statistics & Risk Modeling, De Gruyter, vol. 23(4/2005), pages 249-279, April.
  • Handle: RePEc:bpj:strimo:v:23:y:2005:i:4/2005:p:249-279:n:1
    DOI: 10.1524/stnd.2005.23.4.249


    Citations

    Cited by:

    1. Masashi Sugiyama & Taiji Suzuki & Shinichi Nakajima & Hisashi Kashima & Paul Bünau & Motoaki Kawanabe, 2008. "Direct importance estimation for covariate shift adaptation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 60(4), pages 699-746, December.
    2. Masashi Sugiyama & Taiji Suzuki & Takafumi Kanamori, 2012. "Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(5), pages 1009-1044, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5446, Otaru University of Commerce.
    2. Ogasawara, Haruhiko, 2015. "Asymptotic cumulants of some information criteria (2nd version)," ビジネス創造センターディスカッション・ペーパー (Discussion papers of the Center for Business Creation) 10252/5497, Otaru University of Commerce.
    3. Darima Fotheringham & Michael A. Wiles, 2023. "The effect of implementing chatbot customer service on stock returns: an event study analysis," Journal of the Academy of Marketing Science, Springer, vol. 51(4), pages 802-822, July.
    4. Song, Wei-Ling & Uzmanoglu, Cihan, 2016. "TARP announcement, bank health, and borrowers’ credit risk," Journal of Financial Stability, Elsevier, vol. 22(C), pages 22-32.
    5. Raymundo M. Campos-Vázquez, 2013. "Efectos de los ingresos no reportados en el nivel y tendencia de la pobreza laboral en México," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 23-54, November.
    6. Stephen Brown & William Goetzmann & Bing Liang & Christopher Schwarz, 2008. "Mandatory Disclosure and Operational Risk: Evidence from Hedge Fund Registration," Journal of Finance, American Finance Association, vol. 63(6), pages 2785-2815, December.
    7. Paul W. Miller & Barry R. Chiswick, 2002. "Immigrant earnings: Language skills, linguistic concentrations and the business cycle," Journal of Population Economics, Springer;European Society for Population Economics, vol. 15(1), pages 31-57.
    8. Chul‐Woo Kwon & Peter F. Orazem & Daniel M. Otto, 2006. "Off‐farm labor supply responses to permanent and transitory farm income," Agricultural Economics, International Association of Agricultural Economists, vol. 34(1), pages 59-67, January.
    9. Jonathan Gruber & Aaron Yelowitz, 1999. "Public Health Insurance and Private Savings," Journal of Political Economy, University of Chicago Press, vol. 107(6), pages 1249-1274, December.
    10. Jean-Louis Arcand & Linguère M'Baye, 2013. "Braving the waves: the role of time and risk preferences in illegal migration from Senegal," CERDI Working papers halshs-00855937, HAL.
    11. Sandra Müllbacher & Wolfgang Nagl, 2017. "Labour supply in Austria: an assessment of recent developments and the effects of a tax reform," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 44(3), pages 465-486, August.
    12. Campbell, Randall C. & Nagel, Gregory L., 2016. "Private information and limitations of Heckman's estimator in banking and corporate finance research," Journal of Empirical Finance, Elsevier, vol. 37(C), pages 186-195.
    13. Leye Li & Louise Yi Lu & Dongyue Wang, 2022. "External labour market competitions and stock price crash risk: evidence from exposures to competitor CEOs’ award‐winning events," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1421-1460, April.
    14. Jože P. Damijan & Mark Knell, 2005. "How Important Is Trade and Foreign Ownership in Closing the Technology Gap? Evidence from Estonia and Slovenia," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 141(2), pages 271-295, July.
    15. Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Renumeration Seniority," Discussion Paper 2004-120, Tilburg University, Center for Economic Research.
    16. Nakashima, Kiyotaka & Ogawa, Toshiaki, 2020. "The Impacts of Strengthening Regulatory Surveillance on Bank Behavior: A Dynamic Analysis from Incomplete to Complete Enforcement of Capital Regulation in Microprudential Policy," MPRA Paper 99938, University Library of Munich, Germany.
    17. Sarah Bridges & David Lawson, 2008. "Health and Labour Market Participation in Uganda," WIDER Working Paper Series DP2008-07, World Institute for Development Economic Research (UNU-WIDER).
    18. Ahn T. Le, 2003. "Female Labour Market Participation: Differences Between Primary and Tied Movers," Economics Discussion / Working Papers 03-17, The University of Western Australia, Department of Economics.
    19. Inmaculada García-Mainar & Víctor M. Montuenga-Gómez, 2017. "Subjective educational mismatch and signalling in Spain," Documentos de Trabajo dt2017-03, Facultad de Ciencias Económicas y Empresariales, Universidad de Zaragoza.
    20. Insik Min & Jong‐Ho Kim, 2003. "Modeling Credit Card Borrowing: A Comparison of Type I and Type II Tobit Approaches," Southern Economic Journal, John Wiley & Sons, vol. 70(1), pages 128-143, July.
