
Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression

Author

Listed:
  • Ning Xu
  • Jian Hong
  • Timothy C. G. Fisher

Abstract

We study model evaluation and model selection from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. We believe that GA is one way to formally address concerns about the external validity of a model. The GA of a model estimated on a sample can be measured by its empirical out-of-sample errors, called the generalization errors (GE). We derive upper bounds for the GE, which depend on the sample size, model complexity, and the distribution of the loss function. The upper bounds can be used to evaluate the GA of a model, ex ante. We propose using generalization error minimization (GEM) as a framework for model selection. Using GEM, we are able to unify a large class of penalized regression estimators, including the lasso, ridge, and bridge, under the same set of assumptions. We establish finite-sample and asymptotic properties (including $\mathcal{L}_2$-consistency) of the GEM estimator for both the $n \geqslant p$ and the $n < p$ cases.
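
For intuition, GE upper bounds of the kind described above typically resemble the classical VC bound from statistical learning theory (a standard result, not the paper's specific bound): for a bounded loss, with probability at least $1-\eta$,

$$\mathrm{GE}(f) \;\leqslant\; \widehat{\mathrm{err}}_n(f) + \sqrt{\frac{h\left(\ln(2n/h)+1\right) + \ln(4/\eta)}{n}},$$

where $\widehat{\mathrm{err}}_n(f)$ is the in-sample error and $h$ is the model's VC dimension, so the bound grows with model complexity and shrinks with sample size. The GEM selection rule itself can be sketched as follows: fit each candidate penalized regression on a training sample and keep the candidate with the smallest empirical out-of-sample error on a held-out sample from the same population. The synthetic data, penalty levels, and scikit-learn estimators below are illustrative assumptions, not the authors' implementation.

    # Generalization error minimization (GEM), sketched: choose the model
    # with the smallest empirical out-of-sample (generalization) error.
    # Data, penalty levels, and estimators are illustrative only.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 1.0                                # sparse true coefficients
    y = X @ beta + rng.standard_normal(n)

    # Hold out half the sample to estimate the generalization error (GE).
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

    candidates = {"lasso(0.1)": Lasso(alpha=0.1), "ridge(1.0)": Ridge(alpha=1.0)}
    ge = {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in candidates.items()}

    best = min(ge, key=ge.get)                    # GEM: minimize empirical GE
    print(ge, "->", best)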

Suggested Citation

  • Ning Xu & Jian Hong & Timothy C. G. Fisher, 2016. "Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression," Papers 1610.05448, arXiv.org.
  • Handle: RePEc:arx:papers:1610.05448

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1610.05448
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. James J. Heckman & Edward J. Vytlacil, 2007. "Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast their Effects in New Environments," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 71, Elsevier.
    2. Francesco Guala & Luigi Mittone, 2005. "Experiments in economics: External validity and the robustness of phenomena," Journal of Economic Methodology, Taylor & Francis Journals, vol. 12(4), pages 495-515.
    3. Peter Hall & Jeff Racine & Qi Li, 2004. "Cross-Validation and the Estimation of Conditional Probability Densities," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 1015-1026, December.
    4. repec:feb:artefa:0110 is not listed on IDEAS
    5. John A. List, 2011. "Why Economists Should Conduct Field Experiments and 14 Tips for Pulling One Off," Journal of Economic Perspectives, American Economic Association, vol. 25(3), pages 3-16, Summer.
    6. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    7. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    8. Jens Ludwig & Jeffrey R. Kling & Sendhil Mullainathan, 2011. "Mechanism Experiments and Policy Evaluations," Journal of Economic Perspectives, American Economic Association, vol. 25(3), pages 17-38, Summer.
    9. Brian E. Roe & David R. Just, 2009. "Internal and External Validity in Economics Research: Tradeoffs between Experiments, Field Experiments, Natural Experiments, and Field Data," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 91(5), pages 1266-1271.
    10. Caner, Mehmet, 2009. "Lasso-Type GMM Estimator," Econometric Theory, Cambridge University Press, vol. 25(1), pages 270-290, February.
    11. Joshua Angrist & Ivan Fernandez-Val, 2010. "ExtrapoLATE-ing: External Validity and Overidentification in the LATE Framework," NBER Working Papers 16566, National Bureau of Economic Research, Inc.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ning Xu & Jian Hong & Timothy C. G. Fisher, 2016. "Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso," Papers 1606.00142, arXiv.org.
    2. Muller, Sean, 2014. "Randomised trials for policy: a review of the external validity of treatment effects," SALDRU Working Papers 127, Southern Africa Labour and Development Research Unit, University of Cape Town.
    3. Caner, Mehmet & Fan, Qingliang, 2015. "Hybrid generalized empirical likelihood estimators: Instrument selection with adaptive lasso," Journal of Econometrics, Elsevier, vol. 187(1), pages 256-274.
    4. Ning Xu & Jian Hong & Timothy C. G. Fisher, 2016. "Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression," Papers 1609.03344, arXiv.org, revised Sep 2016.
    5. Costa, Alexandre Bonnet R. & Ferreira, Pedro Cavalcanti G. & Gaglianone, Wagner P. & Guillén, Osmani Teixeira C. & Issler, João Victor & Lin, Yihao, 2021. "Machine learning and oil price point and density forecasting," Energy Economics, Elsevier, vol. 102(C).
    6. Götz, Thomas B. & Knetsch, Thomas A., 2019. "Google data in bridge equation models for German GDP," International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
    7. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    8. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    9. Mona Aghdaee & Bonny Parkinson & Kompal Sinha & Yuanyuan Gu & Rajan Sharma & Emma Olin & Henry Cutler, 2022. "An examination of machine learning to map non‐preference based patient reported outcome measures to health state utility values," Health Economics, John Wiley & Sons, Ltd., vol. 31(8), pages 1525-1557, August.
    10. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    11. Fan, Jianqing & Liao, Yuan, 2012. "Endogeneity in ultrahigh dimension," MPRA Paper 38698, University Library of Munich, Germany.
    12. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    13. Xu Cheng & Zhipeng Liao, 2012. "Select the Valid and Relevant Moments: A One-Step Procedure for GMM with Many Moments," PIER Working Paper Archive 12-045, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    14. Byron Botha & Rulof Burger & Kevin Kotzé & Neil Rankin & Daan Steenkamp, 2023. "Big data forecasting of South African inflation," Empirical Economics, Springer, vol. 65(1), pages 149-188, July.
    15. Belot, Michèle & James, Jonathan, 2016. "Partner selection into policy relevant field experiments," Journal of Economic Behavior & Organization, Elsevier, vol. 123(C), pages 31-56.
    16. Ivan Korolev, 2018. "LM-BIC Model Selection in Semiparametric Models," Papers 1811.10676, arXiv.org.
    17. Carroll, Kathryn A. & Samek, Anya, 2018. "Field experiments on food choice in grocery stores: A ‘how-to’ guide," Food Policy, Elsevier, vol. 79(C), pages 331-340.
    18. Elena Ivona Dumitrescu & Sullivan Hue & Christophe Hurlin & Sessi Tokpavi, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    19. Michael C. Knaus & Michael Lechner & Anthony Strittmatter, 2022. "Heterogeneous Employment Effects of Job Search Programs: A Machine Learning Approach," Journal of Human Resources, University of Wisconsin Press, vol. 57(2), pages 597-636.
    20. Liesbeth Colen & Sergio Gomez y Paloma & Uwe Latacz-Lohmann & Marianne Lefebvre & Raphaële Préget & Sophie Thoyer, 2016. "Economic Experiments as a Tool for Agricultural Policy Evaluation: Insights from the European CAP," Canadian Journal of Agricultural Economics/Revue canadienne d'agroeconomie, Canadian Agricultural Economics Society/Societe canadienne d'agroeconomie, vol. 64(4), pages 667-694, December.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:1610.05448. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.