Model Selection Techniques -- An Overview


  • Jie Ding
  • Vahid Tarokh
  • Yuhong Yang


In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and it is thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. Model selection techniques have a long history, arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of these methods in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions of the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.

Suggested Citation

  • Jie Ding & Vahid Tarokh & Yuhong Yang, 2018. "Model Selection Techniques -- An Overview," Papers 1810.09583,
  • Handle: RePEc:arx:papers:1810.09583

    References listed on IDEAS

    1. Wei Pan, 2001. "Akaike's Information Criterion in Generalized Estimating Equations," Biometrics, The International Biometric Society, vol. 57(1), pages 120-125, March.
    2. Yuhong Yang, 2001. "Adaptive Regression by Mixing," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 574-588, June.
    3. Wenjing Yang & Yuhong Yang, 2017. "Toward an objective and reproducible model choice via variable selection deviation," Biometrics, The International Biometric Society, vol. 73(1), pages 20-30, March.

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Kengne, William, 2021. "Strongly consistent model selection for general causal time series," Statistics & Probability Letters, Elsevier, vol. 171(C).
    2. Yonekura, Shouto & Beskos, Alexandros & Singh, Sumeetpal S., 2021. "Asymptotic analysis of model selection criteria for general hidden Markov models," Stochastic Processes and their Applications, Elsevier, vol. 132(C), pages 164-191.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qingfeng Liu & Qingsong Yao & Guoqing Zhao, 2020. "Model averaging estimation for conditional volatility models with an application to stock market volatility forecast," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(5), pages 841-863, August.
    2. Zimu Chen & Zhanfeng Wang & Yuan‐chin Ivan Chang, 2020. "Sequential adaptive variables and subject selection for GEE methods," Biometrics, The International Biometric Society, vol. 76(2), pages 496-507, June.
    3. Wei Pan, 2001. "Model Selection in Estimating Equations," Biometrics, The International Biometric Society, vol. 57(2), pages 529-534, June.
    4. Otto, Philipp E. & Bolle, Friedel, 2015. "Exploiting one’s power with a guilty conscience: An experimental investigation of self-serving biases," Journal of Economic Psychology, Elsevier, vol. 51(C), pages 79-89.
    5. Sara Evans-Lacko & Martin Knapp, 2014. "Importance of Social and Cultural Factors for Attitudes, Disclosure and Time off Work for Depression: Findings from a Seven Country European Study on Depression in the Workplace," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-10, March.
    6. Wan, Alan T.K. & Zhang, Xinyu & Zou, Guohua, 2010. "Least squares model averaging by Mallows criterion," Journal of Econometrics, Elsevier, vol. 156(2), pages 277-283, June.
    7. Zhang, Shulin & Song, Peter X.-K. & Shi, Daimin & Zhou, Qian M., 2012. "Information ratio test for model misspecification on parametric structures in stochastic diffusion models," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 3975-3987.
    8. Zou, Hui & Yang, Yuhong, 2004. "Combining time series models for forecasting," International Journal of Forecasting, Elsevier, vol. 20(1), pages 69-84.
    9. Gregory N. Price & Juliet U. Elu, 2014. "Does regional currency integration ameliorate global macroeconomic shocks in sub-Saharan Africa? The case of the 2008-2009 global financial crisis," Journal of Economic Studies, Emerald Group Publishing, vol. 41(5), pages 737-750, September.
    10. James Cui, 2007. "QIC program and model selection in GEE analyses," Stata Journal, StataCorp LP, vol. 7(2), pages 209-220, June.
    11. Yongmiao Hong & Tae-Hwy Lee & Yuying Sun & Shouyang Wang & Xinyu Zhang, 2017. "Time-varying Model Averaging," Working Papers 202001, University of California at Riverside, Department of Economics.
    12. Goodwin, Paul & Önkal, Dilek & Stekler, Herman O., 2018. "What if you are not Bayesian? The consequences for decisions involving risk," European Journal of Operational Research, Elsevier, vol. 266(1), pages 238-246.
    13. Michael S. Rendall & Bonnie Ghosh-Dastidar & Margaret M. Weden & Zafar Nazarov, 2011. "Multiple Imputation for Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys," Working Papers WR-887-1, RAND Corporation.
    14. Kirca, Ahmet H. & Fernandez, Whitney Douglas & Kundu, Sumit K., 2016. "An empirical analysis and extension of internalization theory in emerging markets: The role of firm-specific assets and asset dispersion in the multinationality-performance relationship," Journal of World Business, Elsevier, vol. 51(4), pages 628-640.
    15. Thomas O. Meservy & Matthew L. Jensen & Kelly J. Fadel, 2014. "Evaluation of Competing Candidate Solutions in Electronic Networks of Practice," Information Systems Research, INFORMS, vol. 25(1), pages 15-34, March.
    16. Haili Zhang & Guohua Zou, 2020. "Cross-Validation Model Averaging for Generalized Functional Linear Model," Econometrics, MDPI, Open Access Journal, vol. 8(1), pages 1-35, February.
    17. Song Guo & Feng Ling & Juan Hou & Jinna Wang & Guiming Fu & Zhenyu Gong, 2014. "Mosquito Surveillance Revealed Lagged Effects of Mosquito Abundance on Mosquito-Borne Disease Transmission: A Retrospective Study in Zhejiang, China," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-8, November.
    18. Ghosh, D. & Yuan, Z., 2009. "An improved model averaging scheme for logistic regression," Journal of Multivariate Analysis, Elsevier, vol. 100(8), pages 1670-1681, September.
    19. Xinyue Gu & Bo Li, 2020. "Cross‐estimation for decision selection," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 36(5), pages 932-958, September.
    20. Marc-Andreas Muendler & Sascha O. Becker, 2010. "Margins of Multinational Labor Substitution," American Economic Review, American Economic Association, vol. 100(5), pages 1999-2030, December.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.