
Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods

Author

Listed:
  • Kim, Hyun Hak
  • Swanson, Norman R.

Abstract

A number of recent studies in the economics literature have focused on the usefulness of factor models in the context of prediction using “big data” (see Bai and Ng, 2008; Dufour and Stevanovic, 2010; Forni, Hallin, Lippi, & Reichlin, 2000; Forni et al., 2005; Kim and Swanson, 2014a; Stock and Watson, 2002b, 2006, 2012, and the references cited therein). We add to this literature by analyzing whether “big data” are useful for modelling low frequency macroeconomic variables, such as unemployment, inflation and GDP. In particular, we analyze the predictive benefits associated with the use of principal component analysis (PCA), independent component analysis (ICA), and sparse principal component analysis (SPCA). We also evaluate machine learning, variable selection and shrinkage methods, including bagging, boosting, ridge regression, least angle regression, the elastic net, and the non-negative garrotte. Our approach is to carry out a forecasting “horse-race” using prediction models that are constructed based on a variety of model specification approaches, factor estimation methods, and data windowing methods, in the context of predicting 11 macroeconomic variables that are relevant to monetary policy assessment. In many instances, we find that several of our benchmark models, including autoregressive (AR) models, AR models with exogenous variables, and (Bayesian) model averaging, do not dominate specifications based on factor-type dimension reduction combined with various machine learning, variable selection, and shrinkage methods (called “combination” models). We find that forecast combination methods are mean square forecast error (MSFE) “best” for only three variables out of 11 for a forecast horizon of h=1, and for four variables when h=3 or 12. In addition, non-PCA type factor estimation methods yield MSFE-best predictions for nine variables out of 11 for h=1, although PCA dominates at longer horizons. Interestingly, we also find evidence of the usefulness of combination models for approximately half of our variables when h>1. Most importantly, we present strong new evidence of the usefulness of factor-based dimension reduction when utilizing “big data” for macroeconometric forecasting.
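
The kind of forecasting comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic panel, the function names (pca_factors, ols_forecast, horse_race), the choice of two factors, and the recursive window starting at observation 100 are all assumptions made for illustration. The sketch estimates PCA factors from a standardized panel, fits a factor-augmented AR(1) forecasting regression, and compares its MSFE with that of a plain AR(1) benchmark.

```python
import numpy as np


def pca_factors(X, r):
    """Estimate r principal-component factors from a T x N panel."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each series
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] * S[:r]                       # T x r matrix of factor estimates


def ols_forecast(W, y, w_new):
    """Fit y on [1, W] by least squares and predict at the row w_new."""
    A = np.column_stack([np.ones(len(W)), W])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.concatenate(([1.0], w_new)) @ beta)


def horse_race(X, y, h=1, r=2, first=100):
    """Recursive-window MSFE of an AR(1) benchmark and a factor-augmented AR(1)."""
    err_ar, err_fa = [], []
    for t in range(first, len(y) - h):
        F = pca_factors(X[: t + 1], r)            # factors use data through t only
        W_ar = y[: t + 1, None]                   # regressor: current value of y
        W_fa = np.column_stack([W_ar, F])         # regressors: y plus factors
        target = y[h : t + 1]                     # y_{s+h} paired with regressors at s
        f_ar = ols_forecast(W_ar[: t + 1 - h], target, W_ar[t])
        f_fa = ols_forecast(W_fa[: t + 1 - h], target, W_fa[t])
        err_ar.append(y[t + h] - f_ar)
        err_fa.append(y[t + h] - f_fa)
    return {
        "AR": float(np.mean(np.square(err_ar))),
        "PCA+AR": float(np.mean(np.square(err_fa))),
    }


# Synthetic example (assumed data-generating process, not the paper's dataset):
# 50 series driven by two latent factors; y depends on the first factor, lagged.
rng = np.random.default_rng(0)
T, N = 200, 50
f = rng.standard_normal((T, 2))
X = f @ rng.standard_normal((2, N)) + rng.standard_normal((T, N))
y = np.zeros(T)
y[1:] = f[:-1, 0] + 0.3 * rng.standard_normal(T - 1)

result = horse_race(X, y, h=1)
print(result)
```

In this sketch the factors are re-estimated in every window, so the sign indeterminacy of principal components does not matter: the forecasting regression is refitted on the same window's factors. Replacing ols_forecast with a penalized estimator would give a shrinkage variant of the same exercise.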

Suggested Citation

  • Kim, Hyun Hak & Swanson, Norman R., 2018. "Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods," International Journal of Forecasting, Elsevier, vol. 34(2), pages 339-354.
  • Handle: RePEc:eee:intfor:v:34:y:2018:i:2:p:339-354
    DOI: 10.1016/j.ijforecast.2016.02.012

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0169207016300668
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Banerjee, Anindya & Marcellino, Massimiliano & Masten, Igor, 2014. "Forecasting with factor-augmented error correction models," International Journal of Forecasting, Elsevier, vol. 30(3), pages 589-612.
    2. Forni, Mario & Hallin, Marc & Lippi, Marco & Reichlin, Lucrezia, 2005. "The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 830-840, September.
    3. Clark, Todd E. & McCracken, Michael W., 2001. "Tests of equal forecast accuracy and encompassing for nested models," Journal of Econometrics, Elsevier, vol. 105(1), pages 85-110, November.
    4. McCracken, Michael W., 2004. "Parameter estimation and tests of equal forecast accuracy between non-nested models," International Journal of Forecasting, Elsevier, vol. 20(3), pages 503-514.
    5. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    6. James H. Stock & Mark W. Watson, 2005. "Implications of Dynamic Factor Models for VAR Analysis," NBER Working Papers 11467, National Bureau of Economic Research, Inc.
    7. Inoue, Atsushi & Kilian, Lutz, 2008. "How Useful Is Bagging in Forecasting Economic Time Series? A Case Study of U.S. Consumer Price Inflation," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 511-522, June.
    8. Connor, Gregory & Korajczyk, Robert A, 1993. "A Test for the Number of Factors in an Approximate Factor Model," Journal of Finance, American Finance Association, vol. 48(4), pages 1263-1291, September.
    9. Carmen Fernandez & Eduardo Ley & Mark F. J. Steel, 2001. "Model uncertainty in cross-country growth regressions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(5), pages 563-576.
    10. Boivin, Jean & Ng, Serena, 2006. "Are more data always better for factor analysis?," Journal of Econometrics, Elsevier, vol. 132(1), pages 169-194, May.
    11. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    12. Stock, James H. & Watson, Mark W., 1999. "Forecasting inflation," Journal of Monetary Economics, Elsevier, vol. 44(2), pages 293-335, October.
    13. Nii Ayi Armah & Norman Swanson, 2010. "Seeing Inside the Black Box: Using Diffusion Index Methodology to Construct Factor Proxies in Large Scale Macroeconomic Time Series Environments," Econometric Reviews, Taylor & Francis Journals, vol. 29(5-6), pages 476-510.
    14. Alexei Onatski, 2009. "Testing Hypotheses About the Number of Factors in Large Factor Models," Econometrica, Econometric Society, vol. 77(5), pages 1447-1479, September.
    15. Bai, Jushan & Ng, Serena, 2008. "Forecasting economic time series using targeted predictors," Journal of Econometrics, Elsevier, vol. 146(2), pages 304-317, October.
    16. Stock, James H. & Watson, Mark W., 2006. "Forecasting with Many Predictors," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 1, chapter 10, pages 515-554, Elsevier.
    17. S. K. Vines, 2000. "Simple principal components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 49(4), pages 441-451.
    18. Clark, Todd E. & McCracken, Michael W., 2009. "Tests of Equal Predictive Ability With Real-Time Data," Journal of Business & Economic Statistics, American Statistical Association, vol. 27(4), pages 441-454.
    19. Timmermann, Allan, 2006. "Forecast Combinations," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 1, chapter 4, pages 135-196, Elsevier.
    20. Capistrán, Carlos & Timmermann, Allan, 2009. "Forecast Combination With Entry and Exit of Experts," Journal of Business & Economic Statistics, American Statistical Association, vol. 27(4), pages 428-440.
    21. Gary Koop & Simon Potter, 2004. "Forecasting in dynamic factor models using Bayesian model averaging," Econometrics Journal, Royal Economic Society, vol. 7(2), pages 550-565, December.
    22. Jean Boivin & Serena Ng, 2005. "Understanding and Comparing Factor-Based Forecasts," International Journal of Central Banking, International Journal of Central Banking, vol. 1(3), December.
    23. Ruey Yau, 2004. "Macroeconomic Forecasting with Independent Component Analysis," Econometric Society 2004 Far Eastern Meetings 741, Econometric Society.
    24. James H. Stock & Mark W. Watson, 2012. "Generalized Shrinkage Methods for Forecasting Using Many Predictors," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(4), pages 481-493, June.
    25. Chen, Yin-Ping & Huang, Hsin-Cheng & Tu, I-Ping, 2010. "A new approach for selecting the number of factors," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2990-2998, December.
    26. Jushan Bai & Serena Ng, 2006. "Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-Augmented Regressions," Econometrica, Econometric Society, vol. 74(4), pages 1133-1150, July.
    27. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    28. Connor, Gregory & Korajczyk, Robert A., 1986. "Performance measurement with the arbitrage pricing theory: A new framework for analysis," Journal of Financial Economics, Elsevier, vol. 15(3), pages 373-394, March.
    29. Alessio Moneta & Doris Entner & Patrik O. Hoyer & Alex Coad, 2013. "Causal Inference by Independent Component Analysis: Theory and Applications," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 75(5), pages 705-730, October.
    30. Clemen, Robert T., 1989. "Combining forecasts: A review and annotated bibliography," International Journal of Forecasting, Elsevier, vol. 5(4), pages 559-583.
    31. Kim, Hyun Hak & Swanson, Norman R., 2014. "Forecasting financial and macroeconomic variables using data reduction methods: New empirical evidence," Journal of Econometrics, Elsevier, vol. 178(P2), pages 352-367.
    32. Bai, Jushan & Ng, Serena, 2006. "Evaluating latent and observed factors in macroeconomics and finance," Journal of Econometrics, Elsevier, vol. 131(1-2), pages 507-537.
    33. Francis X. Diebold & Jose A. Lopez, 1995. "Forecast evaluation and combination," Research Paper 9525, Federal Reserve Bank of New York.
    34. Peres-Neto, Pedro R. & Jackson, Donald A. & Somers, Keith M., 2005. "How many principal components? stopping rules for determining the number of non-trivial axes revisited," Computational Statistics & Data Analysis, Elsevier, vol. 49(4), pages 974-997, June.
    35. McCracken, Michael W., 2000. "Robust out-of-sample inference," Journal of Econometrics, Elsevier, vol. 99(2), pages 195-223, December.
    36. Stock, James H & Watson, Mark W, 2002. "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(2), pages 147-162, April.
    37. Chow, Gregory C & Lin, An-loh, 1971. "Best Linear Unbiased Interpolation, Distribution, and Extrapolation of Time Series by Related Series," The Review of Economics and Statistics, MIT Press, vol. 53(4), pages 372-375, November.
    38. McCracken, Michael W., 2007. "Asymptotics for out of sample tests of Granger causality," Journal of Econometrics, Elsevier, vol. 140(2), pages 719-752, October.
    39. Connor, Gregory & Korajczyk, Robert A., 1988. "Risk and return in an equilibrium APT: Application of a new test methodology," Journal of Financial Economics, Elsevier, vol. 21(2), pages 255-289, September.
    40. Ming Yuan & Yi Lin, 2007. "On the non‐negative garrotte estimator," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(2), pages 143-161, April.
    41. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    42. Jushan Bai & Serena Ng, 2009. "Boosting diffusion indices," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 24(4), pages 607-629.
    43. Aiolfi, Marco & Timmermann, Allan, 2006. "Persistence in forecasting performance and conditional combination strategies," Journal of Econometrics, Elsevier, vol. 135(1-2), pages 31-53.
    44. G. Elliott & C. Granger & A. Timmermann (ed.), 2006. "Handbook of Economic Forecasting," Handbook of Economic Forecasting, Elsevier, edition 1, volume 1, number 1.
    45. Mark W. Watson & James H. Stock, 2004. "Combination forecasts of output growth in a seven-country data set," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 23(6), pages 405-430.
    46. Josse, Julie & Husson, François, 2012. "Selecting the number of components in principal component analysis using cross-validation approximations," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1869-1879.
    47. Stock J.H. & Watson M.W., 2002. "Forecasting Using Principal Components From a Large Number of Predictors," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1167-1179, December.
    48. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    49. Carvalho, Carlos M. & Chang, Jeffrey & Lucas, Joseph E. & Nevins, Joseph R. & Wang, Quanli & West, Mike, 2008. "High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1438-1456.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hyun Hak Kim & Norman Swanson, 2013. "Mining Big Data Using Parsimonious Factor and Shrinkage Methods," Departmental Working Papers 201316, Rutgers University, Department of Economics.
    2. Kim, Hyun Hak & Swanson, Norman R., 2014. "Forecasting financial and macroeconomic variables using data reduction methods: New empirical evidence," Journal of Econometrics, Elsevier, vol. 178(P2), pages 352-367.
    3. Cheng, Xu & Hansen, Bruce E., 2015. "Forecasting with factor-augmented regression: A frequentist model averaging approach," Journal of Econometrics, Elsevier, vol. 186(2), pages 280-293.
    4. Tan, Xueping & Sirichand, Kavita & Vivian, Andrew & Wang, Xinyu, 2022. "Forecasting European carbon returns using dimension reduction techniques: Commodity versus financial fundamentals," International Journal of Forecasting, Elsevier, vol. 38(3), pages 944-969.
    5. Hyun Hak Kim, 2013. "Forecasting Macroeconomic Variables Using Data Dimension Reduction Methods: The Case of Korea," Working Papers 2013-26, Economic Research Institute, Bank of Korea.
    6. Norman R. Swanson & Weiqi Xiong, 2018. "Big data analytics in economics: What have we learned so far, and where should we go from here?," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 51(3), pages 695-746, August.
    7. Kihwan Kim & Hyun Hak Kim & Norman R. Swanson, 2023. "Mixing mixed frequency and diffusion indices in good times and in bad: an assessment based on historical data around the great recession of 2008," Empirical Economics, Springer, vol. 64(3), pages 1421-1469, March.
    8. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," PSE Working Papers halshs-02262202, HAL.
    9. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers 2019-4, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    10. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers halshs-02262202, HAL.
    11. Xu Cheng & Bruce E. Hansen, 2012. "Forecasting with Factor-Augmented Regression: A Frequentist Model Averaging Approach, Second Version," PIER Working Paper Archive 13-061, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 03 Sep 2013.
    12. Karim Barhoumi & Olivier Darné & Laurent Ferrara, 2014. "Dynamic factor models: A review of the literature," OECD Journal: Journal of Business Cycle Measurement and Analysis, OECD Publishing, Centre for International Research on Economic Tendency Surveys, vol. 2013(2), pages 73-107.
    13. Nii Ayi Armah & Norman Swanson, 2010. "Seeing Inside the Black Box: Using Diffusion Index Methodology to Construct Factor Proxies in Large Scale Macroeconomic Time Series Environments," Econometric Reviews, Taylor & Francis Journals, vol. 29(5-6), pages 476-510.
    14. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    15. Ng, Serena, 2013. "Variable Selection in Predictive Regressions," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 752-789, Elsevier.
    16. Kihwan Kim & Norman Swanson, 2013. "Diffusion Index Model Specification and Estimation Using Mixed Frequency Datasets," Departmental Working Papers 201315, Rutgers University, Department of Economics.
    17. Norman R. Swanson & Weiqi Xiong & Xiye Yang, 2020. "Predicting interest rates using shrinkage methods, real‐time diffusion indexes, and model combinations," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(5), pages 587-613, August.
    18. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    19. Karim Barhoumi & Olivier Darné & Laurent Ferrara, 2010. "Are disaggregate data useful for factor analysis in forecasting French GDP?," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 29(1-2), pages 132-144.
    20. Sandra Eickmeier & Christina Ziegler, 2008. "How successful are dynamic factor models at forecasting output and inflation? A meta-analytic approach," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 27(3), pages 237-265.
