IDEAS home Printed from https://ideas.repec.org/a/eee/econom/v222y2021i1p745-777.html

On factor models with random missing: EM estimation, inference, and cross validation

Author

Listed:
  • Jin, Sainan
  • Miao, Ke
  • Su, Liangjun

Abstract

We consider the estimation and inference in approximate factor models with random missing values. We show that with the low rank structure of the common component, we can estimate the factors and factor loadings consistently with the missing values replaced by zeros. We establish the asymptotic distributions of the resulting estimators and those based on the EM algorithm. We also propose a cross-validation-based method to determine the number of factors in factor models with or without missing values and justify its consistency. Simulations demonstrate that our cross validation method is robust to fat tails in the error distribution and significantly outperforms some existing popular methods in terms of correct percentage in determining the number of factors. An application to the factor-augmented regression models shows that a proper treatment of the missing values can improve the out-of-sample forecast of some macroeconomic variables.
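The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a T x N data matrix, a boolean mask of observed entries, and a truncated SVD as the principal component step. The function names (`em_factor_impute`, `cv_num_factors`), the rescaling of the zero-filled matrix by the observed fraction, the stopping rule, and the random hold-out scheme used for cross validation are all assumptions made for illustration.

```python
import numpy as np


def pca_common_component(X, r):
    """Rank-r approximation of X from a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def em_factor_impute(X, observed, r, n_iter=50, tol=1e-8):
    """Estimate the common component of a T x N panel with missing entries.

    X        : T x N array; values at unobserved positions are ignored.
    observed : T x N boolean array, True where X is observed.
    r        : number of factors.
    """
    q = observed.mean()  # observed fraction (assumed rescaling factor)
    X_zero = np.where(observed, X, 0.0)
    # Initial step: replace missing values by zeros, rescale, apply PCA.
    C = pca_common_component(X_zero / q, r)
    for _ in range(n_iter):
        # E-step: fill missing entries with the current fitted values.
        X_filled = np.where(observed, X, C)
        # M-step: re-estimate the rank-r common component.
        C_new = pca_common_component(X_filled, r)
        change = np.mean((C_new - C) ** 2)
        C = C_new
        if change < tol:
            break
    return C


def cv_num_factors(X, r_max, observed=None, hold_out=0.2, seed=0):
    """Pick the number of factors by holding out observed entries at random.

    Returns the selected r and the hold-out mean squared error for r = 1..r_max.
    """
    if observed is None:
        observed = np.ones(X.shape, dtype=bool)
    rng = np.random.default_rng(seed)
    keep = observed & (rng.random(X.shape) > hold_out)
    test = observed & ~keep
    errors = []
    for r in range(1, r_max + 1):
        C = em_factor_impute(X, keep, r)
        errors.append(float(np.mean((X - C)[test] ** 2)))
    return int(np.argmin(errors)) + 1, errors
```

For a panel `X` with mask `observed`, `em_factor_impute(X, observed, r)` returns fitted values for every entry, and `cv_num_factors(X, r_max, observed)` compares candidate factor numbers by how well entries hidden from the estimator are predicted.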

Suggested Citation

  • Jin, Sainan & Miao, Ke & Su, Liangjun, 2021. "On factor models with random missing: EM estimation, inference, and cross validation," Journal of Econometrics, Elsevier, vol. 222(1), pages 745-777.
  • Handle: RePEc:eee:econom:v:222:y:2021:i:1:p:745-777
    DOI: 10.1016/j.jeconom.2020.08.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304407620302815
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Claudia Foroni & Massimiliano Marcellino, 2013. "A survey of econometric methods for mixed-frequency data," Working Paper 2013/06, Norges Bank.
    2. Giannone, Domenico & Reichlin, Lucrezia & Small, David, 2008. "Nowcasting: The real-time informational content of macroeconomic data," Journal of Monetary Economics, Elsevier, vol. 55(4), pages 665-676, May.
    3. Forni, Mario & Lippi, Marco, 2001. "The Generalized Dynamic Factor Model: Representation Theory," Econometric Theory, Cambridge University Press, vol. 17(6), pages 1113-1141, December.
    4. Jianqing Fan & Yuan Liao & Martina Mincheva, 2013. "Large covariance estimation by thresholding principal orthogonal complements," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(4), pages 603-680, September.
    5. Lu, Xun & Su, Liangjun, 2016. "Shrinkage estimation of dynamic panel data models with interactive fixed effects," Journal of Econometrics, Elsevier, vol. 190(1), pages 148-175.
    6. Doz, Catherine & Giannone, Domenico & Reichlin, Lucrezia, 2011. "A two-step estimator for large approximate dynamic factor models based on Kalman filtering," Journal of Econometrics, Elsevier, vol. 164(1), pages 188-205, September.
    7. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    8. Su, Liangjun & Chen, Qihui, 2013. "Testing Homogeneity In Panel Data Models With Interactive Fixed Effects," Econometric Theory, Cambridge University Press, vol. 29(6), pages 1079-1135, December.
    9. G. Mesters & S. J. Koopman & M. Ooms, 2016. "Monte Carlo Maximum Likelihood Estimation for Generalized Long-Memory Time Series Models," Econometric Reviews, Taylor & Francis Journals, vol. 35(4), pages 659-687, April.
    10. Michael W. McCracken & Serena Ng, 2016. "FRED-MD: A Monthly Database for Macroeconomic Research," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 574-589, October.
    11. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    12. Moon, Hyungsik Roger & Weidner, Martin, 2017. "Dynamic Linear Panel Regression Models With Interactive Fixed Effects," Econometric Theory, Cambridge University Press, vol. 33(1), pages 158-195, February.
    13. repec:hal:journl:peer-00844811 is not listed on IDEAS
    14. Marcellino, Massimiliano & Sivec, Vasja, 2016. "Monetary, fiscal and oil shocks: Evidence based on mixed frequency structural FAVARs," Journal of Econometrics, Elsevier, vol. 193(2), pages 335-348.
    15. Seung C. Ahn & Alex R. Horenstein, 2013. "Eigenvalue Ratio Test for the Number of Factors," Econometrica, Econometric Society, vol. 81(3), pages 1203-1227, May.
    16. Alexei Onatski, 2009. "Testing Hypotheses About the Number of Factors in Large Factor Models," Econometrica, Econometric Society, vol. 77(5), pages 1447-1479, September.
    17. Maximiano Pinheiro & António Rua & Francisco Dias, 2013. "Dynamic Factor Models with Jagged Edge Panel Data: Taking on Board the Dynamics of the Idiosyncratic Components," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 75(1), pages 80-102, February.
    18. Ludvigson, Sydney C. & Ng, Serena, 2007. "The empirical risk-return relation: A factor analysis approach," Journal of Financial Economics, Elsevier, vol. 83(1), pages 171-222, January.
    19. Susan Athey & Mohsen Bayati & Nikolay Doudchenko & Guido Imbens & Khashayar Khosravi, 2021. "Matrix Completion Methods for Causal Panel Data Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1716-1730, October.
    20. Domenico Giannone & Lucrezia Reichlin & David Small, 2008. "Nowcasting: the real time informational content of macroeconomic data releases," ULB Institutional Repository 2013/6409, ULB -- Universite Libre de Bruxelles.
    21. Jushan Bai & Kunpeng Li, 2016. "Maximum Likelihood Estimation and Inference for Approximate Factor Models of High Dimension," The Review of Economics and Statistics, MIT Press, vol. 98(2), pages 298-309, May.
    22. Roberto S. Mariano & Yasutomo Murasawa, 2010. "A Coincident Index, Common Factors, and Monthly Real GDP," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 72(1), pages 27-46, February.
    23. Domenico Giannone & Lucrezia Reichlin & David H. Small, 2005. "Nowcasting GDP and inflation: the real-time informational content of macroeconomic data releases," Finance and Economics Discussion Series 2005-42, Board of Governors of the Federal Reserve System (U.S.).
    24. Jungbacker, B. & Koopman, S.J. & van der Wel, M., 2011. "Maximum likelihood estimation for dynamic factor models with missing data," Journal of Economic Dynamics and Control, Elsevier, vol. 35(8), pages 1358-1368, August.
    25. Hallin, Marc & Liska, Roman, 2007. "Determining the Number of Factors in the General Dynamic Factor Model," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 603-617, June.
    26. Jushan Bai, 2003. "Inferential Theory for Factor Models of Large Dimensions," Econometrica, Econometric Society, vol. 71(1), pages 135-171, January.
    27. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    28. Su, Liangjun & Jin, Sainan & Zhang, Yonghui, 2015. "Specification test for panel data models with interactive fixed effects," Journal of Econometrics, Elsevier, vol. 186(1), pages 222-244.
    29. Bai, Jushan & Ng, Serena, 2019. "Rank regularized estimation of approximate factor models," Journal of Econometrics, Elsevier, vol. 212(1), pages 78-96.
    30. Chamberlain, Gary & Rothschild, Michael, 1983. "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, Econometric Society, vol. 51(5), pages 1281-1304, September.
    31. Schumacher, Christian & Breitung, Jörg, 2008. "Real-time forecasting of German GDP based on a large factor model with monthly and quarterly data," International Journal of Forecasting, Elsevier, vol. 24(3), pages 386-398.
    32. Thomas J. Sargent & Christopher A. Sims, 1977. "Business cycle modeling without pretending to have too much a priori economic theory," Working Papers 55, Federal Reserve Bank of Minneapolis.
    33. Stock, James H & Watson, Mark W, 2002. "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(2), pages 147-162, April.
    34. Alexei Onatski, 2010. "Determining the Number of Factors from Empirical Distribution of Eigenvalues," The Review of Economics and Statistics, MIT Press, vol. 92(4), pages 1004-1016, November.
    35. Su, Liangjun & Wang, Xia, 2017. "On time-varying factor models: Estimation and testing," Journal of Econometrics, Elsevier, vol. 198(1), pages 84-101.
    36. Marta Bańbura & Michele Modugno, 2014. "Maximum Likelihood Estimation Of Factor Models On Datasets With Arbitrary Pattern Of Missing Data," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 29(1), pages 133-160, January.
    37. Onatski, Alexei, 2012. "Asymptotics of the principal components estimator of large factor models with weakly influential factors," Journal of Econometrics, Elsevier, vol. 168(2), pages 244-258.
    38. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.

    Citations



    Cited by:

    1. Chaohua Dong & Jiti Gao & Oliver Linton & Bin Peng, 2020. "On Time Trend of COVID-19: A Panel Data Study," Monash Econometrics and Business Statistics Working Papers 22/20, Monash University, Department of Econometrics and Business Statistics.
    2. Wei, Jie & Chen, Hui, 2020. "Determining the number of factors in approximate factor models by twice K-fold cross validation," Economics Letters, Elsevier, vol. 191(C).
    3. Artūras Juodis & Simas Kučinskas, 2023. "Quantifying noise in survey expectations," Quantitative Economics, Econometric Society, vol. 14(2), pages 609-650, May.
    4. Liddle, Brantley & Hasanov, Fakhri J. & Parker, Steven, 2022. "Your mileage may vary: Have road-fuel demand elasticities changed over time in middle-income countries?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 165(C), pages 38-53.
    5. Cahan, Ercument & Bai, Jushan & Ng, Serena, 2023. "Factor-based imputation of missing values and covariances in panel data of large dimensions," Journal of Econometrics, Elsevier, vol. 233(1), pages 113-131.
    6. Jushan Bai & Serena Ng, 2021. "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1746-1763, October.
    7. Camacho, Maximo & Lopez-Buenache, German, 2023. "Factor models for large and incomplete data sets with unknown group structure," International Journal of Forecasting, Elsevier, vol. 39(3), pages 1205-1220.
    8. Liu, Wei & Luo, Lan & Zhou, Ling, 2023. "Online missing value imputation for high-dimensional mixed-type data via generalized factor models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    9. Ruoxuan Xiong & Markus Pelger, 2019. "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference," Papers 1910.08273, arXiv.org, revised Jan 2022.
    10. Yinchu Zhu, 2019. "How well can we learn large factor models without assuming strong factors?," Papers 1910.10382, arXiv.org, revised Nov 2019.
    11. Jungjun Choi & Hyukjun Kwon & Yuan Liao, 2023. "Inference for Low-rank Completion without Sample Splitting with Application to Treatment Effect Estimation," Papers 2307.16370, arXiv.org.
    12. Jianqing Fan & Kunpeng Li & Yuan Liao, 2020. "Recent Developments on Factor Models and its Applications in Econometric Learning," Papers 2009.10103, arXiv.org.
    13. Serena Ng & Susannah Scanlan, 2023. "Constructing High Frequency Economic Indicators by Imputation," Papers 2303.01863, arXiv.org, revised Oct 2023.
    14. Jiti Gao & Oliver Linton & Bin Peng, 2022. "A Nonparametric Panel Model for Climate Data with Seasonal and Spatial Variation," Monash Econometrics and Business Statistics Working Papers 9/22, Monash University, Department of Econometrics and Business Statistics.
    15. Victor Chernozhukov & Christian Hansen & Yuan Liao & Yinchu Zhu, 2021. "Inference for Low-Rank Models," Papers 2107.02602, arXiv.org, revised Jan 2023.
    16. Junting Duan & Markus Pelger & Ruoxuan Xiong, 2023. "Target PCA: Transfer Learning Large Dimensional Panel Data," Papers 2308.15627, arXiv.org.
    17. Jungjun Choi & Ming Yuan, 2023. "Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models," Papers 2308.02364, arXiv.org.
    18. Xiong, Ruoxuan & Pelger, Markus, 2023. "Large dimensional latent factor modeling with missing observations and applications to causal inference," Journal of Econometrics, Elsevier, vol. 233(1), pages 271-301.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    2. Poncela, Pilar & Ruiz, Esther & Miranda, Karen, 2021. "Factor extraction using Kalman filter and smoothing: This is not just another survey," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1399-1425.
    3. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," PSE Working Papers halshs-02262202, HAL.
    4. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers halshs-02262202, HAL.
    5. Jianqing Fan & Kunpeng Li & Yuan Liao, 2020. "Recent Developments on Factor Models and its Applications in Econometric Learning," Papers 2009.10103, arXiv.org.
    6. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers 2019-4, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    7. Pilar Poncela & Esther Ruiz, 2016. "Small- Versus Big-Data Factor Extraction in Dynamic Factor Models: An Empirical Assessment," Advances in Econometrics, in: Dynamic Factor Models, volume 35, pages 401-434, Emerald Group Publishing Limited.
    8. Matteo Barigozzi, 2023. "Quasi Maximum Likelihood Estimation of High-Dimensional Factor Models: A Critical Review," Papers 2303.11777, arXiv.org, revised Dec 2023.
    9. Karim Barhoumi & Olivier Darné & Laurent Ferrara, 2014. "Dynamic factor models: A review of the literature," OECD Journal: Journal of Business Cycle Measurement and Analysis, OECD Publishing, Centre for International Research on Economic Tendency Surveys, vol. 2013(2), pages 73-107.
    10. Forni, Mario & Cavicchioli, Maddalena & Lippi, Marco & Zaffaroni, Paolo, 2016. "Eigenvalue Ratio Estimators for the Number of Common Factors," CEPR Discussion Papers 11440, C.E.P.R. Discussion Papers.
    11. Matteo Barigozzi & Matteo Luciani, 2019. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Papers 1910.03821, arXiv.org, revised Feb 2022.
    12. Bai, Jushan & Liao, Yuan, 2016. "Efficient estimation of approximate factor models via penalized maximum likelihood," Journal of Econometrics, Elsevier, vol. 191(1), pages 1-18.
    13. Forni, Mario & Hallin, Marc & Lippi, Marco & Zaffaroni, Paolo, 2015. "Dynamic factor models with infinite-dimensional factor spaces: One-sided representations," Journal of Econometrics, Elsevier, vol. 185(2), pages 359-371.
    14. Xiong, Ruoxuan & Pelger, Markus, 2023. "Large dimensional latent factor modeling with missing observations and applications to causal inference," Journal of Econometrics, Elsevier, vol. 233(1), pages 271-301.
    15. Martin Solberger & Erik Spånberg, 2020. "Estimating a Dynamic Factor Model in EViews Using the Kalman Filter and Smoother," Computational Economics, Springer;Society for Computational Economics, vol. 55(3), pages 875-900, March.
    16. Matteo Barigozzi & Marc Hallin, 2023. "Dynamic Factor Models: a Genealogy," Papers 2310.17278, arXiv.org, revised Jan 2024.
    17. Hindrayanto, Irma & Koopman, Siem Jan & de Winter, Jasper, 2016. "Forecasting and nowcasting economic growth in the euro area using factor models," International Journal of Forecasting, Elsevier, vol. 32(4), pages 1284-1305.
    18. Miao, Ke & Phillips, Peter C.B. & Su, Liangjun, 2023. "High-dimensional VARs with common factors," Journal of Econometrics, Elsevier, vol. 233(1), pages 155-183.
    19. Kihwan Kim & Norman Swanson, 2013. "Diffusion Index Model Specification and Estimation Using Mixed Frequency Datasets," Departmental Working Papers 201315, Rutgers University, Department of Economics.
    20. Jonas Krampe & Luca Margaritella, 2021. "Factor Models with Sparse VAR Idiosyncratic Components," Papers 2112.07149, arXiv.org, revised May 2022.

    More about this item

    Keywords

    Cross-validation; Expectation–Maximization (EM) algorithm; Factor models; Matrix completion; Missing at random; Principal component analysis; Singular value decomposition;

    JEL classification:

    • C23 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Models with Panel Data; Spatio-temporal Models
    • C33 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Models with Panel Data; Spatio-temporal Models
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
