IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v187y2023ics0167947323001330.html

Online missing value imputation for high-dimensional mixed-type data via generalized factor models

Author

Listed:
  • Liu, Wei
  • Luo, Lan
  • Zhou, Ling

Abstract

Most machine learning methods require complete observations, which calls for new statistical methods to handle datasets with missing values. The need is especially urgent for streaming data, which are generated at high speed and with little quality control, so missing data imputation becomes an unavoidable preprocessing step before subsequent analysis. A practical online imputation algorithm should not only scale to large datasets but also handle high-dimensional mixed-type data containing binary, count and continuous variables. To fill this gap, a novel online imputation algorithm, called OMIG, is proposed for streaming data under the framework of generalized factor models. To obtain deeper insight, OMIG is compared theoretically and empirically with two other versions of itself, the oracle version and the offline version. Theoretical and numerical findings show that (a) in terms of imputation accuracy, the imputed data obtained by OMIG are not equivalent to those obtained by its oracle version but instead converge at a slower rate; (b) OMIG outperforms its offline version in imputation accuracy; and (c) OMIG is equivalent to its oracle version in estimation accuracy for the factor loading, which greatly facilitates interpretation and follow-up analysis. Extensive numerical experiments and two real datasets are used to demonstrate the performance of the proposed method.
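
To make the setting concrete, the following is a minimal sketch of batch-wise imputation under a generalized factor model for mixed-type data. It is not the authors' OMIG algorithm: it assumes canonical links (identity, logit, log), unit-variance Gaussian noise, and plain gradient ascent on the observed-entry log-likelihood, with the loadings carried over from one batch to the next. The names mean_fn and impute_batch, the step size, and the simulated stream are all hypothetical choices made for illustration.

```python
import numpy as np

def mean_fn(theta, types):
    """Map the linear predictor to the conditional mean, column by column."""
    mu = np.empty_like(theta)
    for j, t in enumerate(types):
        if t == "continuous":      # Gaussian, identity link
            mu[:, j] = theta[:, j]
        elif t == "binary":        # Bernoulli, logit link
            mu[:, j] = 1.0 / (1.0 + np.exp(-theta[:, j]))
        else:                      # "count": Poisson, log link (clipped for safety)
            mu[:, j] = np.exp(np.clip(theta[:, j], -10.0, 10.0))
    return mu

def impute_batch(X, types, B, q, n_iter=300, lr=0.1):
    """Fit factors for one batch, update the loadings, fill in missing entries.

    X is (n, p) with np.nan marking missing values; B is the (p, q) loading
    matrix carried over from earlier batches. Returns (imputed X, updated B).
    """
    n, p = X.shape
    B = B.copy()                       # do not mutate the caller's array
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)
    F = np.zeros((n, q))               # batch-specific factors
    for _ in range(n_iter):
        # With canonical links the score w.r.t. the linear predictor is x - mu;
        # missing entries contribute nothing to the gradient.
        resid = np.where(observed, X0 - mean_fn(F @ B.T, types), 0.0)
        F += lr * (resid @ B) / p
        resid = np.where(observed, X0 - mean_fn(F @ B.T, types), 0.0)
        B += lr * (resid.T @ F) / n
    mu = mean_fn(F @ B.T, types)
    return np.where(observed, X, mu), B  # missing entries get the fitted mean

# Simulated stream (illustrative only): 3 batches, 12 mixed-type variables.
rng = np.random.default_rng(1)
n_batches, n, q = 3, 50, 2
types = ["continuous"] * 4 + ["binary"] * 4 + ["count"] * 4
p = len(types)
B_true = 0.5 * rng.standard_normal((p, q))
B = 0.5 * rng.standard_normal((p, q))  # arbitrary starting loadings

for _ in range(n_batches):
    F_true = rng.standard_normal((n, q))
    mu_true = mean_fn(F_true @ B_true.T, types)
    X = np.empty((n, p))
    for j, t in enumerate(types):
        if t == "continuous":
            X[:, j] = mu_true[:, j] + rng.standard_normal(n)
        elif t == "binary":
            X[:, j] = rng.binomial(1, mu_true[:, j])
        else:
            X[:, j] = rng.poisson(mu_true[:, j])
    X[rng.random((n, p)) < 0.2] = np.nan  # about 20% missing at random
    imputed, B = impute_batch(X, types, B, q)
```

In this sketch each batch is processed once and then discarded; only the loading matrix B is retained between batches, which is the sense in which it is "online". Missing binary and count entries are filled with fitted conditional means rather than rounded values.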

Suggested Citation

  • Liu, Wei & Luo, Lan & Zhou, Ling, 2023. "Online missing value imputation for high-dimensional mixed-type data via generalized factor models," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
  • Handle: RePEc:eee:csdana:v:187:y:2023:i:c:s0167947323001330
    DOI: 10.1016/j.csda.2023.107822

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947323001330
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cahan, Ercument & Bai, Jushan & Ng, Serena, 2023. "Factor-based imputation of missing values and covariances in panel data of large dimensions," Journal of Econometrics, Elsevier, vol. 233(1), pages 113-131.
    2. Yinchu Zhu, 2019. "How well can we learn large factor models without assuming strong factors?," Papers 1910.10382, arXiv.org, revised Nov 2019.
    3. Jianqing Fan & Kunpeng Li & Yuan Liao, 2020. "Recent Developments on Factor Models and its Applications in Econometric Learning," Papers 2009.10103, arXiv.org.
    4. Claudio Morana, 2014. "Factor Vector Autoregressive Estimation of Heteroskedastic Persistent and Non Persistent Processes Subject to Structural Breaks," Working Papers 273, University of Milano-Bicocca, Department of Economics, revised May 2014.
    5. Wei, Jie & Chen, Hui, 2020. "Determining the number of factors in approximate factor models by twice K-fold cross validation," Economics Letters, Elsevier, vol. 191(C).
    6. Luke Hartigan & James Morley, 2020. "A Factor Model Analysis of the Australian Economy and the Effects of Inflation Targeting," The Economic Record, The Economic Society of Australia, vol. 96(314), pages 271-293, September.
    7. Jushan Bai & Serena Ng, 2020. "Simpler Proofs for Approximate Factor Models of Large Dimensions," Papers 2008.00254, arXiv.org.
    8. Thomas Despois & Catherine Doz, 2022. "Identifying and interpreting the factors in factor models via sparsity : Different approaches," Working Papers halshs-03626503, HAL.
    9. Thomas Despois & Catherine Doz, 2023. "Identifying and interpreting the factors in factor models via sparsity: Different approaches," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 533-555, June.
    10. Jiahe Lin & George Michailidis, 2019. "Approximate Factor Models with Strongly Correlated Idiosyncratic Errors," Papers 1912.04123, arXiv.org.
    11. Yunus Emre Ergemen & Carlos Vladimir Rodríguez-Caballero, 2016. "A Dynamic Multi-Level Factor Model with Long-Range Dependence," CREATES Research Papers 2016-23, Department of Economics and Business Economics, Aarhus University.
    12. Ruoxuan Xiong & Markus Pelger, 2019. "Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference," Papers 1910.08273, arXiv.org, revised Jan 2022.
    13. Aleksandra Halka & Grzegorz Szafranski, 2018. "What Common Factors are Driving Inflation in CEE Countries?," Prague Economic Papers, Prague University of Economics and Business, vol. 2018(2), pages 131-148.
    14. Mao Takongmo, Charles Olivier & Stevanovic, Dalibor, 2015. "Selection Of The Number Of Factors In Presence Of Structural Instability: A Monte Carlo Study," L'Actualité Economique, Société Canadienne de Science Economique, vol. 91(1-2), pages 177-233, Mars-Juin.
    15. Tomohiro Ando & Matthew Greenwood-Nimmo & Yongcheol Shin, 2022. "Quantile Connectedness: Modeling Tail Behavior in the Topology of Financial Networks," Management Science, INFORMS, vol. 68(4), pages 2401-2431, April.
    16. Jushan Bai & Kunpeng Li & Lina Lu, 2016. "Estimation and Inference of FAVAR Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 620-641, October.
    17. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    18. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    19. Antoine A. Djogbenou, 2020. "Comovements in the real activity of developed and emerging economies: A test of global versus specific international factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(3), pages 344-370, April.
    20. Hevia, Constantino & Servén, Luis, 2018. "Assessing the degree of international consumption risk sharing," Journal of Development Economics, Elsevier, vol. 134(C), pages 176-190.
