
Selecting the Number of Principal Components in Functional Data

Author

Listed:
  • Yehua Li
  • Naisyin Wang
  • Raymond J. Carroll

Abstract

Functional principal component analysis (FPCA) has become the most widely used dimension reduction tool for functional data analysis. We consider functional data measured at random, subject-specific time points, contaminated with measurement error, allowing for both sparse and dense functional data, and propose novel information criteria to select the number of principal components in such data. We propose a Bayesian information criterion based on marginal modeling that can consistently select the number of principal components for both sparse and dense functional data. For dense functional data, we also develop an Akaike information criterion based on the expected Kullback-Leibler information under a Gaussian assumption. In connection with the time series literature, we also consider a class of information criteria proposed for factor analysis of multivariate time series and show that they are still consistent for dense functional data, if a prescribed undersmoothing scheme is undertaken in the FPCA algorithm. We perform intensive simulation studies and show that the proposed information criteria vastly outperform existing methods for this type of data. Surprisingly, our empirical evidence shows that our information criteria proposed for dense functional data also perform well for sparse functional data. An empirical example using colon carcinogenesis data is also provided to illustrate the results. Supplementary materials for this article are available online.
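
To make the idea of choosing the number of components by an information criterion concrete, the following is a minimal sketch and not the authors' method. It assumes curves observed densely on a common grid (the article treats random, subject-specific time points), skips the smoothing and undersmoothing steps entirely, and uses a penalty of the form known from the factor-analysis literature as the Bai and Ng IC_p1 criterion, which the abstract does not name. The function names `simulate_dense_curves` and `select_num_components`, the simulated eigenfunctions, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_dense_curves(n=200, m=50, noise_sd=0.5, seed=0):
    """Simulate n noisy curves on a common grid of m points.

    Illustrative assumption: three true components with score
    variances 4, 2 and 1, plus independent Gaussian measurement error.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    # Rows are (approximately orthonormal) eigenfunctions on the grid.
    phi = np.sqrt(2.0) * np.vstack([
        np.sin(2 * np.pi * t),
        np.cos(2 * np.pi * t),
        np.sin(4 * np.pi * t),
    ])
    score_sd = np.sqrt(np.array([4.0, 2.0, 1.0]))
    scores = rng.normal(size=(n, 3)) * score_sd
    noise = rng.normal(scale=noise_sd, size=(n, m))
    return scores @ phi + noise


def select_num_components(X, k_max=8):
    """Pick the number of components minimising a factor-model IC.

    IC(k) = log V(k) + k * ((n + m) / (n * m)) * log(n * m / (n + m)),
    where V(k) is the mean squared residual after projecting the
    centred data onto its first k principal components.
    Requires k_max < min(n, m) so that V(k) stays positive.
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)                    # remove the mean curve
    sq = np.linalg.svd(Xc, compute_uv=False) ** 2
    total = sq.sum()
    penalty = (n + m) / (n * m) * np.log(n * m / (n + m))
    ic = []
    for k in range(k_max + 1):
        v_k = (total - sq[:k].sum()) / (n * m)  # residual variance
        ic.append(np.log(v_k) + k * penalty)
    return int(np.argmin(ic))


if __name__ == "__main__":
    X = simulate_dense_curves()
    print("selected number of components:", select_num_components(X))
```

The penalty grows linearly in k, so an extra component is kept only if it reduces the log residual variance by more than the penalty step. For sparse or irregularly sampled curves this grid-based shortcut does not apply, and the criteria described in the abstract would be needed instead.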

Suggested Citation

  • Yehua Li & Naisyin Wang & Raymond J. Carroll, 2013. "Selecting the Number of Principal Components in Functional Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1284-1294, December.
  • Handle: RePEc:taf:jnlasa:v:108:y:2013:i:504:p:1284-1294
    DOI: 10.1080/01621459.2013.788980

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2013.788980
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Morris J. S. & Wang N. & Lupton J. R. & Chapkin R. S. & Turner N. D. & Young Hong M. & Carroll R. J., 2001. "Parametric and Nonparametric Methods for Understanding the Relationship Between Carcinogen-Induced DNA Adduct Levels in Distal and Proximal Regions of the Colon," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 816-826, September.
    2. Morris, Jeffrey S. & Vannucci, Marina & Brown, Philip J. & Carroll, Raymond J., 2003. "Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 573-583, January.
    3. Claeskens,Gerda & Hjort,Nils Lid, 2008. "Model Selection and Model Averaging," Cambridge Books, Cambridge University Press, number 9780521852258.
    4. Li, Yehua & Wang, Naisyin & Carroll, Raymond J., 2010. "Generalized Functional Linear Models With Semiparametric Single-Index Interactions," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 621-633.

    Citations


    Cited by:

    1. Zhang, Xin & Wang, Chong & Wu, Yichao, 2018. "Functional envelope for model-free sufficient dimension reduction," Journal of Multivariate Analysis, Elsevier, vol. 163(C), pages 37-50.
    2. Tingting Wang & Linjie Qin & Chao Dai & Zhen Wang & Chenqi Gong, 2023. "Heterogeneous Learning of Functional Clustering Regression and Application to Chinese Air Pollution Data," IJERPH, MDPI, vol. 20(5), pages 1-21, February.
    3. Saart, Patrick W. & Xia, Yingcun, 2022. "Functional time series approach to analyzing asset returns co-movements," Journal of Econometrics, Elsevier, vol. 229(1), pages 127-151.
    4. Li, Meng & Wang, Kehui & Maity, Arnab & Staicu, Ana-Maria, 2022. "Inference in functional linear quantile regression," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    5. Xiuli Du & Xiaohu Jiang & Jinguan Lin, 2023. "Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 975-1001, September.
    6. Huang, Lele & Zhao, Junlong & Wang, Huiwen & Wang, Siyang, 2016. "Robust shrinkage estimation and selection for functional multiple linear model through LAD loss," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 384-400.
    7. Sylvain Robbiano & Matthieu Saumard & Michel Curé, 2016. "Improving prediction performance of stellar parameters using functional models," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(8), pages 1465-1476, June.
    8. Li, Yehua & Qiu, Yumou & Xu, Yuhang, 2022. "From multivariate to functional data analysis: Fundamentals, recent developments, and emerging areas," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    9. Zhu, Hanbing & Zhang, Riquan & Yu, Zhou & Lian, Heng & Liu, Yanghui, 2019. "Estimation and testing for partially functional linear errors-in-variables models," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 296-314.
    10. Kyunghee Han & Pantelis Z Hadjipantelis & Jane-Ling Wang & Michael S Kramer & Seungmi Yang & Richard M Martin & Hans-Georg Müller, 2018. "Functional principal component analysis for identifying multivariate patterns and archetypes of growth, and their association with long-term cognitive development," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-18, November.
    11. Wong, Raymond K.W. & Zhang, Xiaoke, 2019. "Nonparametric operator-regularized covariance function estimation for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 131-144.
    12. Tengteng Xu & Riquan Zhang & Xiuzhen Zhang, 2023. "Estimation of spatial-functional based-line logit model for multivariate longitudinal data," Computational Statistics, Springer, vol. 38(1), pages 79-99, March.
    13. Joseph, Esdras & Galeano San Miguel, Pedro & Lillo Rodríguez, Rosa Elvira, 2015. "Two-sample Hotelling's T² statistics based on the functional Mahalanobis semi-distance," DES - Working Papers. Statistics and Econometrics. WS ws1503, Universidad Carlos III de Madrid. Departamento de Estadística.
    14. Yueying Wang & Guannan Wang & Li Wang & R. Todd Ogden, 2020. "Simultaneous confidence corridors for mean functions in functional data analysis of imaging data," Biometrics, The International Biometric Society, vol. 76(2), pages 427-437, June.
    15. Ma, Haiqiang & Zhu, Zhongyi, 2016. "Continuously dynamic additive models for functional data," Journal of Multivariate Analysis, Elsevier, vol. 150(C), pages 1-13.
    16. Xinyue Chang & Yehua Li & Yi Li, 2023. "Asynchronous and error‐prone longitudinal data analysis via functional calibration," Biometrics, The International Biometric Society, vol. 79(4), pages 3374-3387, December.
    17. Philip T. Reiss & Jeff Goldsmith & Han Lin Shang & R. Todd Ogden, 2017. "Methods for Scalar-on-Function Regression," International Statistical Review, International Statistical Institute, vol. 85(2), pages 228-249, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kitagawa, Toru & Muris, Chris, 2016. "Model averaging in semiparametric estimation of treatment effects," Journal of Econometrics, Elsevier, vol. 193(1), pages 271-289.
    2. Philippe Goulet Coulombe & Maxime Leroux & Dalibor Stevanovic & Stéphane Surprenant, 2022. "How is machine learning useful for macroeconomic forecasting?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 920-964, August.
    3. Davide Fiaschi & Andrea Mario Lavezzi & Angela Parenti, 2020. "Deep and Proximate Determinants of the World Income Distribution," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 66(3), pages 677-710, September.
    4. Fabio Canova & Christian Matthes, 2021. "Dealing with misspecification in structural macroeconometric models," Quantitative Economics, Econometric Society, vol. 12(2), pages 313-350, May.
    5. Yu, Jun & Meng, Xiran & Wang, Yaping, 2023. "Optimal designs for semi-parametric dose-response models under random contamination," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
    6. Zhang, Tao & Zhang, Qingzhao & Wang, Qihua, 2014. "Model detection for functional polynomial regression," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 183-197.
    7. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    8. HAEDO, Christian & MOUCHART , Michel & ,, 2013. "Specialized agglomerations with areal data: model and detection," LIDAM Discussion Papers CORE 2013060, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    9. Bhattacharya, Debopam & Dupas, Pascaline, 2012. "Inferring welfare maximizing treatment assignment under budget constraints," Journal of Econometrics, Elsevier, vol. 167(1), pages 168-196.
    10. Schomaker Michael & Heumann Christian, 2011. "Model Averaging in Factor Analysis: An Analysis of Olympic Decathlon Data," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 7(1), pages 1-15, January.
    11. Dirick, Lore & Claeskens, Gerda & Vasnev, Andrey & Baesens, Bart, 2022. "A hierarchical mixture cure model with unobserved heterogeneity for credit risk," Econometrics and Statistics, Elsevier, vol. 22(C), pages 39-55.
    12. Tumala, Mohammed M & Olubusoye, Olusanya E & Yaaba, Baba N & Yaya, OlaOluwa S & Akanbi, Olawale B, 2017. "Forecasting Nigerian Inflation using Model Averaging methods: Modelling Frameworks to Central Banks," MPRA Paper 88754, University Library of Munich, Germany, revised Feb 2018.
    13. José Manuel Cordero Ferrera & Manuel Muñiz Pérez & Rosa Simancas Rodríguez, 2015. "The influence of socioeconomic factors on cognitive and non-cognitive educational outcomes," Investigaciones de Economía de la Educación volume 10, in: Marta Rahona López & Jennifer Graves (ed.), Investigaciones de Economía de la Educación 10, edition 1, volume 10, chapter 21, pages 413-438, Asociación de Economía de la Educación.
    14. Satoshi Hattori & Masayuki Henmi, 2014. "Stratified doubly robust estimators for the average causal effect," Biometrics, The International Biometric Society, vol. 70(2), pages 270-277, June.
    15. Naoya Sueishi & Arihiro Yoshimura, 2017. "Focused Information Criterion for Series Estimation in Partially Linear Models," The Japanese Economic Review, Japanese Economic Association, vol. 68(3), pages 352-363, September.
    16. Liao, Jun & Zou, Guohua, 2020. "Corrected Mallows criterion for model averaging," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    17. Tune H Pers & Anders Albrechtsen & Claus Holst & Thorkild I A Sørensen & Thomas A Gerds, 2009. "The Validation and Assessment of Machine Learning: A Game of Prediction from High-Dimensional Data," PLOS ONE, Public Library of Science, vol. 4(8), pages 1-8, August.
    18. Miao Han & Liuquan Sun & Yutao Liu & Jun Zhu, 2018. "Joint analysis of recurrent event data with additive–multiplicative hazards model for the terminal event time," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 81(5), pages 523-547, July.
    19. Robert J. B. Goudie & Sach Mukherjee & Jan-Emmanuel De Neve & Andrew J. Oswald & Stephen Wu, 2011. "Happiness as a Driver of Risk-Avoiding Behavior," CESifo Working Paper Series 3451, CESifo.
    20. Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
