
Probabilistic predictive principal component analysis for spatially misaligned and high‐dimensional air pollution data with missing observations

Author

Listed:
  • Phuong T. Vu
  • Timothy V. Larson
  • Adam A. Szpiro

Abstract

Accurate predictions of pollutant concentrations at new locations are often of interest in air pollution studies on fine particulate matter (PM2.5), in which data are usually not measured at all study locations. PM2.5 is also a mixture of many different chemical components. Principal component analysis (PCA) can be incorporated to obtain lower-dimensional representative scores of such multipollutant data. Spatial prediction can then be used to estimate these scores at new locations. Recently developed predictive PCA modifies the traditional PCA algorithm to obtain scores with spatial structures that can be well predicted at unmeasured locations. However, these approaches require complete data, whereas multipollutant data tend to have complex missing patterns in practice. We propose probabilistic versions of predictive PCA, which allow for flexible model‐based imputation that can account for spatial information and subsequently improve the overall predictive performance.
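
To make the idea of model-based imputation concrete, the following is a minimal sketch of plain probabilistic PCA fitted by EM on a matrix with missing entries. It is not the authors' probabilistic predictive PCA: it ignores spatial information entirely, holds the column means fixed at their observed averages, and assumes every column has at least one observed value. The function name `ppca_missing`, its parameters, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ppca_missing(X, q, n_iter=200, seed=0):
    """Plain probabilistic PCA by EM on a matrix with NaNs (no spatial structure)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = ~np.isnan(X)                     # True where a value was measured
    mu = np.nanmean(X, axis=0)             # simplification: means held fixed
    Xc = np.where(obs, X - mu, 0.0)        # centered data, zeros at missing cells
    W = rng.normal(size=(p, q))            # loadings, random start
    sigma2 = 1.0                           # isotropic noise variance
    for _ in range(n_iter):
        # E-step: posterior mean and covariance of each row's scores,
        # using only the components observed in that row.
        Z = np.zeros((n, q))
        C = np.zeros((n, q, q))
        for i in range(n):
            Wo = W[obs[i]]
            Minv = np.linalg.inv(Wo.T @ Wo + sigma2 * np.eye(q))
            Z[i] = Minv @ Wo.T @ Xc[i, obs[i]]
            C[i] = sigma2 * Minv
        # M-step: update each component's loadings from the rows where it is observed.
        S = C + Z[:, :, None] * Z[:, None, :]          # E[z z^T] for each row
        for j in range(p):
            rows = obs[:, j]
            W[j] = np.linalg.solve(S[rows].sum(axis=0), Z[rows].T @ Xc[rows, j])
        # M-step: update the noise variance over observed cells only.
        resid = np.where(obs, Xc - Z @ W.T, 0.0)
        extra = np.einsum("jk,ikl,jl->ij", W, C, W)    # W_j C_i W_j^T
        sigma2 = (np.sum(resid ** 2) + np.sum(extra[obs])) / obs.sum()
    X_filled = np.where(obs, X, Z @ W.T + mu)          # impute missing cells only
    return Z, W, sigma2, X_filled

# Simulated example: rank-2 signal plus noise, about 20% of cells missing at random.
rng = np.random.default_rng(1)
n, p, q = 150, 8, 2
X_true = rng.normal(size=(n, q)) @ rng.normal(size=(q, p)) + 0.2 * rng.normal(size=(n, p))
mask = rng.random((n, p)) < 0.2
X_miss = np.where(mask, np.nan, X_true)

Z, W, sigma2, X_filled = ppca_missing(X_miss, q)

# Compare against column-mean imputation on the held-out cells.
col_means = np.broadcast_to(np.nanmean(X_miss, axis=0), (n, p))
rmse_ppca = np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((col_means[mask] - X_true[mask]) ** 2))
print(f"RMSE on missing cells: PPCA {rmse_ppca:.3f}, column mean {rmse_mean:.3f}")
```

The rows of `Z` play the role of the lower-dimensional scores mentioned in the abstract. In a predictive setting these scores would then be passed to a spatial prediction model, a step this sketch does not attempt.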

Suggested Citation

  • Phuong T. Vu & Timothy V. Larson & Adam A. Szpiro, 2020. "Probabilistic predictive principal component analysis for spatially misaligned and high‐dimensional air pollution data with missing observations," Environmetrics, John Wiley & Sons, Ltd., vol. 31(4), June.
  • Handle: RePEc:wly:envmet:v:31:y:2020:i:4:n:e2614
    DOI: 10.1002/env.2614

    Download full text from publisher

    File URL: https://doi.org/10.1002/env.2614
    Download Restriction: no



    Citations


    Cited by:

    1. Phuong T. Vu & Adam A. Szpiro & Noah Simon, 2022. "Spatial matrix completion for spatially misaligned and high‐dimensional air pollution data," Environmetrics, John Wiley & Sons, Ltd., vol. 33(4), June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Phuong T. Vu & Adam A. Szpiro & Noah Simon, 2022. "Spatial matrix completion for spatially misaligned and high‐dimensional air pollution data," Environmetrics, John Wiley & Sons, Ltd., vol. 33(4), June.
    2. Gen Li & Sungkyu Jung, 2017. "Incorporating covariates into integrated factor analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 73(4), pages 1433-1442, December.
    3. Wang, Zihan & Daeipour, Mohamad & Xu, Hongyi, 2023. "Quantification and propagation of Aleatoric uncertainties in topological structures," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    4. Ivaldi, Enrico, 2013. "Proposal of a country risk index based on a factorial analysis - Una proposta di indice di rischio paese basato sull’analisi fattoriale: una applicazione ai paesi del sud del Mediterraneo e ai paesi d," Economia Internazionale / International Economics, Camera di Commercio Industria Artigianato Agricoltura di Genova, vol. 66(2), pages 231-249.
    5. Kohei Adachi & Nickolay T. Trendafilov, 2016. "Sparse principal component analysis subject to prespecified cardinality of loadings," Computational Statistics, Springer, vol. 31(4), pages 1403-1427, December.
    6. Winifred U. Anake & Faith O. Bayode & Hassana O. Jonathan & Conrad A. Omonhinmin & Oluwole A. Odetunmibi & Timothy A. Anake, 2022. "Screening of Plant Species Response and Performance for Green Belt Development: Implications for Semi-Urban Ecosystem Restoration," Sustainability, MDPI, vol. 14(7), pages 1-14, March.
    7. Yixuan Qiu & Jing Lei & Kathryn Roeder, 2023. "Gradient-based sparse principal component analysis with extensions to online learning," Biometrika, Biometrika Trust, vol. 110(2), pages 339-360.
    8. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    9. Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
    10. Jushan Bai & Serena Ng, 2020. "Simpler Proofs for Approximate Factor Models of Large Dimensions," Papers 2008.00254, arXiv.org.
    11. Will Davis & Alexander Gordan & Rusty Tchernis, 2021. "Measuring the spatial distribution of health rankings in the United States," Health Economics, John Wiley & Sons, Ltd., vol. 30(11), pages 2921-2936, November.
    12. Sabel, Clive Eric & Wilson, Jeff Gaines & Kingham, Simon & Tisch, Catherine & Epton, Mike, 2007. "Spatial implications of covariate adjustment on patterns of risk: Respiratory hospital admissions in Christchurch, New Zealand," Social Science & Medicine, Elsevier, vol. 65(1), pages 43-59, July.
    13. Thomas Despois & Catherine Doz, 2022. "Identifying and interpreting the factors in factor models via sparsity : Different approaches," Working Papers halshs-03626503, HAL.
    14. Marta Santagata & Enrico Ivaldi & Riccardo Soliani, 2019. "Development and Governance in the Ex-Soviet Union: An Empirical Inquiry," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 141(1), pages 157-190, January.
    15. Matteo Barigozzi & Marc Hallin, 2023. "Dynamic Factor Models: a Genealogy," Papers 2310.17278, arXiv.org, revised Jan 2024.
    16. Thomas Despois & Catherine Doz, 2023. "Identifying and interpreting the factors in factor models via sparsity: Different approaches," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 533-555, June.
    17. Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
    18. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    19. Cook, R. Dennis, 2022. "A slice of multivariate dimension reduction," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    20. Jin-Xing Liu & Yong Xu & Chun-Hou Zheng & Yi Wang & Jing-Yu Yang, 2012. "Characteristic Gene Selection via Weighting Principal Components by Singular Values," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-10, July.
