
Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size

Author

Listed:
  • Valter Cesar de Souza
  • Sergio Augusto Rodrigues
  • Luís Roberto Almeida Gabriel Filho

Abstract

Meteorological data acquired with precision, quality, and reliability are crucial in many fields of agronomy, especially in studies of reference evapotranspiration (ETo). ETo plays a fundamental role in the hydrological cycle, irrigation system planning and management, water demand modeling, water stress monitoring, water balance estimation, and hydrological and environmental studies. However, time series records often suffer from missing measurements. The aim of this study was to evaluate the performance of alternative multivariate procedures for principal component analysis (PCA), based on the Nonlinear Iterative Partial Least Squares (NIPALS) and Expectation-Maximization (EM) algorithms, for imputing missing data in time series of meteorological variables. The evaluation was carried out on high-dimensional, reduced-sample databases with different percentages of missing data. The databases, collected between 2011 and 2021 from 45 automatic weather stations in the São Paulo region, Brazil, were used to create a daily time series of ETo. Five missing-data scenarios (10%, 20%, 30%, 40%, and 50%) were simulated by randomly withdrawing data from the ETo database. Imputation was then performed using the NIPALS-PCA, EM-PCA, and simple mean imputation (IM) procedures. This cycle was repeated 100 times, and average performance indicators were calculated. Performance was evaluated with the following indicators: correlation coefficient (r), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE), Normalized Root Mean Square Error (nRMSE), Willmott Index (d), and performance index (c). In the scenario with 10% missing data, NIPALS-PCA achieved the lowest MAPE (15.4%), followed by EM-PCA (17.0%), while IM recorded a MAPE of 24.7%. In the scenario with 50% missing data, the ranking reversed, with EM-PCA showing the lowest MAPE (19.1%), followed by NIPALS-PCA (19.9%). The NIPALS-PCA and EM-PCA approaches demonstrated good results in imputation (10% ≤ nRMSE
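
The evaluation cycle described in the abstract (withdraw values at random, impute, score against the withheld values, repeat, average) can be sketched roughly as follows. This is an illustration only, not the authors' code: the data are synthetic, the function names (`mean_impute`, `em_pca_impute`, `simulate`) are hypothetical, the EM-PCA step is the generic iterative low-rank SVD refill rather than the exact variant used in the paper, NIPALS-PCA is omitted, and the number of components, the repetition count, and the nRMSE normalization (by the observed mean) are assumptions.

```python
import numpy as np

def mean_impute(X):
    """Simple mean imputation (IM): fill each NaN with its column's observed mean."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def em_pca_impute(X, n_components=1, max_iter=200, tol=1e-6):
    """Iterative (EM-style) PCA imputation: alternate a rank-k SVD fit of the
    completed matrix with re-filling the missing cells from the reconstruction."""
    missing = np.isnan(X)
    filled = mean_impute(X)  # starting guess for the missing cells
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2)) if missing.any() else 0.0
        filled[missing] = recon[missing]  # observed cells are never overwritten
        if change < tol:
            break
    return filled

def mape(obs, est):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((obs - est) / obs))

def nrmse(obs, est):
    """RMSE divided by the observed mean, in percent (normalization is an assumption)."""
    return 100.0 * np.sqrt(np.mean((obs - est) ** 2)) / np.mean(obs)

def willmott_d(obs, est):
    """Willmott index of agreement (original 1981 form)."""
    obar = obs.mean()
    return 1.0 - np.sum((est - obs) ** 2) / np.sum((np.abs(est - obar) + np.abs(obs - obar)) ** 2)

def simulate(X_true, rate, rng):
    """Withdraw a random fraction `rate` of the cells; return the masked copy and the mask."""
    mask = rng.random(X_true.shape) < rate
    X_miss = X_true.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

# Synthetic stand-in for a daily ETo table: few rows (days), many columns (stations).
rng = np.random.default_rng(0)
n_days, n_stations, n_rep = 30, 45, 5  # the study used 100 repetitions
pattern = 4.0 + np.sin(np.linspace(0, 3, n_days))[:, None]  # shared daily signal
X_true = pattern * rng.uniform(0.8, 1.2, n_stations) + rng.normal(0, 0.1, (n_days, n_stations))

results = {}
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    scores = {"IM": [], "EM-PCA": []}
    for _ in range(n_rep):
        X_miss, mask = simulate(X_true, rate, rng)
        obs = X_true[mask]  # withheld values serve as the reference
        scores["IM"].append(mape(obs, mean_impute(X_miss)[mask]))
        scores["EM-PCA"].append(mape(obs, em_pca_impute(X_miss)[mask]))
    results[rate] = {name: float(np.mean(v)) for name, v in scores.items()}
    print(f"{rate:.0%} missing:", {k: round(v, 2) for k, v in results[rate].items()})
```

Scoring only the withheld cells keeps the comparison honest, since the observed cells are reproduced exactly by every method. On real station data the relative ranking of the methods will depend on how strongly the stations co-vary and on the number of components retained, so the synthetic numbers printed here say nothing about the figures reported in the abstract.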

Suggested Citation

  • Valter Cesar de Souza & Sergio Augusto Rodrigues & Luís Roberto Almeida Gabriel Filho, 2024. "Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-20, December.
  • Handle: RePEc:plo:pone00:0315574
    DOI: 10.1371/journal.pone.0315574

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0315574
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0315574&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shih-Lun Fang & Yi-Shan Lin & Sheng-Chih Chang & Yi-Lung Chang & Bing-Yun Tsai & Bo-Jein Kuo, 2024. "Using Artificial Intelligence Algorithms to Estimate and Short-Term Forecast the Daily Reference Evapotranspiration with Limited Meteorological Variables," Agriculture, MDPI, vol. 14(4), pages 1-20, March.
    2. Martí, Pau & López-Urrea, Ramón & Mancha, Luis A. & González-Altozano, Pablo & Román, Armand, 2024. "Seasonal assessment of the grass reference evapotranspiration estimation from limited inputs using different calibrating time windows and lysimeter benchmarks," Agricultural Water Management, Elsevier, vol. 300(C).
    3. Paredes, P. & Pereira, L.S. & Almorox, J. & Darouich, H., 2020. "Reference grass evapotranspiration with reduced data sets: Parameterization of the FAO Penman-Monteith temperature approach and the Hargeaves-Samani equation using local climatic variables," Agricultural Water Management, Elsevier, vol. 240(C).
    4. Mhawej, Mario & Elias, Georgie & Nasrallah, Ali & Faour, Ghaleb, 2020. "Dynamic calibration for better SEBALI ET estimations: Validations and recommendations," Agricultural Water Management, Elsevier, vol. 230(C).
    5. Mekonnen, Yilkal Gebeyehu & Alamirew, Tena & Malede, Demelash Ademe & Pareeth, Sajid & Bantider, Amare & Chukalla, Abebe Demissie, 2024. "Tailoring the surface energy balance algorithm for land-improved (SEBALI) model using high-resolution land/use land cover for monitoring actual evapotranspiration," Agricultural Water Management, Elsevier, vol. 303(C).
    6. Josse, Julie & Husson, François, 2012. "Selecting the number of components in principal component analysis using cross-validation approximations," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1869-1879.
    7. Traore, Seydou & Luo, Yufeng & Fipps, Guy, 2016. "Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages," Agricultural Water Management, Elsevier, vol. 163(C), pages 363-379.
    8. Paredes, Paula & Martins, Diogo S. & Pereira, Luis Santos & Cadima, Jorge & Pires, Carlos, 2018. "Accuracy of daily estimation of grass reference evapotranspiration using ERA-Interim reanalysis products with assessment of alternative bias correction schemes," Agricultural Water Management, Elsevier, vol. 210(C), pages 340-353.
    9. Feng, Yu & Gong, Daozhi & Mei, Xurong & Hao, Weiping & Tang, Dahua & Cui, Ningbo, 2017. "Energy balance and partitioning in partial plastic mulched and non-mulched maize fields on the Loess Plateau of China," Agricultural Water Management, Elsevier, vol. 191(C), pages 193-206.
    10. Jan Kluge & Sarah Lappöhn & Kerstin Plank, 2023. "Predictors of TFP growth in European countries," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 50(1), pages 109-140, February.
    11. Qing Li & Long Hai Vo, 2021. "Intangible Capital and Innovation: An Empirical Analysis of Vietnamese Enterprises," Economics Discussion / Working Papers 21-02, The University of Western Australia, Department of Economics.
    12. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    13. Husson, François & Josse, Julie & Saporta, Gilbert, 2016. "Jan de Leeuw and the French School of Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 73(i06).
    14. Darouich, Hanaa & Karfoul, Razan & Ramos, Tiago B. & Moustafa, Ali & Shaheen, Baraa & Pereira, Luis S., 2021. "Crop water requirements and crop coefficients for jute mallow (Corchorus olitorius L.) using the SIMDualKc model and assessing irrigation strategies for the Syrian Akkar region," Agricultural Water Management, Elsevier, vol. 255(C).
    15. de Leeuw, Jan, 2006. "Principal component analysis of binary data by iterated singular value decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 21-39, January.
    16. Escarabajal-Henarejos, D. & Fernández-Pacheco, D.G. & Molina-Martínez, J.M. & Martínez-Molina, L. & Ruiz-Canales, A., 2015. "Selection of device to determine temperature gradients for estimating evapotranspiration using energy balance method," Agricultural Water Management, Elsevier, vol. 151(C), pages 136-147.
    17. Gao, Yang & Yang, Linlin & Shen, Xiaojun & Li, Xinqiang & Sun, Jingsheng & Duan, Aiwang & Wu, Laosheng, 2014. "Winter wheat with subsurface drip irrigation (SDI): Crop coefficients, water-use estimates, and effects of SDI on grain yield and water use efficiency," Agricultural Water Management, Elsevier, vol. 146(C), pages 1-10.
    18. Yang, Yanmin & Yang, Yonghui & Han, Shumin & Li, Huilong & Wang, Lu & Ma, Qingtao & Ma, Lexin & Wang, Linna & Hou, Zhenjun & Chen, Li & Liu, De Li, 2023. "Comparison of water-saving potential of fallow and crop change with high water-use winter-wheat – summer-maize rotation," Agricultural Water Management, Elsevier, vol. 289(C).
    19. Jovanovic, N. & Pereira, L.S. & Paredes, P. & Pôças, I. & Cantore, V. & Todorovic, M., 2020. "A review of strategies, methods and technologies to reduce non-beneficial consumptive water use on farms considering the FAO56 methods," Agricultural Water Management, Elsevier, vol. 239(C).
    20. Fuentes, Sigfredo & Ortega-Farías, Samuel & Carrasco-Benavides, Marcos & Tongson, Eden & Gonzalez Viejo, Claudia, 2024. "Actual evapotranspiration and energy balance estimation from vineyards using micro-meteorological data and machine learning modeling," Agricultural Water Management, Elsevier, vol. 297(C).
