
Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size

Author

Listed:
  • Valter Cesar de Souza
  • Sergio Augusto Rodrigues
  • Luís Roberto Almeida Gabriel Filho

Abstract

Meteorological data acquired with precision, quality, and reliability are crucial in many fields of agronomy, especially in studies of reference evapotranspiration (ETo). ETo plays a fundamental role in the hydrological cycle, irrigation system planning and management, water demand modeling, water stress monitoring, and water balance estimation, as well as in hydrological and environmental studies. However, time series records often contain missing measurements. The aim of this study was to evaluate the performance of alternative multivariate procedures for principal component analysis (PCA), using the Nonlinear Iterative Partial Least Squares (NIPALS) and Expectation-Maximization (EM) algorithms, for imputing missing data in time series of meteorological variables. The evaluation was carried out on high-dimensional, reduced-sample databases with different percentages of missing data. The databases, collected between 2011 and 2021, originated from 45 automatic weather stations in the São Paulo region, Brazil, and were used to create a daily time series of ETo. Five scenarios of missing data (10%, 20%, 30%, 40%, 50%) were simulated, in which data were randomly withdrawn from the ETo database. Imputation was then performed using the NIPALS-PCA, EM-PCA, and simple mean imputation (IM) procedures. This cycle was repeated 100 times, and average performance indicators were calculated. Statistical performance was evaluated with the following indicators: correlation coefficient (r), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE), Normalized Root Mean Square Error (nRMSE), Willmott Index (d), and performance index (c). In the scenario with 10% missing data, NIPALS-PCA achieved the lowest MAPE (15.4%), followed by EM-PCA (17.0%), while IM recorded a MAPE of 24.7%. In the scenario with 50% missing data, the ranking reversed, with EM-PCA showing the lowest MAPE (19.1%), followed by NIPALS-PCA (19.9%). The NIPALS-PCA and EM-PCA approaches demonstrated good results in imputation (10% ≤ nRMSE
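
The simulate, impute, and score cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes NumPy, uses synthetic low-rank data in place of the station records, fixes the number of components at 2, implements the EM-style variant as an iterative truncated-SVD refill, and normalizes nRMSE by the observed mean. All of these choices, and the function names `mean_impute`, `em_pca_impute`, `mape`, and `nrmse`, are assumptions made for illustration. NIPALS-PCA is not shown.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def em_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """EM-style PCA imputation: alternate a rank-k PCA fit and a refill
    of the missing cells until the refilled values stop changing."""
    missing = np.isnan(X)
    filled = mean_impute(X)                    # initial guess: column means
    if not missing.any():
        return filled
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2))
        filled[missing] = recon[missing]       # observed cells are never altered
        if change < tol:
            break
    return filled

def mape(obs, est):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((obs - est) / obs))

def nrmse(obs, est):
    """RMSE divided by the observed mean, in percent (assumed normalization)."""
    return 100.0 * np.sqrt(np.mean((obs - est) ** 2)) / np.mean(obs)

# Synthetic stand-in for a days x stations matrix with more columns than rows.
rng = np.random.default_rng(0)
n_days, n_stations = 30, 45
scores = rng.uniform(-1.0, 1.0, size=(n_days, 2))
loadings = rng.uniform(-0.5, 0.5, size=(2, n_stations))
X_true = 4.0 + scores @ loadings + rng.normal(0.0, 0.05, size=(n_days, n_stations))

# Withdraw roughly 10% of the cells at random, then impute with both procedures.
missing = rng.random(X_true.shape) < 0.10
X_miss = X_true.copy()
X_miss[missing] = np.nan

X_mean = mean_impute(X_miss)
X_em = em_pca_impute(X_miss, n_components=2)

for name, X_hat in [("mean", X_mean), ("EM-PCA", X_em)]:
    print(name,
          "MAPE = %.2f%%" % mape(X_true[missing], X_hat[missing]),
          "nRMSE = %.2f%%" % nrmse(X_true[missing], X_hat[missing]))
```

Repeating the masking step with different seeds and averaging the indicators would mimic the 100-repetition design. The numbers from this toy example are not comparable to the results reported in the abstract.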

Suggested Citation

  • Valter Cesar de Souza & Sergio Augusto Rodrigues & Luís Roberto Almeida Gabriel Filho, 2024. "Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-20, December.
  • Handle: RePEc:plo:pone00:0315574
    DOI: 10.1371/journal.pone.0315574

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0315574
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0315574&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shih-Lun Fang & Yi-Shan Lin & Sheng-Chih Chang & Yi-Lung Chang & Bing-Yun Tsai & Bo-Jein Kuo, 2024. "Using Artificial Intelligence Algorithms to Estimate and Short-Term Forecast the Daily Reference Evapotranspiration with Limited Meteorological Variables," Agriculture, MDPI, vol. 14(4), pages 1-20, March.
    2. Martí, Pau & López-Urrea, Ramón & Mancha, Luis A. & González-Altozano, Pablo & Román, Armand, 2024. "Seasonal assessment of the grass reference evapotranspiration estimation from limited inputs using different calibrating time windows and lysimeter benchmarks," Agricultural Water Management, Elsevier, vol. 300(C).
    3. Paredes, P. & Pereira, L.S. & Almorox, J. & Darouich, H., 2020. "Reference grass evapotranspiration with reduced data sets: Parameterization of the FAO Penman-Monteith temperature approach and the Hargeaves-Samani equation using local climatic variables," Agricultural Water Management, Elsevier, vol. 240(C).
    4. Mekonnen, Yilkal Gebeyehu & Alamirew, Tena & Malede, Demelash Ademe & Pareeth, Sajid & Bantider, Amare & Chukalla, Abebe Demissie, 2024. "Tailoring the surface energy balance algorithm for land-improved (SEBALI) model using high-resolution land/use land cover for monitoring actual evapotranspiration," Agricultural Water Management, Elsevier, vol. 303(C).
    5. Traore, Seydou & Luo, Yufeng & Fipps, Guy, 2016. "Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages," Agricultural Water Management, Elsevier, vol. 163(C), pages 363-379.
    6. Paredes, Paula & Martins, Diogo S. & Pereira, Luis Santos & Cadima, Jorge & Pires, Carlos, 2018. "Accuracy of daily estimation of grass reference evapotranspiration using ERA-Interim reanalysis products with assessment of alternative bias correction schemes," Agricultural Water Management, Elsevier, vol. 210(C), pages 340-353.
    7. Mhawej, Mario & Elias, Georgie & Nasrallah, Ali & Faour, Ghaleb, 2020. "Dynamic calibration for better SEBALI ET estimations: Validations and recommendations," Agricultural Water Management, Elsevier, vol. 230(C).
    8. Josse, Julie & Husson, François, 2012. "Selecting the number of components in principal component analysis using cross-validation approximations," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1869-1879.
    9. Qing Li & Long Hai Vo, 2021. "Intangible Capital and Innovation: An Empirical Analysis of Vietnamese Enterprises," Economics Discussion / Working Papers 21-02, The University of Western Australia, Department of Economics.
    10. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    11. Darouich, Hanaa & Karfoul, Razan & Ramos, Tiago B. & Moustafa, Ali & Shaheen, Baraa & Pereira, Luis S., 2021. "Crop water requirements and crop coefficients for jute mallow (Corchorus olitorius L.) using the SIMDualKc model and assessing irrigation strategies for the Syrian Akkar region," Agricultural Water Management, Elsevier, vol. 255(C).
    12. Escarabajal-Henarejos, D. & Fernández-Pacheco, D.G. & Molina-Martínez, J.M. & Martínez-Molina, L. & Ruiz-Canales, A., 2015. "Selection of device to determine temperature gradients for estimating evapotranspiration using energy balance method," Agricultural Water Management, Elsevier, vol. 151(C), pages 136-147.
    13. Gao, Yang & Yang, Linlin & Shen, Xiaojun & Li, Xinqiang & Sun, Jingsheng & Duan, Aiwang & Wu, Laosheng, 2014. "Winter wheat with subsurface drip irrigation (SDI): Crop coefficients, water-use estimates, and effects of SDI on grain yield and water use efficiency," Agricultural Water Management, Elsevier, vol. 146(C), pages 1-10.
    14. Fuentes, Sigfredo & Ortega-Farías, Samuel & Carrasco-Benavides, Marcos & Tongson, Eden & Gonzalez Viejo, Claudia, 2024. "Actual evapotranspiration and energy balance estimation from vineyards using micro-meteorological data and machine learning modeling," Agricultural Water Management, Elsevier, vol. 297(C).
    15. Shahadha, Saadi Sattar & Wendroth, Ole & Zhu, Junfeng & Walton, Jason, 2019. "Can measured soil hydraulic properties simulate field water dynamics and crop production?," Agricultural Water Management, Elsevier, vol. 223(C), pages 1-1.
    16. Escarabajal-Henarejos, D. & Molina-Martínez, J.M. & Fernández-Pacheco, D.G. & Cavas-Martínez, F. & García-Mateos, G., 2015. "Digital photography applied to irrigation management of Little Gem lettuce," Agricultural Water Management, Elsevier, vol. 151(C), pages 148-157.
    17. Zhao, Nana & Liu, Yu & Cai, Jiabing & Paredes, Paula & Rosa, Ricardo D. & Pereira, Luis S., 2013. "Dual crop coefficient modelling applied to the winter wheat–summer maize crop sequence in North China Plain: Basal crop coefficients and soil evaporation component," Agricultural Water Management, Elsevier, vol. 117(C), pages 93-105.
    18. Hu, Xinyu & Zhao, Jinfeng & Sun, Shikun & Jia, Chengru & Zhang, Fuyao & Ma, Yizhe & Wang, Kaixuan & Wang, Yubao, 2023. "Evaluation of the temporal reconstruction methods for MODIS-based continuous daily actual evapotranspiration estimation," Agricultural Water Management, Elsevier, vol. 275(C).
    19. Choudhury, B.U. & Singh, Anil Kumar & Pradhan, S., 2013. "Estimation of crop coefficients of dry-seeded irrigated rice–wheat rotation on raised beds by field water balance method in the Indo-Gangetic plains, India," Agricultural Water Management, Elsevier, vol. 123(C), pages 20-31.
    20. Elfarkh, Jamal & Simonneaux, Vincent & Jarlan, Lionel & Ezzahar, Jamal & Boulet, Gilles & Chakir, Adnane & Er-Raki, Salah, 2022. "Evapotranspiration estimates in a traditional irrigated area in semi-arid Mediterranean. Comparison of four remote sensing-based models," Agricultural Water Management, Elsevier, vol. 270(C).
