IDEAS home Printed from https://ideas.repec.org/a/gam/jijerp/v18y2021i16p8375-d610320.html

Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation

Author

Listed:
  • Thelma Dede Baddoo

    (Binjiang College, Nanjing University of Information Science & Technology, No.333 Xishan Road, Wuxi 214105, China
    College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China)

  • Zhijia Li

    (College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China)

  • Samuel Nii Odai

    (Office of the Vice Chancellor, Accra Technical University, Accra GA000, Ghana)

  • Kenneth Rodolphe Chabi Boni

    (College of Computer and Information Engineering, Hohai University, Nanjing 211100, China)

  • Isaac Kwesi Nooni

    (Binjiang College, Nanjing University of Information Science & Technology, No.333 Xishan Road, Wuxi 214105, China
    Wuxi Institute of Technology, Nanjing University of Information Science & Technology, Wuxi 214105, China
    School of Geographical Sciences, Nanjing University of Information Science & Technology, Nanjing 210044, China)

  • Samuel Ato Andam-Akorful

    (Department of Geomatic Engineering, Kwame Nkrumah University of Science and Technology, Kumasi AK000, Ghana)

Abstract

Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute missing data in real-world datasets to investigate how to ascertain the accuracy of imputation algorithms for these datasets are lacking. This study investigated how complex a missing data reconstruction scheme needs to be to obtain relevant results for a real-world single station streamflow observation and so facilitate its further use. The investigation applied different missing data mechanisms, ranging from univariate algorithms to multiple imputation methods designed for multivariate data, with time taken as an explicit variable. The accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but those that provide the best results are usually time and computationally intensive. Also, multiple imputation algorithms that consider the surrounding observed values and/or can capture the characteristics of the data provide results similar to those of the univariate missing data algorithms and, in some cases, perform better without the added time and computational costs when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values in real-world hydroclimatic datasets depends on extensive study of the particular dataset to be imputed.
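The distinction the abstract draws between a total and a localized error measurement can be illustrated with a small sketch. The abstract does not give the formulas for TEM or LEM, so the code below assumes, purely for illustration, that both are root-mean-square errors: one computed over the whole record and one computed only over the artificially removed gap. Linear interpolation stands in for a univariate infilling algorithm; the flow values, the gap position, and the function names are all hypothetical.

```python
import math


def linear_infill(series):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; leading or trailing gaps take the nearest observed value."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, indices):
    """Root-mean-square error restricted to the given positions."""
    indices = list(indices)
    squared = sum((truth[i] - estimate[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


# Hypothetical complete record with a flow peak at position 3.
truth = [10.0, 12.0, 15.0, 30.0, 22.0, 16.0, 13.0, 11.0, 10.0, 9.0]

# Remove a block of values around the peak to simulate a gap.
gap = [2, 3, 4]
masked = [None if i in gap else v for i, v in enumerate(truth)]
filled = linear_infill(masked)

# Assumed stand-ins for the two measures: error over the whole record
# versus error over the gap only.
total_error = rmse(truth, filled, range(len(truth)))
local_error = rmse(truth, filled, gap)

print(f"total error (whole record): {total_error:.3f}")
print(f"local error (gap only):     {local_error:.3f}")
```

Under these assumptions the whole-record error is diluted by the observed values, which are reproduced exactly, while the gap-only error isolates how badly the missed peak is reconstructed. This is one way to read the abstract's remark that a localized measure is most useful when the missing data sit in specific portions of the record.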

Suggested Citation

  • Thelma Dede Baddoo & Zhijia Li & Samuel Nii Odai & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni & Samuel Ato Andam-Akorful, 2021. "Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation," IJERPH, MDPI, vol. 18(16), pages 1-26, August.
  • Handle: RePEc:gam:jijerp:v:18:y:2021:i:16:p:8375-:d:610320

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/18/16/8375/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/18/16/8375/
    Download Restriction: no

    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Harvey,Andrew C., 1991. "Forecasting, Structural Time Series Models and the Kalman Filter," Cambridge Books, Cambridge University Press, number 9780521405737.
    3. Joseph L. Schafer, 2003. "Multiple Imputation in Multivariate Problems When the Imputation and Analysis Models Differ," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 57(1), pages 19-35, February.
    4. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    5. Hapfelmeier, A. & Hothorn, T. & Ulm, K., 2012. "Recursive partitioning on incomplete data using surrogate decisions and multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1552-1565.

    Citations



    Cited by:

    1. Akihiko Murayama & Daisuke Higuchi & Kosuke Saida & Shigeya Tanaka & Tomoyuki Shinohara, 2024. "Fall Risk Prediction for Community-Dwelling Older Adults: Analysis of Assessment Scale and Evaluation Items without Actual Measurement," IJERPH, MDPI, vol. 21(2), pages 1-11, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    2. Schoemaker, Nikita K. & Juffer, Femmie & Rippe, Ralph C.A. & Vermeer, Harriet J. & Stoltenborgh, Marije & Jagersma, Gabrine J. & Maras, Athanasios & Alink, Lenneke R.A., 2020. "Positive parenting in foster care: Testing the effectiveness of a video-feedback intervention program on foster parents’ behavior and attitudes," Children and Youth Services Review, Elsevier, vol. 110(C).
    3. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    4. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    5. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    6. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    7. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    8. A.Y. Kombo & H. Mwambi & G. Molenberghs, 2017. "Multiple imputation for ordinal longitudinal data with monotone missing data patterns," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(2), pages 270-287, January.
    9. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527, December.
    10. Jesús Rascón & Wildor Gosgot Angeles & Manuel Oliva-Cruz & Miguel Ángel Barrena Gurbillón, 2022. "Wind Characteristics and Wind Energy Potential in Andean Towns in Northern Peru between 2016 and 2020: A Case Study of the City of Chachapoyas," Sustainability, MDPI, vol. 14(10), pages 1-11, May.
    11. Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
    12. Christian Kurniawan & Xiyu Deng & Adhiraj Chakraborty & Assane Gueye & Niangjun Chen & Yorie Nakahira, 2022. "A Learning and Control Perspective for Microfinance," Papers 2207.12631, arXiv.org, revised Dec 2022.
    13. Simon Grund & Oliver Lüdtke & Alexander Robitzsch, 2016. "Multiple Imputation of Multilevel Missing Data," SAGE Open, , vol. 6(4), pages 21582440166, October.
    14. Hapfelmeier, A. & Ulm, K., 2014. "Variable selection by Random Forests using data with missing values," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 129-139.
    15. Taesung Kim & Jinhee Kim & Wonho Yang & Hunjoo Lee & Jaegul Choo, 2021. "Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks," IJERPH, MDPI, vol. 18(22), pages 1-8, November.
    16. Koji Uchiyama & Yasuo Haruyama & Hiromi Shiraishi & Kiyohiko Katahira & Daiki Abukawa & Takashi Ishige & Hitoshi Tajiri & Keiichi Uchida & Kan Uchiyama & Masakazu Washio & Erika Kobashi & Atsuko Maeka, 2020. "Association between Passive Smoking from the Mother and Pediatric Crohn’s Disease: A Japanese Multicenter Study," IJERPH, MDPI, vol. 17(8), pages 1-10, April.
    17. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    18. Avanzi, Benjamin & Taylor, Greg & Vu, Phuong Anh & Wong, Bernard, 2020. "A multivariate evolutionary generalised linear model framework with adaptive estimation for claims reserving," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 50-71.
    19. Prilly Oktoviany & Robert Knobloch & Ralf Korn, 2021. "A machine learning-based price state prediction model for agricultural commodities using external factors," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 44(2), pages 1063-1085, December.
    20. Abhilash Bandam & Eedris Busari & Chloi Syranidou & Jochen Linssen & Detlef Stolten, 2022. "Classification of Building Types in Germany: A Data-Driven Modeling Approach," Data, MDPI, vol. 7(4), pages 1-23, April.
