
A Comparative Study of Various Methods for Handling Missing Data in UNSODA

Author

Listed:
  • Yingpeng Fu

    (School of Human Settlements and Civil Engineering, Xi’an Jiaotong University, Xi’an 710049, China)

  • Hongjian Liao

    (School of Human Settlements and Civil Engineering, Xi’an Jiaotong University, Xi’an 710049, China)

  • Longlong Lv

    (School of Human Settlements and Civil Engineering, Xi’an Jiaotong University, Xi’an 710049, China)

Abstract

UNSODA, a free international soil database, is widely used in many fields. However, missing soil property data have limited the utility of this dataset, especially for data-driven models. Here, three machine learning-based methods, i.e., random forest (RF) regression, support vector regression (SVR), and artificial neural network (ANN) regression, and two statistics-based methods, i.e., mean imputation and multiple imputation (MI), were used to impute the missing soil property data, including pH, saturated hydraulic conductivity (SHC), organic matter content (OMC), porosity (PO), and particle density (PD). The missing upper depths (DU) and lower depths (DL) for the sampling locations were also imputed. Before imputing the missing values in UNSODA, a missing value simulation was performed and evaluated quantitatively. Next, nonparametric tests and multiple linear regression were performed to qualitatively evaluate the reliability of these five imputation methods. Results showed that the root mean square errors (RMSEs) and mean absolute errors (MAEs) of all features fluctuated within acceptable ranges. RF imputation and MI presented the lowest RMSEs and MAEs; both methods are good at explaining the variability of the data. The standard error, coefficient of variation, and standard deviation decreased significantly after imputation, and no significant differences were found between the data before and after imputation. Together, DU, pH, SHC, OMC, PO, and PD explained 91.0%, 63.9%, 88.5%, 59.4%, and 90.2% of the variation in bulk density (BD) using RF, SVR, ANN, mean imputation, and MI, respectively; this value was 99.8% when missing values were discarded. This study suggests that the RF and MI methods may be better for imputing the missing data in UNSODA.
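The missing value simulation described in the abstract can be illustrated with a minimal sketch: hide a fraction of known values, impute them, and score the imputed values against the hidden truth with RMSE and MAE. The example below uses mean imputation only, the simplest of the five methods. The pH values, the 30% masking fraction, and all function names are hypothetical choices made for illustration; they are not taken from the paper or from UNSODA.

```python
import math
import random


def mask_values(values, fraction, seed=0):
    """Hide a random fraction of known values.

    Returns the masked list (None marks a hidden value) and the
    sorted indices that were hidden.
    """
    rng = random.Random(seed)
    n_hidden = max(1, round(fraction * len(values)))
    hidden = sorted(rng.sample(range(len(values)), n_hidden))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mean_impute(masked):
    """Replace every None with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def rmse(truth, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical, fully observed pH values (illustrative only).
ph = [5.1, 6.3, 7.0, 6.8, 5.9, 7.4, 6.1, 6.6, 7.2, 5.5]

# Simulate missingness, impute, and score only the hidden positions.
masked, hidden = mask_values(ph, fraction=0.3, seed=42)
imputed = mean_impute(masked)
truth = [ph[i] for i in hidden]
guess = [imputed[i] for i in hidden]

print(f"hidden indices: {hidden}")
print(f"RMSE = {rmse(truth, guess):.3f}, MAE = {mae(truth, guess):.3f}")
```

Swapping `mean_impute` for a model-based imputer while keeping the masking and scoring steps fixed would allow the methods to be compared on the same hidden values, which is the kind of comparison the abstract reports.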

Suggested Citation

  • Yingpeng Fu & Hongjian Liao & Longlong Lv, 2021. "A Comparative Study of Various Methods for Handling Missing Data in UNSODA," Agriculture, MDPI, vol. 11(8), pages 1-28, July.
  • Handle: RePEc:gam:jagris:v:11:y:2021:i:8:p:727-:d:605099

    Download full text from publisher

    File URL: https://www.mdpi.com/2077-0472/11/8/727/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2077-0472/11/8/727/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maira Abrar & Sohail Abbas & Shazia Kousar & Muhammad Mushtaq, 2023. "Investigation on the Effects of Customer Knowledge, Political Support, and Innovation on the Growth of Islamic Banking System: a Case Study of Pakistan," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 14(2), pages 946-965, June.
    2. Jinsong Yu & Baohua Mo & Diyin Tang & Jie Yang & Jiuqing Wan & Jingjing Liu, 2017. "Indirect State-of-Health Estimation for Lithium-Ion Batteries under Randomized Use," Energies, MDPI, vol. 10(12), pages 1-19, December.
    3. Shulan Hsieh & Zai-Fu Yao & Meng-Heng Yang, 2021. "Multimodal Imaging Analysis Reveals Frontal-Associated Networks in Relation to Individual Resilience Strength," IJERPH, MDPI, vol. 18(3), pages 1-18, January.
    4. Khishigsuren Davagdorj & Van Huy Pham & Nipon Theera-Umpon & Keun Ho Ryu, 2020. "XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction," IJERPH, MDPI, vol. 17(18), pages 1-22, September.
    5. Faruk Bhuiyan & Tarek Rana & Kevin Baird & Rahat Munir, 2023. "Strategic outcome of competitive advantage from corporate sustainability practices: Institutional theory perspective from an emerging economy," Business Strategy and the Environment, Wiley Blackwell, vol. 32(7), pages 4217-4243, November.
