
Classification of truck-involved crash severity: Dealing with missing, imbalanced, and high dimensional safety data

Author

Listed:
  • Seyed Iman Mohammadpour
  • Majid Khedmati
  • Mohammad Javad Hassan Zada

Abstract

The cost of road traffic fatalities in the U.S. surpasses $240 billion a year, and the availability of high-resolution datasets allows detailed investigation of the factors contributing to crash severity. In this paper, the Trucks Involved in Fatal Accidents dataset for 2010 (TIFA 2010) is used to classify truck-involved crash severity in the presence of missing values, imbalanced classes, and high dimensionality. First, a decision tree-based algorithm, the Synthetic Minority Oversampling Technique (SMOTE), and the Random Forest (RF) feature importance approach are employed for missing value imputation, minority class oversampling, and dimensionality reduction, respectively. Afterward, a variety of classification algorithms, including RF, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), Gradient-Boosted Decision Trees (GBDT), and Support Vector Machine (SVM), are developed to reveal the influence of the introduced data preprocessing framework on the output quality of machine learning (ML) classifiers. The results show that the GBDT model outperforms all the other competing algorithms on the non-preprocessed crash data based on the G-mean performance measure, but RF makes the most accurate prediction on the treated dataset. This finding indicates that, after feature selection is conducted to reduce the computational cost of the ML algorithms, bagging (bootstrap aggregating) of decision trees in RF leads to a better model than boosting them via GBDT. In addition, the adopted feature importance approach decreases overall accuracy by at most 5% in most of the estimated models. Moreover, the worst class recall of the RF algorithm without prior oversampling is only 34.4%, compared with 90.3% in the up-sampled model, which validates the proposed multi-step preprocessing scheme. This study also identifies temporal and spatial (roadway) attributes, crash characteristics, and Emergency Medical Service (EMS) as the most critical factors in truck crash severity.
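
To make two of the abstract's terms concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of G-mean as the geometric mean of per-class recalls, and it reduces SMOTE to its core step of interpolating between a minority sample and one of its nearest minority neighbors. The function names, the class labels ("severe", "minor"), and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point on the segment between a random minority
    sample and one of its k nearest minority neighbors (simplified SMOTE step)."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    neighbors = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    other = rng.choice(neighbors)
    gap = rng.random()  # interpolation weight in [0, 1)
    return [b + gap * (o - b) for b, o in zip(base, other)]


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 2 "severe" cases versus 8 "minor" cases.
    y_true = ["severe", "severe"] + ["minor"] * 8
    # A majority-biased classifier that misses one of the two severe cases.
    y_pred = ["severe", "minor"] + ["minor"] * 8

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("per-class recall:", per_class_recall(y_true, y_pred))
    print("G-mean:", round(g_mean(y_true, y_pred), 4))

    # Hypothetical 2-D minority feature vectors.
    minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print("synthetic sample:", smote_like_sample(minority_points))
```

On this toy data, overall accuracy is high even though half of the minority class is missed. That gap is why the worst class recall and G-mean are more informative than accuracy for imbalanced severity classes.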

Suggested Citation

  • Seyed Iman Mohammadpour & Majid Khedmati & Mohammad Javad Hassan Zada, 2023. "Classification of truck-involved crash severity: Dealing with missing, imbalanced, and high dimensional safety data," PLOS ONE, Public Library of Science, vol. 18(3), pages 1-22, March.
  • Handle: RePEc:plo:pone00:0281901
    DOI: 10.1371/journal.pone.0281901

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281901
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0281901&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0281901?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Afaq Khattak & Hamad Almujibah & Ahmed Elamary & Caroline Mongina Matara, 2022. "Interpretable Dynamic Ensemble Selection Approach for the Prediction of Road Traffic Injury Severity: A Case Study of Pakistan’s National Highway N-5," Sustainability, MDPI, vol. 14(19), pages 1-18, September.
    2. Alicja Wolny-Dominiak & Tomasz Żądło, 2021. "The Measures of Accuracy of Claim Frequency Credibility Predictor," Sustainability, MDPI, vol. 13(21), pages 1-13, October.
    3. Mubarak Alrumaidhi & Hesham A. Rakha, 2022. "Factors Affecting Crash Severity among Elderly Drivers: A Multilevel Ordinal Logistic Regression Approach," Sustainability, MDPI, vol. 14(18), pages 1-12, September.
    4. Debela Jima & Tibor Sipos, 2022. "The Impact of Road Geometric Formation on Traffic Crash and Its Severity Level," Sustainability, MDPI, vol. 14(14), pages 1-25, July.
    5. Miaomiao Yan & Yindong Shen, 2022. "Traffic Accident Severity Prediction Based on Random Forest," Sustainability, MDPI, vol. 14(3), pages 1-13, February.
    6. Guanghui Gao & Yining Guo & Lumei Zhou & Li Li & Gang Shi, 2024. "Res2Net-based multi-scale and multi-attention model for traffic scene image classification," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-26, May.
    7. Fu Wang & Jing Wang & Xianfeng Zhang & Dengjun Gu & Yang Yang & Hongbin Zhu, 2022. "Analysis of the Causes of Traffic Accidents and Identification of Accident-Prone Points in Long Downhill Tunnel of Mountain Expressways Based on Data Mining," Sustainability, MDPI, vol. 14(14), pages 1-22, July.
