figshare

Dataset-ARM-Paper

dataset
posted on 2025-04-09, 15:19 authored by Seyed Iman Mohammadpour

This study investigates traffic crash patterns in North Khorasan province, Iran, using data from the Iran Traffic Police spanning 2014 to 2018. The analysis covers both rural and urban crashes across diverse road types. The dataset, sourced from Traffic Accident Record forms, includes details on crash location, road characteristics, environmental conditions, driver profiles, vehicle information, primary causes, human factors, and crash severity. Driver data were integrated with crash records using RapidMiner Studio, yielding a combined dataset of 19,229 crashes. The research considers urban/rural differences, road classifications, and other factors contributing to crashes, with the aim of understanding crash trends and severity across the province.

A data cleaning process removed attributes with high uniqueness (≥98% unique values), minimal variability (≥98% identical values), excessive missing values (>10%), or no relevance to the study's objectives. This reduced the initial 38 attributes to a refined set of 20 for the subsequent crash severity analysis.

To improve the classification of crash severity levels, continuous variables were discretized using supervised entropy-based discretization. Finally, the SMOTE oversampling technique was applied to balance the two crash severity levels (fatal/injury versus property damage only), and the dataset was split into a training set (70%) and a test set (30%) using stratified sampling.
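The attribute-screening rules can be sketched in Python as below. This is a minimal illustration, not the author's RapidMiner workflow: the helper name `filter_attributes`, the row-of-dicts input with `None` for missing values, and the choice to compute the uniqueness and identical-value shares over non-missing values only are all assumptions. Relevance to the study's objectives is a domain judgement, so it is passed in as an explicit `irrelevant` set rather than computed.

```python
from collections import Counter

def filter_attributes(rows, columns, irrelevant=(), max_unique=0.98,
                      max_identical=0.98, max_missing=0.10):
    """Return the columns that pass all screening rules (hypothetical helper).

    rows: list of dicts mapping column name -> value; None marks a missing value.
    Shares for the uniqueness and identical-value rules are taken over
    non-missing values (an assumption; the source does not specify).
    """
    n = len(rows)
    if n == 0:
        return []
    kept = []
    for col in columns:
        if col in irrelevant:  # dropped on domain grounds, not by a statistic
            continue
        present = [r.get(col) for r in rows if r.get(col) is not None]
        # Rule: more than 10% missing values (or no values at all) -> drop.
        if not present or 1 - len(present) / n > max_missing:
            continue
        # Rule: at least 98% unique values (ID-like attribute) -> drop.
        if len(set(present)) / len(present) >= max_unique:
            continue
        # Rule: at least 98% identical values (near-constant attribute) -> drop.
        top_count = Counter(present).most_common(1)[0][1]
        if top_count / len(present) >= max_identical:
            continue
        kept.append(col)
    return kept
```

For example, an ID column (every value distinct), a constant column, and a column that is 20% empty would all be dropped, while a two-valued column such as rural/urban would be kept.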
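The text does not name the exact entropy criterion. A widely used supervised variant is Fayyad and Irani's minimum description length (MDL) method, which the sketch below assumes: it recursively picks the cut point that minimizes the class entropy of the two resulting intervals and keeps the cut only if the information gain exceeds the MDL threshold. The function names `mdlp_cuts` and `discretize` are hypothetical, and the quadratic scan is for readability rather than speed.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def _split(pairs, cuts):
    n = len(pairs)
    labels = [y for _, y in pairs]
    ent_s = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct attribute values
        left, right = labels[:i], labels[i:]
        e = (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, i, left, right)
    if best is None:
        return
    e, i, left, right = best
    gain = ent_s - e
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    # MDL stopping rule: accept the cut only if the gain pays for its encoding cost.
    if gain <= (math.log2(n - 1) + delta) / n:
        return
    cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)

def mdlp_cuts(values, labels):
    """Return sorted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []
    _split(pairs, cuts)
    return sorted(cuts)

def discretize(value, cuts):
    """Map a value to its interval index (0 .. len(cuts))."""
    return sum(value > c for c in cuts)
```

Because the cuts are chosen using the severity labels, they should be learned from the training data when the goal is an unbiased evaluation.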
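The balancing and splitting steps can be illustrated with a minimal SMOTE-style oversampler followed by a stratified 70/30 split, applied in the order the text describes. This is a sketch under stated assumptions, not the author's implementation: the helper names `smote` and `stratified_split` are hypothetical, the samples are assumed to be numeric feature vectors (classic SMOTE interpolates numeric features; purely nominal attributes would need a variant such as SMOTE-NC), and `k=5` is simply a common default.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from one class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and the chosen neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic

def stratified_split(samples, labels, test_size=0.3, seed=0):
    """Split into (sample, label) pairs so each class keeps the same test share."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    train, test = [], []
    for y, group in by_class.items():
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test.extend((s, y) for s in group[:n_test])
        train.extend((s, y) for s in group[n_test:])
    return train, test
```

To equalize two classes, one would call `smote(minority, len(majority) - len(minority))`, append the synthetic rows with the minority label, and then pass all rows to `stratified_split`.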
