Dataset-ARM-Paper
This study investigates traffic crash patterns in North Khorasan province, Iran, using Iran Traffic Police data from 2014 to 2018. The analysis covers both rural and urban crashes across diverse road types. The dataset, sourced from Traffic Accident Record forms, includes details on crash location, road characteristics, environmental conditions, driver profiles, vehicle information, primary causes, human factors, and crash severity. Driver data were integrated with crash records using RapidMiner Studio, yielding a combined dataset of 19,229 crashes. The research considers urban/rural differences, road classifications, and other contributing factors, aiming to understand crash trends and severity across the province.

Data cleaning removed attributes with high uniqueness (≥98%), minimal variability (≥98% identical values), excessive missing values (>10%), or no relevance to the study's objectives. This reduced the initial 38 attributes to 20 for the subsequent crash severity analysis. To improve the classification of crash severity levels, continuous variables were discretized using supervised entropy-based discretization. Finally, the SMOTE oversampling technique was applied to balance the two crash severity classes (fatal/injury versus property damage only), and the dataset was split into a training set (70%) and a test set (30%) using stratified sampling.
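
The three numeric screening rules used in the data cleaning step can be illustrated with a minimal Python sketch. It is not the RapidMiner Studio process used in the study. It assumes that "uniqueness" means the share of distinct values among non-missing entries, that "identical values" refers to the share of the most frequent value, and that missing cells are stored as `None`. The function name `screen_attributes`, the toy column names, and the toy rows are hypothetical and are not taken from the dataset. The relevance-based removal is a manual judgment and is not modeled.

```python
from collections import Counter


def screen_attributes(rows, unique_min=0.98, dominant_min=0.98, missing_max=0.10):
    """Return (kept, dropped) for a list of dict records.

    dropped maps each removed attribute to the rule that removed it.
    Thresholds follow the description above; their exact definitions
    are assumptions of this sketch.
    """
    if not rows:
        return [], {}
    n = len(rows)
    kept, dropped = [], {}
    for name in rows[0]:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # Rule 1: more than 10% missing values
        if 1 - len(present) / n > missing_max:
            dropped[name] = "missing"
            continue
        counts = Counter(present)
        unique_share = len(counts) / len(present)
        dominant_share = counts.most_common(1)[0][1] / len(present)
        if unique_share >= unique_min:
            # Rule 2: nearly every value is distinct (ID-like attribute)
            dropped[name] = "unique"
        elif dominant_share >= dominant_min:
            # Rule 3: nearly every value is the same (no variability)
            dropped[name] = "constant"
        else:
            kept.append(name)
    return kept, dropped


# Toy records (hypothetical, not from the crash dataset)
rows = [
    {
        "crash_id": i,                                    # ID-like
        "country": "Iran",                                # constant
        "lighting": None if i < 20 else "day",            # 20% missing
        "road_type": "rural" if i % 3 == 0 else "urban",
        "severity": "fatal_injury" if i % 4 == 0 else "pdo",
    }
    for i in range(100)
]

kept, dropped = screen_attributes(rows)
print(kept)
print(dropped)
```

In this sketch the missing-value rule is checked first, so an attribute that is both sparse and ID-like is reported only as "missing". The order does not change which attributes are kept.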