Prediction of vehicle crashes by drivers' characteristics and past traffic violations in Korea using a zero-inflated negative binomial model

ABSTRACT Aims: Traffic safety is a significant public health challenge, and vehicle crashes account for the majority of injuries. This study aims to identify whether drivers' characteristics and past traffic violations predict vehicle crashes in Korea. Methods: A total of 500,000 drivers were randomly selected from the 11.6 million driver records of the Ministry of Land, Transport and Maritime Affairs in Korea. Records of traffic crashes were obtained from the archives of the Korea Insurance Development Institute. After the violation history for 2004–2005 was matched with the number of crashes in 2006, a total of 488,139 observations were used for the analysis. A zero-inflated negative binomial model was used to estimate the incidence rate ratio (IRR) of vehicle crashes associated with individual drivers' past violations. The covariates were driver's age, gender, district of residence, vehicle choice, and driving experience. Results: Drivers who violated (1) a hit-and-run or drunk driving regulation at least once or (2) a signal, central line, or speed regulation more than once had a higher risk of a vehicle crash, with respective IRRs of 1.06 and 1.15. Furthermore, female gender, younger age, fewer years of driving experience, and middle-sized vehicles were all significantly associated with a higher likelihood of vehicle crashes. Conclusions: Drivers' demographic characteristics and past traffic violations could predict vehicle crashes in Korea. Greater resources should be assigned to traffic safety education programs for high-risk driver groups.


Introduction
Traffic accidents are a significant public health issue (Gopalakrishnan 2012). According to the World Health Organization, traffic accidents result in more than 1 million deaths annually and are the second leading cause of death in the 15-29 year age group (World Health Organization 2015), making this group a high-risk population. Parents, however, may be a protective influence for this population: one study found that stricter parental limits were associated with fewer traffic violations among young drivers (Simons-Morton et al. 2006). In the United States, many states have adopted zero-tolerance policies, including suspension of a driving license if a violation occurs within the first 6 to 12 months after the license is issued (Martin 1996).
In Korea, well-developed transportation infrastructure provides a comfortable and safe driving environment, and the minimum legal driving age is 18. However, the number of vehicles on the road increased by 32.2% during the last decade (from 2002 to 2012), and there is currently one car for every 2.95 people (Ministry of Land, Infrastructure and Transport 2012). Along with the increased number of vehicles, the numbers of road traffic crashes and related deaths remain high. In 2010, Korea recorded 2.6 road fatalities per 10,000 registered vehicles, one of the five highest rates among the 32 countries listed in the International Traffic Safety Data and Analysis Group (IRTAD) annual report (International Traffic Safety Data and Analysis Group 2011). Increasing health expenditure may help reduce road traffic fatalities (Castillo-Manzano et al. 2014), though traffic safety policies and education are more effective in preventing traffic accidents. Reckless driving accounts for 64% of vehicle crashes in Korea (Yang and Kim 2003); thus, data on vehicle crashes could provide empirical evidence for formulating better education programs and traffic policies to minimize traffic injuries. This research is warranted because most existing studies have used only survey data (Alver et al. 2014; Vardaki and Yannis 2013) or police records (Rosenbloom and Eldror 2013). Moreover, the Dula Dangerous Driving Index, often used in such studies, is a subjective measure of drivers' own perceived risk of dangerous driving and thus may not be a reliable indicator.
Major contributing factors to traffic violations can include human factors, such as drivers' demographic characteristics, driving experience, and seat belt use, and nonhuman factors, such as vehicle type and maintenance and safety measures (e.g., penalties or incentives). More importantly, drivers' past traffic violations may be another human factor to consider in the occurrence of vehicle crashes. In Korea, higher insurance premiums are imposed on drivers who violate traffic regulations. For instance, drivers who have committed a hit-and-run or drunk driving violation within the last 2 years pay an insurance premium 10% higher than in the previous year, and drivers who have violated a signal, central line, or speed regulation more than once during the last 2 years pay a premium 5% higher than nonviolators (Ki and Kim 2009). Examining traffic violation records for the past 2 years could therefore help identify drivers at higher risk of a vehicle crash and justify the higher insurance premiums they are liable to pay in Korea. We hypothesize that past traffic violators may still be at high risk of vehicle crashes even after a higher insurance premium has been imposed. In other words, the traffic violators in the present data set paid premiums 10% or 5% higher than nonviolators, all other factors being equal. If violators were still more likely than nonviolators to be involved in a crash, a higher insurance premium or fine for traffic violators would be justified, providing a greater incentive for them to comply with traffic regulations and maintain a good driving record. This study aims to investigate whether traffic crashes could be predicted by drivers' demographic characteristics, driving experience, and past violations.

Methods
To examine the crash risk of traffic violators, we obtained data on registered cars and drivers in Korea from the Korea Insurance Development Institute (KIDI), a nonprofit corporation established to protect the interests of policyholders and contribute to the development of the insurance industry. Under the Insurance Business Act, KIDI is the official organization in Korea for archiving reference data for insurance products. To be eligible to drive, all drivers must have an insurance policy covering at least bodily injury and property damage, and KIDI collects nationwide information on registered cars and drivers. Because drivers in Korea who have violated traffic rules within the last 2 years must pay a higher insurance premium, KIDI also maintains drivers' traffic violation histories for this purpose. The dependent variable in this study is the number of crashes in 2006 for which drivers received coverage for bodily injury and property damage, the coverage that is mandatory in Korea; claims under voluntary insurance were excluded. The data sets also include drivers' demographic information (sex, age, and district of residence), driving experience, and car type in 2006, as well as traffic violations in the past 2 years (i.e., 2004 and 2005). Traffic violations are classified as Class 1 or Class 2: Class 1 violations are hit-and-run or drunk driving violations, and Class 2 violations are violations of signal, central line, or speed regulations. According to the Ministry of Land, Infrastructure and Transport (2012), there were about 16 million registered vehicles in Korea in 2006. Business vehicles in the registry were excluded, because private car owners better represent the general population and a different insurance premium applies to business vehicles. Ethical approval for this study was obtained from KIDI.
Five hundred thousand drivers were then randomly selected from the 11.6 million private car drivers in the database. After further exclusion of observations with missing information, a total of 488,139 observations were included in the main analysis, which was performed using STATA GLM (generalized linear model).
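The selection steps described above (excluding business vehicles, drawing a simple random sample without replacement, and then dropping records with missing information) can be sketched as follows. This is an illustration under assumptions, not the study's code: the record layout (`DriverRecord`), its fields, the seed, and the synthetic population are all hypothetical, and the sample size is scaled down from 500,000 to 200.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriverRecord:
    # Hypothetical record layout; the real KIDI fields are not described in the text
    driver_id: int
    is_business_vehicle: bool
    age: Optional[int]           # None marks missing information
    crashes_2006: Optional[int]  # None marks missing information


def draw_analysis_sample(records, n, seed=2006):
    """Exclude business vehicles, draw n private drivers without replacement,
    then drop sampled records with any missing field (listwise deletion)."""
    private = [r for r in records if not r.is_business_vehicle]
    sample = random.Random(seed).sample(private, n)
    return [r for r in sample if r.age is not None and r.crashes_2006 is not None]


# Synthetic stand-in population; the study drew 500,000 from 11.6 million records
gen = random.Random(0)
population = [
    DriverRecord(
        driver_id=i,
        is_business_vehicle=gen.random() < 0.1,
        age=None if gen.random() < 0.02 else gen.randint(18, 80),
        crashes_2006=gen.choice([0, 0, 0, 0, 1, 2]),
    )
    for i in range(1_000)
]
analysis = draw_analysis_sample(population, n=200)
print(len(analysis))  # at most 200, fewer if sampled records had missing ages
```

Fixing the seed makes the draw reproducible; the order of the steps mirrors the text, so missing-data exclusions reduce the sample below the nominal size, as the drop from 500,000 to 488,139 did.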
Regarding the independent variables, traffic violations are one of the major predictors of crashes. The traffic violation variables are Class 1 violations (violation 1), defined as at least one hit-and-run or drunk driving violation within the last 2 years, and Class 2 violations (violation 2), defined as more than one signal, central line, or speed violation within the last 2 years. In addition to traffic violation history (Class 1 or Class 2), the models included age, gender, driving experience, auto type and size, and drivers' district of residence. Because KIDI collects only variables related to insurance premiums, the data set does not contain drivers' income, an important socioeconomic factor and determinant of driving habits. To compensate for this limitation, auto type and size were used as a proxy for drivers' socioeconomic status. Moreover, drivers' district of residence may also affect driving habits, so we used place of registration to control for this variation.
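A minimal sketch of how the two violation indicators could be derived from a driver's raw violation history follows. It assumes a hypothetical list of (year, violation type) tuples per driver; the type labels and the function name are illustrative and are not KIDI's actual coding.

```python
from collections import Counter

# Hypothetical type labels for the two violation classes described in the text
CLASS1_TYPES = {"hit_and_run", "drunk_driving"}
CLASS2_TYPES = {"signal", "central_line", "speed"}


def violation_indicators(history, first_year=2004, last_year=2005):
    """Return (violation1, violation2) as 0/1 flags for one driver.

    history: iterable of (year, violation_type) tuples.
    violation1 = 1 if at least one Class 1 violation falls in the window.
    violation2 = 1 if more than one Class 2 violation falls in the window.
    """
    counts = Counter()
    for year, vtype in history:
        if not first_year <= year <= last_year:
            continue  # outside the 2-year look-back window
        if vtype in CLASS1_TYPES:
            counts["class1"] += 1
        elif vtype in CLASS2_TYPES:
            counts["class2"] += 1
    return int(counts["class1"] >= 1), int(counts["class2"] > 1)


history = [(2004, "speed"), (2005, "signal"), (2006, "drunk_driving")]
print(violation_indicators(history))  # (0, 1): the 2006 drunk driving falls outside the window
```

Note the asymmetric thresholds: a single Class 1 violation sets its flag, whereas Class 2 requires two or more, so a driver with one speeding ticket is grouped with nonviolators.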
Because the number of crashes is a nonnegative count whose distribution is strongly skewed to the right (Table 1), it does not satisfy the assumptions of classical linear regression models. Poisson regression is commonly used in traffic crash analysis (B. J. Park and Lord 2009). For an outcome Y, the probability of observing a specific count y is given by

$$P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

where λ is the population rate parameter and y! = y × (y − 1) × . . . × 2 × 1. The mean and variance of the Poisson distribution are identical, with E(Y) = Var(Y) = λ. One limitation of the Poisson model is that its standard errors are biased (typically underestimated) under overdispersion, that is, when the variance exceeds the mean. The negative binomial model is more appropriate under overdispersion:

$$P(Y = y) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right)^{\alpha^{-1}} \left(\frac{\lambda}{\alpha^{-1} + \lambda}\right)^{y}$$

where Γ(·) is the gamma function and α is the dispersion parameter. The mean of the negative binomial distribution is the same as that of the Poisson distribution, but the variance is λ + αλ², so the model reduces to the Poisson as α → 0.
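To make the overdispersion argument concrete, the following standard-library Python sketch evaluates the Poisson and negative binomial (NB2 parameterization, variance λ + αλ²) probability mass functions described in this paragraph and checks their means and variances numerically. The dispersion value 5.239 is the estimate reported in the text; the rate λ = 0.2 is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) = exp(-lam) * lam**y / y!, evaluated on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def nb2_pmf(y, lam, alpha):
    """Negative binomial (NB2) with mean lam and variance lam + alpha * lam**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + lam)) + y * math.log(lam / (r + lam)))
    return math.exp(log_p)


def moments(pmf, y_max=500):
    """Mean and variance of a count distribution, truncating the sum at y_max."""
    probs = [pmf(y) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


lam, alpha = 0.2, 5.239  # alpha: estimate reported in the text; lam: illustrative
print(moments(lambda y: poisson_pmf(y, lam)))       # mean and variance both about 0.2
print(moments(lambda y: nb2_pmf(y, lam, alpha)))    # variance about 0.2 + 5.239 * 0.2**2 = 0.41
print(poisson_pmf(0, lam), nb2_pmf(0, lam, alpha))  # the NB2 puts more mass on zero
```

Working on the log scale with `math.lgamma` avoids overflow in y! and in the gamma functions for large counts. With the same mean, the large α roughly doubles the variance and shifts probability toward zero and the right tail.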
The estimated dispersion parameter α is 5.239, and the likelihood-ratio test suggests that α is significantly different from zero. Because α is greater than zero, the Poisson assumption E(Y) = Var(Y) = λ is violated, and a negative binomial model is therefore more appropriate than a Poisson model for the empirical analysis. However, count models applied to data with far more zeros than the negative binomial distribution predicts produce biased parameter estimates as well as biased standard errors (Cameron and Trivedi 2005). A zero-inflated negative binomial (ZINB) model better accounts for nonnegative count data with a large proportion of zeros and overdispersion. The ZINB model combines a binary model, such as a logit, for whether an observation belongs to an always-zero group with a negative binomial model for the counts (Long 1997). Although the distribution of the number of crashes strongly suggests excess zeros, the Vuong test for nonnested models was used to test formally for excess zeros, as recommended by Desmarais and Harden (2013). The Vuong test rejected the null hypothesis of no excess zeros (P < .001); therefore, a zero-inflated negative binomial model was used for the empirical analysis. For ease of interpretation, we transformed the estimated coefficients into incidence rate ratios (IRRs) using the formula

$$\mathrm{IRR}_k = \exp(\hat{\beta}_k)$$

where $\hat{\beta}_k$ is the estimated coefficient of the kth explanatory variable.

Figure 1 shows the distribution of the number of crashes in the sample. Table A1 (see online supplement) lists the variables used in the empirical analysis and their definitions. Table A2 (see online supplement) presents the means and standard deviations for the full sample and separately for drivers who did and did not violate traffic regulations during the period 2004 to 2005. Young male drivers had more traffic violations.
In addition, a larger proportion of the traffic violators drove larger cars or sport utility vehicles or had 2 to 4.9 years of driving experience; drivers with less than 2 years or more than 5 years of driving experience had fewer traffic violations. Table A3 (see online supplement) shows the IRRs from the ZINB regression; each IRR is the estimated rate ratio for a one-unit increase in an explanatory variable, holding the other variables in the model constant, and applies to the count (not-always-zero) component of the model. Most of the variables in the logit part of the model, which predicts excess zeros, are statistically significant. A positive coefficient indicates that the variable increases the log odds of being an excess zero, and a negative coefficient indicates the reverse. For example, the older the driver, the less likely the driver was to be an excess zero.
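The pieces of the ZINB machinery described in this section can be sketched in a few standard-library Python functions: the NB2 log-probability, the logit probability of belonging to the always-zero group, the zero-inflated mixture, the IRR transformation exp(β), and the basic Vuong statistic. This is a minimal sketch under stated assumptions, not the estimation code used in the study (which was run in STATA). The function names, all parameter values, and the toy data `y_obs` are hypothetical, and the Vuong statistic is computed in its basic form with no correction applied.

```python
import math
from statistics import mean, stdev


def nb2_logpmf(y, lam, alpha):
    """Log-probability of count y under an NB2 model with mean lam and dispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + lam)) + y * math.log(lam / (r + lam)))


def inflation_probability(gamma0, gammas, x):
    """Logit part: probability of belonging to the always-zero group."""
    eta = gamma0 + sum(g * xi for g, xi in zip(gammas, x))
    return 1.0 / (1.0 + math.exp(-eta))


def zinb_logpmf(y, lam, alpha, pi):
    """Mixture: a structural zero with probability pi, otherwise an NB2 count (0 <= pi < 1)."""
    nb = nb2_logpmf(y, lam, alpha)
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def irr(beta):
    """Incidence rate ratio for a one-unit increase in a count-part covariate."""
    return math.exp(beta)


def vuong_statistic(loglik_a, loglik_b):
    """Basic Vuong z statistic from per-observation log-likelihoods of two
    non-nested models; positive values favor model a. No correction applied."""
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    return math.sqrt(len(diffs)) * mean(diffs) / stdev(diffs)


# Illustrative values only; none of these are estimates from the study.
older = inflation_probability(-0.5, [-0.02], [60])    # negative age coefficient...
younger = inflation_probability(-0.5, [-0.02], [30])  # ...so older drivers get a smaller pi
print(older < younger)  # True

print(round(irr(math.log(0.928)), 3))  # 0.928, the reported male IRR recovered from its coefficient

y_obs = [0] * 90 + [1] * 6 + [2] * 3 + [5]
ll_zinb = [zinb_logpmf(y, lam=0.8, alpha=1.0, pi=0.7) for y in y_obs]
ll_nb = [nb2_logpmf(y, lam=0.2, alpha=5.0) for y in y_obs]
print(vuong_statistic(ll_zinb, ll_nb))
```

In a real analysis both sets of per-observation log-likelihoods would come from models fitted to the same data; here they come from fixed, made-up parameters purely to show how the statistic is assembled.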

Results
According to the IRRs, holding age constant, the crash rate for male drivers is expected to be 0.928 times that of female drivers; that is, female drivers were more likely to be involved in a crash than male drivers. Each increase in age group is associated with an estimated 0.3% increase in the crash rate, and crash rates are higher among less experienced drivers, holding the other variables constant. For example, drivers with less than a year's driving experience had a 57.9% higher crash rate than those who had been driving for more than 5 years, given the same age. Regarding place of residence, drivers living in region 1 (central cities) were more likely to be involved in crashes than those in other regions. Moreover, drivers of other types of vehicles were less likely to be involved in a crash than drivers of middle-sized vehicles. However, sport utility vehicle drivers were more likely to be involved in a crash than drivers of middle-sized vehicles, with an IRR of 1.083.
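Because the count part of the ZINB model is log-linear, IRRs can be inverted to switch the reference category and multiplied to combine covariate effects. The sketch below applies this arithmetic to IRRs reported in this section. It assumes no interaction terms, holds all other covariates fixed, and describes only the count component (the overall expected number of crashes also depends on the zero-inflation part). The combined figure for a novice female driver illustrates the arithmetic and is not a result reported by the study; the constant and function names are hypothetical.

```python
# IRRs reported in the Results text (count component of the ZINB model)
IRR_MALE_VS_FEMALE = 0.928
IRR_UNDER_1Y_VS_OVER_5Y_EXPERIENCE = 1.579
IRR_PER_AGE_GROUP = 1.003


def invert(irr_value):
    """Swap the reference category, e.g. female vs male instead of male vs female."""
    return 1.0 / irr_value


def combine(*irr_values):
    """Covariate effects multiply in a log-linear count model (no interactions assumed)."""
    result = 1.0
    for value in irr_values:
        result *= value
    return result


def percent_change(irr_value):
    """Express an IRR as a percentage change in the expected crash rate."""
    return (irr_value - 1.0) * 100.0


female_vs_male = invert(IRR_MALE_VS_FEMALE)
novice_female_vs_experienced_male = combine(female_vs_male, IRR_UNDER_1Y_VS_OVER_5Y_EXPERIENCE)
five_age_groups_older = IRR_PER_AGE_GROUP ** 5

print(round(percent_change(IRR_MALE_VS_FEMALE), 1))  # -7.2
print(round(female_vs_male, 3))                      # 1.078
print(round(novice_female_vs_experienced_male, 2))   # 1.7
print(round(five_age_groups_older, 3))               # 1.015
```

The inversion shows why an IRR of 0.928 for men corresponds to roughly a 7.8% higher rate for women rather than 7.2%: percentage changes are not symmetric when the reference category is swapped.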
Drivers who committed traffic violations during 2004 to 2005 were at higher risk of crashes in the following year (2006). Specifically, Class 1 violators (drunk drivers and hit-and-run drivers) in 2004 or 2005 were 6.2% more likely than their counterparts to be involved in a crash in 2006. In addition, drivers with Class 2 violations (signal, central line, or speed regulation) during 2004 to 2005 were 14.1% more likely to be involved in a vehicle crash in the following year than drivers who did not violate these regulations or violated them only once.

Discussion
The higher crash rates of drivers with violations in the past 2 years, compared with their counterparts, suggest that insurance premiums and fines alone may be insufficient and should be accompanied by other measures, such as law enforcement, to further reduce vehicle crashes in Korea. As age increases, the crash rate also increases, although the descriptive statistics show a U-shaped relationship (Figure 1). These age differences are consistent with previous results showing high numbers of crashes among teenage and older drivers (Massie and Campbell 1993). However, after adding a quadratic term (age squared) as an independent variable, we could not find a clear U-shaped relationship between age and crashes. This may be explained by the fact that the minimum legal driving age is 18 years in Korea, whereas it is 16 years in countries such as the United States and Canada.
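Whether a quadratic age term produces a U-shape within the observed data can be checked from the sign of the squared-term coefficient and the location of the turning point, age* = −β₁/(2β₂), for a log rate of the form β₀ + β₁·age + β₂·age². The sketch below uses hypothetical coefficients (the study does not report them) to show how a minimum lying below the youngest observed age, such as 18 in Korea, would leave no visible U-shape in the data.

```python
def quadratic_turning_point(beta1, beta2):
    """Age at which b0 + beta1*age + beta2*age**2 (the log rate) is extremal."""
    if beta2 == 0:
        raise ValueError("beta2 is zero: the age effect on the log rate is linear")
    return -beta1 / (2.0 * beta2)


def u_shape_within(beta1, beta2, age_min, age_max):
    """True only if the curve opens upward and its minimum lies inside the age range."""
    if beta2 <= 0:
        return False
    return age_min < quadratic_turning_point(beta1, beta2) < age_max


# Hypothetical coefficients, not estimates from this study.
print(quadratic_turning_point(-0.06, 0.0006))  # about 50
print(u_shape_within(-0.06, 0.0006, 18, 80))   # True: minimum near age 50
print(u_shape_within(-0.012, 0.0004, 18, 80))  # False: minimum at 15, below the youngest drivers
```

In the second case the fitted curve rises over the entire observed range of 18 to 80, which would appear as a monotonic age effect even though the quadratic term is positive.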

Sociodemographic and geographic differences
The results showed that female drivers had more traffic crashes than male drivers. Gender could be a moderator of emotional status and dangerous driving in young novice drivers (Scott-Parker et al. 2013). Gender effects on accident probability in prior studies from other countries have varied because of the inclusion of psychological factors, although in general male drivers have been shown to take more risks than female drivers (Sarma et al. 2013). In terms of accident types, males are involved in more fatalities due to risk-taking behaviors, whereas women are more likely to be involved in accidents resulting from perceptual judgment errors (Waylen and McKenna 2002). In terms of driver characteristics, Anderson et al. (1999) identified male pickup truck drivers as having lower restraint use, displaying more risky driving behaviors, and receiving more traffic citations, whereas Ore (1998) showed that female workers in transportation had higher accident rates than their male counterparts. In addition, U.S. data suggest that factors associated with injury on highways are gender specific: for male drivers, associated factors included travel on graded roadways and concrete or wet road surfaces, whereas for female drivers they included possession of a valid license and weekend driving (Amarasingha and Dissanayake 2014). Further U.S. data identified male drivers as more likely to have head-on crashes and female drivers as more likely to have side crashes (Bingham and Ehsani 2012). We did not have information on crash types to undertake this level of analysis. The underlying reasons for the higher crash rate among female drivers in Korea are yet to be explored.
We also observed age differences, which are consistent with a previous report from the United States (Massie and Campbell 1993). In Spain, the rate of traffic crashes was higher for the age group 25-34 in 2006 and for the age group 15-24 (Kanaan et al. 2009). Secondary tasks such as texting also increased the risk of crashes for novice drivers (Klauer et al. 2014). Evidence from other countries suggests that offering explicit financial incentives for keeping to speed limits is effective in preventing speeding violations among young drivers (Bolderdijk et al. 2011). Moreover, our study revealed that crash rates were higher among less experienced drivers. Consistent with this finding, Finn and Bragg (1986) demonstrated that experienced drivers detect potentially dangerous situations more quickly than beginners. In some countries, novice drivers are given a probationary period after obtaining their licenses. Simulated tasks could help identify drivers at risk, especially older drivers (H. C. Lee and Lee 2005).
We also found regional differences in traffic crashes in this study. This was consistent with a previous study that reported the association between socioeconomic status of the geographic area and road traffic injury deaths (K. Park et al. 2010). Residents in deprived communities also have a higher likelihood of a fatality due to traffic accidents in Korea (J. . In England, pedestrians and cyclists still remain the major victims of traffic injuries, and the crashes are more likely to occur near homes (Steinbach et al. 2013). Advanced geographical information algorithms may help predict vehicle-pedestrian collisions based on the activities of the road users (Yao et al. 2014).
Drivers of middle-sized vehicles or sport utility vehicles were more likely to be involved in a crash than drivers of other types of vehicles, possibly because of the engine power and agility of these car models. High-occupancy vehicles could be analyzed separately from single-occupant vehicles in future analyses. Furthermore, Class 1 violators had a lower IRR of crash involvement than Class 2 violators, possibly because hit-and-run and drunk drivers are usually prohibited from driving and therefore appear to cause fewer crashes than Class 2 violators.
In Korea, traffic safety measures were implemented during the 2002 World Cup, including a road safety evaluation system with traffic monitoring cameras, penalties for risky driving behaviors, incentives for reporting traffic violations, and road safety education programs (Yang and Kim 2003). However, the benefits of these policies may be transient, and their long-term effects on saving lives have not been evaluated. The total number of road traffic crashes in 2005 was similar to that in 2000 (about 290,000). From a public health policy perspective, although imposing higher insurance premiums on novice drivers and frequent violators of traffic rules may be a good way to offset the health care costs associated with crashes due to reckless driving, it may not deter drivers from future violations.
There are several limitations to note when interpreting the results of this study. The use of auto size or type as a proxy for socioeconomic status may not be appropriate, because vehicle choice could depend on family size, color or model preferences, and parking availability. Overestimating one's own driving skills (Eensoo et al. 2010; Zhang et al. 2013) and the presence of passengers (Weiss et al. 2014) are also major factors contributing to violations in novice drivers, and neither was measured in this study. Information about vehicle examinations or service history was not available. Moreover, we had no information about drunk driving or driving under the influence of drugs (Li et al. 2013), and the severity of the violations was not accounted for in this analysis. The number of traffic accidents involving personal injury may be a more useful dependent variable, because it captures the accidents directly related to human health and life. Ideally, details of insurance claims would help in interpreting the severity of traffic violations. Moreover, harsh weather (W.K.  and alcohol abuse (Ju and Sohn 2014), which are further possible factors for road traffic accidents in Korea, were not examined in the models.
Drivers who violate a hit-and-run or drunk driving regulation at least once, or a signal, central line, or speed regulation more than once, have a higher risk of vehicle crashes than other drivers in Korea, even with the current higher insurance premiums. Furthermore, female gender, younger age, fewer years of driving experience, and vehicle type are all significant factors associated with a higher risk of vehicle crashes. Further research is warranted to investigate whether increased enforcement of traffic regulations and technological advances could reduce vehicle crashes in the future. In addition, traffic safety promotion in Korea should target female drivers, younger drivers, less experienced drivers, drivers of middle-sized vehicles, and frequent violators of traffic rules.

Funding
This work was supported by the Dong-A University Research Fund.