Bibliometric analysis of simulated driving research from 1997 to 2016

Abstract
Objective: The objective of this study was to explore the evolution footprints of simulated driving research in the past 20 years through rigorous and systematic bibliometric analysis, to provide insights regarding when and where the research was performed and by whom and how the mainstream content evolved over the years. Methods: The analysis began with data retrieval in Web of Science with defined search terms related to simulated driving. BibExcel and CiteSpace were employed to conduct the performance analysis and co-citation network analysis, that is, to probe the performance of institutes, journals, and authors and to identify research hotspots. Results: A total of 3,766 documents were retained after filtering, and their yearly numbers showed exponential growth from 1997 to 2016. The United States contributed the most publications as well as international collaborations, followed by Germany and China. In addition, several universities in The Netherlands and the United States dominated the list of contributing institutes. The leading journals were in transportation and ergonomics. The leading researchers were also recognized among the 8,721 contributing authors, such as J. D. Lee, D. L. Fisher, J. H. Kim, and K. A. Brookhuis. Finally, the co-citation analysis illuminated the evolution of simulated driving research, which covered the following topics roughly in chronological order: task-induced stress, drivers with neurological disorders, alertness and sleepiness while driving, trust toward driving assistance systems, driver distraction, the effect of drug use, the validity of simulators, and automated driving. Conclusions: This article employed bibliometric tools to probe the contributing countries, institutes, journals, authors, and mainstream hotspots of simulated driving research in the past 20 years. A systematic bibliometric analysis of this field will help researchers realize the panorama of global simulated driving and establish future research directions.


Introduction
Driving is one of the most common tasks people conduct every day. It is a comprehensive integration of sensory, perceptual, cognitive, and motor functions, all of which can be affected by numerous human-vehicle-environment factors (Anstey et al. 2005; Kircher et al. 2007). Therefore, approaches that are conducive to investigating the driving experience are needed to further ensure the safe operation of the human-vehicle-environment system. Simulated driving provides an alternative means of running safe, replicable, quantitative, and controllable protocols, and its applications have been extended from the transportation field to medicine, psychology, and cognitive fields (Ku et al. 2002; Mader et al. 2009; Wang et al. 2010).
Initially, driving simulators were developed to reduce the high cost of on-road tests, achieve more control over the vehicles' and drivers' performance, and perform driving tasks under risky situations. In the 1970s, there were at least 20 driving simulators throughout the United States and Europe, mainly used for driver training and licensing (Fisher et al. 2011). Driving simulation has since become an essential and common element in regional, national, and international efforts from the following perspectives.
1. Effects of distractions during driving, such as adaptive cruise control, navigation aids, and cell phones (Liu and Ou 2011; Vollrath et al. 2011; Strayer et al. 2003). For example, Strayer et al. (2003) employed simulators to assess the effects of cellular phone conversations on both recognition memory and implicit perceptual memory; the eye-tracking data suggested that impairment of driving performance is mediated by reduced attention to visual inputs.
2. Effects of traffic scenarios, such as different weather and road conditions, and of traffic control devices, such as signs, signals, and markings, on driver behaviors (Konstantopoulos et al. 2010; Lenné et al. 2011). For example, Konstantopoulos et al. (2010) examined the effects of driving experience (instructors and learners) and visibility (day, night, and rain) with a simulator and reported that experience dominated the modification of visual search strategies, whereas poor visibility played a secondary role.
3. Driving performance of patients with visual, cognitive, and motor impairment due to aging, brain injury, and disorders such as Parkinson's disease and sleep apnea (George et al. 1996; Lew et al. 2005; Rizzo et al. 2001).
4. The impact of alcohol, caffeine, and medications on driving (Åkerstedt et al. 2005; Vakulin et al. 2007).
5. Psychophysiological characteristics of drivers while driving, such as mental workload and task-induced fatigue (Brookhuis and de Waard 2010; Lin et al. 2012; Zhao et al. 2012). Zhao et al. (2012) recorded electroencephalograms (EEGs) and electrocardiographs of 13 subjects during continuous simulated driving tasks and pointed out several sensitive indicators that reflect mental fatigue, such as EEG alpha and beta relative power, the P300 wave of the event-related potential, and heart rate variability (HRV).
6. Training effects for different groups, such as young, novice, and elderly drivers (de Winter et al. 2009; Pollatsek et al. 2006).
Despite the considerable benefits and broad applications that simulated driving has earned, there remain several contentious issues. Simulator adaptation syndrome (or simulator sickness) and the validity (or fidelity) of simulators are 2 of the most frequent concerns. Based on multiple theories of motion sickness, Brooks et al. (2010) presented a method with greater than 90% accuracy in identifying symptomatic individuals. The results showed that elderly participants had a greater likelihood of simulator sickness, which prompted extensive discussion of research ethics. Yan et al. (2008) compared the speed and crash history parameters of simulator experiments and field tests; the speed data shared the same distributions and equal means in both situations, but the crash data showed riskier behaviors in the simulator. Cross-platform comparisons among different simulators and standard virtual scenarios for training and tests are claimed to be necessary as well (Fisher et al. 2011).
Because simulated driving has attracted attention on a global scale, this field has accumulated numerous academic outputs, including the papers cited above. These discrete research works provided straightforward and in-depth investigations of specific areas that depended on the interest and expertise of the authors. Therefore, a systematic and comprehensive analysis that covers as much research as possible is needed to integrate these individual studies and present an overview of how this field has been evolving over the past decades. Fortunately, bibliometrics, defined as "the application of mathematical and statistical methods to books and other media of communication" (Pritchard 1969), provides a quantitative, objective, and practical approach to carry out evaluative, evolutive, and predictive analysis based on the substantial additional information of academic achievements (de Bellis 2009). Bibliometric analysis provides insights on research progress through 2 methods: performance analysis and co-occurrence network analysis (Noyons et al. 1999; Van Raan 2004). The former conducts statistical analysis based on citation information to recognize the leading papers, authors, institutes, and journals. The latter explores the knowledge structure and research hotspots via co-occurrence networks of words, authors, citations, etc. Used in conjunction, the 2 complementary methods are capable of presenting a comprehensive and accurate analysis of a given topic.
In consideration of the rich achievements of simulated driving research and the strengths of bibliometric methods, this article attempts to visually present a panorama of the simulated driving domain and thoroughly probe the research status and progress in this field with bibliometric methods. Specifically, this study performs a rigorous bibliometric analysis of the scientific publications related to simulated driving in the past 20 years (1997-2016), addressing, for instance, what type of research is conducted, who is doing the research, where the research is taking place, and what the research tendency is over time. The findings can help relevant researchers realize the knowledge structure of global simulated driving research and establish or alter further research directions.

Data collection
This article aims to conduct a quantitative and visualized analysis of the representative works related to simulated driving through bibliometric methods. Web of Science (WoS) is an online subscription-based scientific citation indexing service that provides a comprehensive citation search. It gives access to multiple databases and encompasses over 148 million records (journals, books, and proceedings) that date back to the beginning of the 20th century. Therefore, WoS was selected as the source database to obtain the initial data. To obtain the maximum number of relevant documents related to simulated driving, we collected and stored data for the defined search terms using a "Topic" search in the Web of Science Core Collection. The field tag "Topic" in WoS means that a record will be selected if the search terms are included in the title, abstract, author keywords, or Keywords Plus of the record. The keywords used for the initial data collection included possible combinations of "drive," "simulation," and their variant forms, such as "driving simulation," "drive simulation," "driving simulator$," "driver simulator$," and "simulated driving." Several wildcards and search operators were used to improve the accuracy of the retrieval results. The time span was set from 1997 to 2016, and the language was restricted to English.
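The "Topic" matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the query actually submitted to WoS: the record is assumed to be a dictionary keyed by the WoS export tags TI (title), AB (abstract), DE (author keywords), and ID (Keywords Plus), the "$" wildcard is treated as "zero or one character", and the helper names `term_to_regex` and `matches_topic` are hypothetical. Any lemmatization or operator handling performed by the WoS search engine is ignored.

```python
import re

# Search phrases quoted in the text; "$" is treated here as the wildcard
# for zero or one character.
SEARCH_TERMS = [
    "driving simulation",
    "drive simulation",
    "driving simulator$",
    "driver simulator$",
    "simulated driving",
]

# Fields assumed to be covered by a "Topic" search:
# title, abstract, author keywords, Keywords Plus.
TOPIC_FIELDS = ("TI", "AB", "DE", "ID")


def term_to_regex(term):
    """Translate one search phrase into a case-insensitive regular expression."""
    parts = [re.escape(piece) for piece in term.split("$")]
    # Each "$" becomes "at most one word character".
    pattern = r"\w?".join(parts)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)


PATTERNS = [term_to_regex(term) for term in SEARCH_TERMS]


def matches_topic(record):
    """Return True if any search phrase occurs in any topic field of the record."""
    text = " ".join(record.get(field, "") for field in TOPIC_FIELDS)
    return any(pattern.search(text) for pattern in PATTERNS)
```

A record whose abstract mentions "driving simulators" would match through the "driving simulator$" pattern, whereas a record about "train simulation" would not.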
With the retrieval strategies above, the initial search ended up with a total of 3,997 records including several document types, such as article, proceedings paper, book chapter, meeting abstract, and editorial material. In this article, "article" and "proceedings paper" were selected as the source data for the subsequent analysis, considering that they had been strictly peer reviewed and expert judged.
To cover as much additional information as possible, we exported every retrieved item with "full records and cited references" in "txt" format from WoS. Microsoft Excel 2016 was employed to check and remove duplicate and unrelated items manually. With the retrieval and filtering strategy above, we finally obtained 3,766 documents to be analyzed.
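The filtering step was performed manually in Excel; as a hedged illustration only, the same document-type filter and duplicate check could be scripted as below. The sketch assumes the usual layout of a WoS plain-text export (two-character field tags, continuation lines indented by three spaces, "ER" closing each record, "EF" closing the file) and uses the DT (document type), UT (accession number), and TI (title) tags. The function names and the duplicate key are hypothetical choices, and the removal of unrelated items still requires human judgment.

```python
def parse_wos(text):
    """Parse a WoS plain-text export into a list of records.

    Each record maps a two-character field tag (e.g. "TI") to a list of lines.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if line[:2] == "ER":                    # end of one record
            if current:
                records.append(current)
            current, tag = {}, None
        elif line[:2] == "EF":                  # end of file
            break
        elif line.startswith("   ") and tag:    # continuation of the previous field
            current[tag].append(line.strip())
        elif len(line) >= 2 and line[:2].strip():
            tag = line[:2]                      # a new field starts here
            current[tag] = [line[3:].strip()]
    return records


# Document types retained for the analysis, as stated in the text.
KEPT_TYPES = {"article", "proceedings paper"}


def keep_and_deduplicate(records):
    """Keep articles and proceedings papers, dropping repeated records."""
    seen, kept = set(), []
    for rec in records:
        raw_types = ";".join(rec.get("DT", []))
        doc_types = {t.strip().lower() for t in raw_types.split(";")}
        if not doc_types & KEPT_TYPES:
            continue
        # Hypothetical duplicate key: accession number if present, else the title.
        key = rec.get("UT", [""])[0] or " ".join(rec.get("TI", [])).lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Calling `keep_and_deduplicate(parse_wos(text))` on the exported file contents returns the retained records in their original order.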

Bibliometric analysis
Bibliometrics is a multifaceted endeavor covering structural, dynamic, evaluative, and predictive scientometrics. Although it was initially developed in the library and information science field, it has been widely accepted and adopted in many other fields, especially in quantitative assessment of the impact of researchers, institutes, and journals based on academic outputs (Mao et al. 2015).
Technologically, a variety of software tools and packages have been developed to perform the bibliometric process, such as BibExcel, CiteSpace, Pajek, MS Excel, Gephi, HistCite, and Sci2. Among these tools, BibExcel is a versatile bibliometric toolbox developed by Olle Persson. BibExcel is flexible and robust enough to manage the initial data in various formats and can conduct almost all types of bibliometric analysis (Persson et al. 2009). CiteSpace, in turn, provides visualized and interactive functions to facilitate the understanding and interpretation of network patterns and historical patterns, including identifying fast-growing topical areas, finding citation hotspots in the land of publications, decomposing a network into clusters, and automatically labeling clusters with terms from citing articles (Chen 2004). In this article, we mainly employed BibExcel and CiteSpace to perform the bibliometric analysis; Pajek and Excel were also used when necessary.
As for the research content, bibliometrics involves statistical analysis of scientific outputs, namely, performance analysis, which provides quantitative indicators to overcome the shortcomings of subjectivity in peer review and expert judgments (Van Raan 2004). This method has been applied to evaluate research performance in an increasing number and variety of studies (Chai and Xiao 2012; Ruhanen et al. 2015; Zhuang et al. 2013). On the other hand, bibliometric analysis enables us to monitor and outline research contents and trends of a given topic, which might be of particular interest to scholars and policymakers. Multiple types of attempts have been advocated to delineate research contents, including historiographic mapping (Garfield 2004), document co-citation (Small 1973) or author co-citation (White and McCain 1998), and co-word analysis (Callon et al. 1991). In this article, co-citation network of references is mainly adopted to probe and outline the intellectual structure as well as the evolution footprints of simulated driving research in the given time window.

Performance of countries
To find the contributing countries, the stored text file was loaded into BibExcel, converted to ".doc" format to be recognized, and then sorted by year and country. Figure 1 presents the total number of publications (red solid line) and the top contributing countries (labeled dotted lines) in the past 20 years. It is evident that the total number of published documents related to simulated driving has shown a trend of exponential growth, especially after 2003. We might infer that the rapid promotion of simulation technology and the high popularity of vehicles led to the increased academic attention. However, the decline of the solid line in 2016 may be due to incomplete data, because there is a time delay before many scientific outputs become available online.
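One common way to check a trend of exponential growth is to fit a straight line to the logarithm of the yearly counts. The sketch below is a minimal illustration of that idea and is not necessarily the fitting procedure used for Figure 1; the function name `fit_exponential` is hypothetical, and the yearly counts are invented for illustration only.

```python
import math


def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * (year - years[0])) by least squares on log(counts)."""
    xs = [year - years[0] for year in years]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # growth rate per year
    a = math.exp(mean_y - b * mean_x)    # fitted count in the first year
    return a, b


# Hypothetical yearly counts, for illustration only (not the data behind Figure 1).
years = list(range(1997, 2007))
counts = [20, 24, 30, 35, 44, 52, 65, 78, 95, 115]

a, b = fit_exponential(years, counts)
doubling_time = math.log(2) / b          # years needed for the fitted count to double
print(f"growth rate per year: {b:.3f}, doubling time: {doubling_time:.1f} years")
```

A positive growth rate `b` with small residuals on the log scale supports an exponential description; an incomplete final year, such as 2016 here, would normally be excluded from the fit.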

Performance of journals
Analysis of the performance of journals can help us understand which journals have published the most documents and drawn the most attention, and which journal to submit a finished manuscript to. Table 1 presents the 10 most productive journals in descending order of number of documents among all 1,516 journals and proceedings. The top 10 journals account for 22.78% of the total 3,766 documents. Although most of them belong to the transportation field, it should be noted that journals in the ergonomics or human factors field have published numerous documents and received relatively high total local citation scores (TLCS) and total global citation scores (TGCS), suggesting that simulated driving is an essential method to investigate human-vehicle-environment systems. There are other noteworthy journals that are not shown in the list.

Performance of institutes
Institute statistics can provide insight on where the research has been completed. In this article, institutes of the authors and their locations were extracted from the ".doc" file to calculate the number of documents from each institute. Table 2 lists the top 10 contributing institutes and their citations. In Table 2, several productive institutes in The Netherlands and the United States dominate the list by a clear margin in numbers, despite the fact that the total number of institutes and authors in The Netherlands is smaller than that of other countries. From the perspective of citations, the University of Iowa and University of Massachusetts in the United States and Monash University in Australia account for the top 3 in both TLCS and TGCS; their high TLCS and TGCS suggest that these institutes have drawn much attention through foundational or pioneering studies. It is worth mentioning that a few remarkable institutes not listed in the table have received high TLCS and TGCS despite publishing fewer documents. For example, the University of Utah in the United States acquired a good reputation (No.: 12, TLCS: 355, TGCS: 1,743) for early research on cell phone use-induced distraction.
Additionally, the exported locations of institutes were converted to geographic coordinates by Google geocoding in BibExcel to be recognized by a geographical visualization platform (www.gpsvisualizer.com). Figure 2 illustrates the geographical locations of all contributing institutes and their collaborations in 1997-2006 and 2007-2016 separately. Each green marker represents one institute, and the red line connecting 2 markers denotes that these 2 institutes have cooperated at least once. It is evident that eastern North America, Western Europe, Eastern Asia, and Australia have a higher density of contributing institutes and collaborations. Comparing the locations and collaborations of institutes in 1997-2006 with those in 2007-2016, it can be found that there has been a significant increase in the number of simulated driving research institutes and collaborations. In general, the geographic distribution of these institutes indicates that simulated driving has received a lot of attention globally. The academic cooperation between institutes has become a trend to make the best of devices, technology, and knowledge in different institutes.

Performance of authors
Author information was extracted from the ".doc" file and the frequency of occurrence of all authors was recorded. Table A1 (see online supplement) outlines the top 20 contributing authors out of the 8,721 authors and the number of documents they authored or co-authored. Table A2 (see online supplement) presents the collaborative relationships between authors in descending order of frequency. It appears that Lee, Fisher, Kim, and Brookhuis dominate the list of contributing authors. However, it should be noted that Fisher has co-authored with Pollatsek (ranked 18th) and Pradhan 19 and 12 times, respectively; the same was noted for Brookhuis and de Waard, Yan and Rong, and Lin and Jung. K. Brijs, T. Brijs, and G. Wets have co-authored nearly 10 times, and we may infer that these 3 authors are very likely to be in the same institute or research team; the same has been noted for Akerstedt, Kecklund, and Anund. Comparing Table A1 and Table A2, we may conclude that many productive authors have cooperated with others frequently and that trans-institute and trans-region collaboration has become a typical pattern of conducting scientific research.
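The two counts behind Tables A1 and A2 (documents per author and joint documents per author pair) can be sketched as follows. This is an illustrative sketch rather than the BibExcel procedure: the function name `author_statistics` is hypothetical, each document is assumed to be given as a list of author name strings, and name disambiguation (for example, two different authors sharing a surname and initials) is ignored.

```python
from collections import Counter
from itertools import combinations


def author_statistics(author_lists):
    """Count documents per author and joint documents per author pair."""
    doc_counts, pair_counts = Counter(), Counter()
    for authors in author_lists:
        unique = sorted(set(authors))            # one count per author per document
        doc_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return doc_counts, pair_counts


# Hypothetical author lists, one per document.
documents = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author C"],
]

doc_counts, pair_counts = author_statistics(documents)
print(doc_counts.most_common(3))    # most productive authors
print(pair_counts.most_common(3))   # most frequent collaborations
```

Sorting the names inside each document makes every pair appear in a single canonical order, so ("Author A", "Author B") and ("Author B", "Author A") are counted as the same collaboration.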

Co-citation network analysis
Common in literature references, co-citation is defined as the frequency with which 2 documents are cited together by other documents (Small 1973). The more co-citations 2 documents receive, the more likely they are semantically related. Based on this assumption, Small and Griffith (1974) proposed the approach of exploring the content of a knowledge domain by cluster analysis of a document co-citation network. To present more visual features for the evolution footprints of simulated driving research, we further employed CiteSpace to conduct a document co-citation network analysis of the references cited by the retrieval data set. The 62,360 references cited by the 3,766 documents were loaded in CiteSpace (Ver 5.0 R7) to perform the co-citation analysis. The nodes of the network were selected under the criterion that the top 10% of the most cited or frequently occurring items were kept in each slice with the maximum number of selected items not more than 150 per slice.
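The co-citation count and the node selection criterion described above can be illustrated with a short sketch. It is a simplified approximation under stated assumptions, not CiteSpace's internal implementation: each citing document is assumed to be given as a list of reference identifiers, the function names are hypothetical, and the handling of ties and of references that occur in several time slices is left out.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together by one document."""
    pairs = Counter()
    for refs in reference_lists:
        pairs.update(combinations(sorted(set(refs)), 2))
    return pairs


def select_nodes(reference_lists, fraction=0.10, cap=150):
    """Keep the most cited `fraction` of references in one time slice, at most `cap`."""
    cited = Counter(ref for refs in reference_lists for ref in set(refs))
    k = min(cap, max(1, int(len(cited) * fraction)))
    return [ref for ref, _ in cited.most_common(k)]


# Hypothetical reference lists of three citing documents in one time slice.
slice_refs = [
    ["r1", "r2", "r3"],
    ["r1", "r2"],
    ["r2", "r4"],
]

print(cocitation_counts(slice_refs)[("r1", "r2")])   # r1 and r2 are co-cited twice
print(select_nodes(slice_refs, fraction=0.5))        # the 2 most cited references
```

In a full analysis, `select_nodes` would be applied to each time slice separately, and only co-citation pairs whose two members were both selected would become edges of the network.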
The final co-citation network with 955 nodes and 2,414 edges was constructed with CiteSpace for further clustering. CiteSpace identifies clusters with internal spectral clustering algorithms based on eigenvectors of Laplacian matrices derived from the original network (Chen et al. 2010). Spectral clustering has been shown to be more flexible, robust, and efficient in practice than traditional algorithms such as k-means and single linkage (Von Luxburg 2007). In addition, spectral clustering provides clearly defined information for subsequent automatic labeling and summarization to work with. Three cluster labeling algorithms based on noun phrases and index terms of the citing articles of each cluster are available: tf × idf, log-likelihood ratio tests, and mutual information (Chen et al. 2010). According to the clustering and labeling algorithms, 14 major clusters, each including more than 15 documents, were retained from the total of 208 clusters to draw the main cluster view shown in Figure 3. Table 3 lists specific information on the 14 major clusters by their size, namely, the number of members in each cluster. Considering that clusters with few members tend to be less representative than larger clusters, clusters with fewer than 15 members were filtered out to avoid redundancy and overlap in the visual graphics. In addition, the quality of a cluster can be evaluated with the "Silhouette" score, which reflects the homogeneity or consistency of the cluster. The silhouette values of the 14 major clusters in Table 3 tend to be close to 1, indicating that the clustering results are convincing. Additionally, we exported a timeline view of the cluster results (Figure A1, see online supplement), which allows a more direct and clear understanding of the evolution footprints of simulated driving research over the years.
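For readers unfamiliar with the silhouette score, its commonly used definition is given below as a reminder. The symbols are introduced here for illustration and do not appear in the original analysis: $a(i)$ is the mean dissimilarity between member $i$ and the other members of its own cluster $C_i$, $b(i)$ is the smallest mean dissimilarity between $i$ and the members of any other cluster, and $d$ is the dissimilarity measure, which for a co-citation network depends on CiteSpace's internal choices and is not specified here.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j),

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1,

S(C) = \frac{1}{|C|} \sum_{i \in C} s(i).
```

A cluster silhouette $S(C)$ close to 1 means that members are much closer to their own cluster than to any other cluster, which is the sense in which values near 1 indicate homogeneous clusters.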
From Figure 3, it can be seen that early studies around 1995 formed 4 major clusters: #3 simulated driving performance, #4 insulin-dependent diabetes mellitus, #12 vision enhancement, and #13 stress. Early studies related to simulated driving were relatively scattered, and there were few co-citation relationships among clusters owing to immature interdisciplinary theories and knowledge diffusion technology. Despite the weak correlation, these early clusters performed in-depth investigations in their respective domains, and some of them contributed to subsequent research content. According to the documents in each cluster, we can conclude that in the early stage, researchers focused on simulated driving performance to investigate factors that induced automobile accidents. At the same time, physiological and pathological characteristics of drivers while driving, such as task-induced stress and fatigue, attracted much attention. It is noteworthy that studies on drivers with metabolic and neurological disorders, such as diabetes mellitus and dementia, began to appear in this stage and continued to be a concern in the next decade. Technically, a vision enhancement system (#12) based on infrared imaging was integrated into vehicles to enable drivers to see clearly under poor illumination, such as night and foggy driving. Not surprisingly, the emergence and application of assistive technology attracted academic attention to drivers' visual patterns and driving behaviors.
Several fields began to draw research interest around 2000, such as alertness or sleepiness of drivers, handheld device-induced driver distraction, and drivers' trust and acceptance of automation. Though driving tasks require constant alertness, sleepiness caused by driver fatigue and sleep apnea has been commonly observed and reported to cause serious consequences. Some precision devices from the psychology and medicine fields, such as EEG and functional magnetic resonance imaging, were applied to detect the pathogenic mechanism and clinical symptoms on the platform of driving simulators. With improvement in the degree of automation, multiple driving assistance systems were developed to undertake part of the drivers' tasks. However, trust and acceptance of automation, such as shared control with machines, remained a contentious issue for fear of the disuse, misuse, and abuse of automation. In addition, driver distraction has received sustained academic attention since the beginning of the 21st century, when cell phones began to prevail. Driver distraction has been closely linked with the use of cell phones and smart devices (texting, conversation, email, etc.) as well as the boom of in-vehicle information systems. In this stage, attempts to optimize the interaction process and warning systems were also advocated; therefore, more realistic simulators with haptic feedback began to receive attention from the research community.
Between 2000 and 2005, in addition to the continuing clusters such as driver distraction, alertness, and tactile feedback, 4 major clusters related to simulated driving came into being: #9 methamphetamine, #7 novice driver, #6 2-lane rural road, and #2 validity. Methamphetamine was considered to be one of the most popular prohibited stimulants among drivers, along with alcohol (Silber et al. 2012). Their exact effects on driving performance were being taken seriously by governments and research communities at this stage. Regarding the age distribution of drivers, vehicles have shifted from luxuries to necessities, vehicle owners are no longer only commuters, and the increasing numbers of young and old drivers have significantly changed the age structure of drivers. However, evidence has established that age-related declines in cognitive, mental, and physical ability are associated with an increase in accident risk. The U-shaped relationships between fatal/injury involvement rate and age group indicate that the young group and the old group suffer from a relatively high possibility of injury and fatality in traffic accidents (Massie et al. 1995). Numerous studies were carried out to examine the driving behaviors of novice, young, and elderly drivers to ensure the safety of road users. The validity of driving simulators, that is, the simulator's fidelity and behavioral validity, was another research hotspot and was considered to be a key component of any study that utilizes simulators to evaluate driving performance (Shechtman et al. 2009). Researchers have performed a series of studies to assess the validity of driving simulators (Mayhew et al. 2011; Shechtman et al. 2009).
From 2005 to 2010, some of the major clusters attracted substantial attention, and #8 automated driving began to receive attention. Since cars were invented, humanity has dreamt of self-driving or driverless vehicles. Nowadays, this situation seems to be rapidly changing as the technology required for automated driving is starting to become available. This has resulted in the design, testing, and validation of automated driving systems, in which simulated driving plays an important role. Since 2010, several clusters have received a lot of attention and tend to be the active fields at present, including driver distraction, the validity of driving simulators, automated driving, and drug effects.
Additionally, we exported the top 20 references with the strongest citation bursts between 1997 and 2016 (see Figure A2, online supplement). The academic dynamics of a given field can be characterized to a certain degree by member articles that received the steepest increase of citations, namely, citation bursts. A citation burst indicates the likelihood that the scientific community has paid or is paying particular attention to the underlying contribution (Chen 2004). Comparing these references with the leading articles in Table 3, we may conclude that it is the academic attention to these articles with strong citation bursts that led to the formation of the main representative clusters. Acquaintance with these documents with strong citation bursts will lead to a quick understanding of the formation of clusters and the evolution of the simulated driving domain.

Discussion/limitations
Based on the data set of 3,766 documents obtained from the Web of Science Core Collection from 1997 to 2016, this article performed a rigorous and thorough bibliometric analysis of simulated driving research, and some significant points regarding research performance were noted. Exponential model fitting showed that yearly publications increased greatly in the past 2 decades. It was notable that the United States contributed the largest number of publications as well as international collaborations, followed by Germany and China. In addition, several universities in The Netherlands and the United States dominated the list of contributing institutes. There were a total of 1,516 journals and proceedings covering the 3,766 documents. It was interesting that 3 of the 10 most contributing journals belonged to the ergonomics realm and the others were closely linked to transportation or traffic accidents. Additionally, the most productive authors and their partnerships were recognized. With the aid of CiteSpace, the primary research contents and core documents in different stages were identified. In earlier stages, simulated driving platforms tended to be used to investigate the driving performance of drivers under risky situations, task stress, and physiological or neurological disorders. In the last decade, with the development of technology, mainstream research has shifted toward attempts to enhance and verify the validity of simulators, the study of mobile device-induced driver distraction, and studies on advanced automation. It can be concluded that simulated driving will continue to play an essential part in the future of the human-vehicle-environment system.
Practically, the performance analysis of simulated driving research is significant to relevant researchers, for instance, in selecting which researchers to follow and co-author with, determining which journals to focus on and submit papers to, and deciding which institutes or countries to enhance cooperation and exchanges with. The co-citation network analysis systematically and comprehensively presented clear evolution footprints of simulated driving research, which can help relevant researchers realize the panorama of global simulated driving and establish future research directions. Theoretically, the bibliometric analysis provides strong evidence that simulated driving is becoming increasingly accepted, validated, and adopted across wider geographical regions, knowledge fields, and time spans. Although this study strictly followed the process of bibliometric analysis, there are limitations. The main limitation is that in the process of co-citation network analysis, small clusters are filtered out to spotlight the major clusters. Therefore, there is a possibility that some bursting and promising but small clusters cannot be easily identified, so that future hotspots may be missed for methodological reasons. Methodologically, in addition to bibliometrics, several typical approaches are available to review documents, such as systematic review and meta-analysis. Those methods might provide different results. Further comparisons between different methods might yield materials that are complementary to the current findings and produce more persuasive guidance in the simulated driving domain.