Travel and us: the impact of mode share on sentiment using geo-social media and GIS

Abstract
Commute stress is a serious health problem that affects nearly everyone. Given that microblogged geo-locational information offers new insight into human attitudes, the present research examined the utility of geo-social media data for understanding how different active and inactive travel modes affect feelings of pleasure, or displeasure, in two major US cities: Chicago, Illinois and Washington DC. A popular lexicon-based approach was used to derive a sentiment index (pleasure or valence) for each travel Tweet. Methodologically, exploratory spatial data analysis (ESDA) and global and spatial regression models were used to examine the geography of all travel modes and the factors affecting their valence. After controlling for socioeconomic, environmental, weather and temporal factors, spatial autoregressive models, which account for spatial autocorrelation, proved superior to the base global model. The results showed that water and pedestrian travel were associated with positive valence in both cities. Bicycling also favourably influenced valence, albeit only in DC. A noteworthy finding was the negative influence of temperature and humidity on valence. These outcomes should be considered when additional evidence is needed to raise commuter sentiment in practice and policy, especially with regard to active transportation.


Introduction
Commuting stress, an ever-growing social problem, continues to be studied by researchers in public health, planning, social sciences, engineering, economics and business (Novaco and Gonzalez 2009). The daily commute, shaped by mode choice decisions, exposes a large portion of the population to stress (Novaco and Gonzalez 2009; Legrain, Eluru, and El-Geneidy 2015). Kahneman et al. (2004) found that commuting ranked among the daily activities with the lowest net affect (i.e. experienced utility). This has consequences for mental and physical health, including increased heart rate and blood pressure, back problems and certain types of cancer (Hoehner et al. 2012). Commute stress also affects mode choice, especially the choice of sustainable travel modes (i.e. walking and bicycling) (Legrain, Eluru, and El-Geneidy 2015). This is concerning because these modes have been linked to elevated satisfaction levels and improved public health outcomes (Furie and Desai 2012; Olafson 2014). Measuring the links between stress and transportation is a growing area of research, because past findings show that a person's travel experience is a strong determinant of future behaviour and well-being (Eagly, Mladinic, and Otto 1994; Mokhtarian and Salomon 2001). Although a multitude of factors have been investigated, research gaps remain (De Vos et al. 2013).
Trip utility and satisfaction have traditionally been measured by summarising human responses to questionnaires, counters, interviews, focus groups, observations, photography, cognitive maps and hypothetical scenarios (Latham 2003; Berg and Lune 2004). These methods are flawed in large part because they do not capture a traveller's actual experience. They are also impractical, untimely, costly and prone to bias (Neutens and Schwanen 2011; Collins, Hasan, and Ukkusuri 2013; Morris and Guerra 2015; Luo et al. 2016; Salon 2016). In addition, the spatial granularity of the data is coarse: it is typically aggregated to government units such as US Census Tracts, which limits the usefulness of the results at a local scale (Sun et al. 2017). With the rise of sensor technology, new opportunities to study human-scaled mobility patterns and behaviour have become available.
Information and Communication Technology (ICT), which includes smartphones and the internet, has seen a spectacular increase in societal use, in large part due to the attractiveness of social networking. Location-aware devices can collect semantic information with geo-locational attributes. One popular type of ICT that has attracted researchers interested in measuring travel behaviour is geo-social media (GSM) (Andrienko et al. 2013). The proliferation of GSM is driven by the ubiquitous penetration of mobile devices and the accessibility of shared content, and it has yielded new insights and methods for studying the link between well-being and travel. The main advantages of GSM data are their human-scaled resolution, low cost, precise time-stamps and comments, which together allow individualised insight into human-environmental behaviour. The current research investigates GSM data using GIS. In doing so, it contributes to the scant work on spatially exploring human-locational data obtained from ICT and provides nuanced insight into the interrelations among travel mode, sentiment, socioeconomic status (SES), time and environment (Lichman and Smyth 2014; Luo et al. 2016). The results of this study will assist planners and policy-makers in making informed decisions on increasing travel satisfaction, especially for active modes of transportation such as bicycling, walking and mass transit.

Objectives
The current research set out to examine the relationship between valence (i.e. sentiment) and travel mode using GSM. In doing so, it had two objectives:

(a) to visualise the spatial characteristics of active (walking, bicycling and mass transit) and non-active (automobile and water travel) travel mode densities using ESDA and geographic information systems (GIS) in two major metropolitan areas, Chicago and Washington, District of Columbia;

(b) to measure the associations between transportation mode and valence using global and spatial models, while adjusting for neighbourhood character, time and weather.

We believe this study contributes to two broader research areas: the visualisation of GSM-derived, human-scaled travel patterns, and understanding of the links between travel mode and sentiment. The theoretical framework used in this research is based on the ecological modelling concept. This concept states that a combination of environmental and psychosocial variables can aptly explain physical activity, including transportation (Sallis and Owen 2002).
The remainder of the paper is structured as follows. Section 3 reviews previous work, and Section 4 describes the study area and outlines the data sources. Section 5 introduces the methodology used in this research, including Twitter sentiment analysis, GIS and modelling protocols. The paper concludes by presenting the results and conclusions in Sections 6 and 7, respectively.

Measuring travel behaviour
Considerable research has now established that travel has positive utility and that there are clear links among travel mode, happiness and subjective well-being (Mokhtarian and Salomon 2001; Reardon and Abdallah 2013; Morris and Guerra 2015). The impetus for this work derives in part from social psychology, which holds that behavioural choices are attributable to internal factors (Mokhtarian and Salomon 2001; Van Acker, Van Wee, and Witlox 2010). This evidence has exposed a problem with traditional travel-demand models: travel is not necessarily a cost and may be a sought-after activity in its own right (Morris and Guerra 2015).

Past findings on well-being and travel behaviour remain mixed, however. For instance, the negative health outcomes of commuting are well documented (Novaco and Gonzalez 2009), yet other works have found benefits, especially for mental health (Redmond and Mokhtarian 2001). Olsson et al. (2013) found that people in three of the largest urban areas of Sweden were generally happy while commuting. The link between well-being and mode shift has also received significant attention in the literature (Abou-Zeid et al. 2012). Mass transit is generally the least satisfying transportation mode (Ory et al. 2004), while bicycling and walking are the most satisfying (Morris and Guerra 2015). Additional information on why, when and where travellers are most satisfied may therefore be essential to identifying efficient means of promoting active travel.
Travel behaviour has traditionally been assessed using bottom-up approaches that engage participants directly. Typical questions include 'how safe was your journey?' and 'how does it (i.e. travel) affect your overall quality of life?' (Abou Zeid 2009; Zhou and Zhang 2016). Qualitative survey instruments are the predominant tool used to collect this information. For instance, Schafer (2000) distributed 30 surveys to ascertain the spatiotemporal patterning of human travel. St-Louis et al. (2014) disseminated a large survey to faculty, staff and students at McGill University to examine travel mode happiness. The drawbacks of these methods are their high cost, labour intensity, limited sample size and inability to adequately decipher complicated human behaviour (Steiger et al. 2015; Luo et al. 2016). Geo-located mobile communications have been considered as a new way to collect more accurate responses regarding travel behaviour, with the added benefit of locational attributes.

Geo-social media analysis
Location-based social networks (LBSN) and GSM data have brought new insights into geographical science and into the spatiotemporal analysis of humans and social interactions (Giannotti and Pedreschi 2008; Sui and Goodchild 2011; Luo et al. 2016). Platforms such as Twitter, Facebook and Foursquare are the primary sources of GSM data (Huang and Wong 2016; Luo et al. 2016). Twitter, in particular, is a microblogging application that records text, time and geospatial coordinates, allowing for linkages among place, time and semantics (Tsukayama 2017). Tweets are limited to 280 characters or fewer; the limit was 140 characters when the data for this study were collected.

Previous travel behaviour research using GSM data has been dominated by investigations of human mobility (Cheng et al. 2011; Huang and Wong 2015), social activity patterns (Steiger et al. 2015; Huang and Wong 2016) and physical activity levels in neighbourhood contexts (Nguyen et al. 2016). For instance, Schweitzer (2014) examined how transit agencies and patrons engage with Twitter as a means of communicating service quality. Collins, Hasan, and Ukkusuri (2013) examined the utility of Twitter for aggregating transit rider opinions. Twitter does not dynamically track movement, but it can be connected with Foursquare check-ins to link activity space (i.e. geography), sentiment and transportation (Noulas et al. 2012; Wu et al. 2014; Mondschein 2015). For example, Mitchell et al. (2013) used check-ins to investigate happiness levels obtained from Twitter throughout the US, and Esmin, De Oliveira, and Matwin (2012) investigated sports-related sentiments using a similar approach.

Quantifying the psychology of 'place' may show promise for understanding the probability of adopting new behaviours. Learning takes place in a social context, where behaviours are adopted based on how they are performed and perceived and on their expected outcomes (Bandura 1977; Nguyen et al. 2016).
The current research posits that combining social media sources such as Twitter and Foursquare with geography could help analyse attitudes towards differing travel modes and circumvent the pitfalls of traditional methods. One of the first steps in doing so is extracting the implied opinions from social media. Sentiment analysis offers researchers the opportunity to link users' opinions to geography. Mining such data for sentiment is arduous because of the short length and irregular structure of user-produced content (Saif, He, and Alani 2012). Consequently, many methods have been devised and applied in research areas such as predicting daily box office revenues (Rui and Whinston 2011) and assessing short-term stock market performance (Bollen, Mao, and Zeng 2011).

These methods include both lexicon-based approaches and machine-learning algorithms. Das and Chen (2001) used a lexicon-based classification algorithm to extract market emotions from stock message boards, which then informed decisions on whether to buy or sell a stock. Lei et al. (2014) constructed a word list, or dictionary, for emotion detection. Turney and Littman (2003) used an unsupervised learning algorithm to classify the emotional content of users' reviews of movies, travel destinations, automobiles and banks. A more recent approach that has proven useful for analysing Tweets is LabMT, a lexicon that assigns happiness values to 10,000 of the most frequently used English words. Other works use linguistic structures or dictionaries to classify the emotions of words or phrases. These include:
- SemEval, which introduced the 'Affective Text' task (Strapparava and Mihalcea 2007);
- SWAT (Katz, Singleton, and Wicentowski 2007);
- the Subjectivity Wordlist (Mihalcea, Banea, and Wiebe 2007);
- WordNet-Affect (Strapparava and Valitutti 2004);
- SentiWordNet (Baccianella, Esuli, and Sebastiani 2010).
Despite the limitations posed by semantic ambiguity, we used a lexicon-based approach in this research because it is simple to deploy and has been implemented successfully in previous research (Kim et al. 2017).

Study area
The study areas in this research were Chicago, Illinois and Washington, District of Columbia (DC), USA (Figure 1). These locations were chosen because of their large numbers of Foursquare check-ins. They also differ in SES, climate, intermodal transport options, walkability, bicycle-friendliness and density of recreational opportunities, such as the National Mall in DC and Lake Michigan in Chicago. Both cities contain multimodal transportation systems and are highly ranked nationally for their bicycle 'friendliness'. Bicycling Magazine recently ranked DC seventh in the nation and Chicago first (Dille 2016). Table 1 highlights additional commonalities and differences between the two cities with regard to several SES indicators.

Twitter data
The first step in this research was to retrieve GSM data from the Twitter and Foursquare APIs (application programming interfaces). We collected Tweets from April 2012 through June 2013. Foursquare was chosen because it is the most popular LBSN application and had surpassed 10 billion check-ins as of December 12, 2015 (Smith 2016). Foursquare check-ins function as 'sensors', revealing the geography of human dynamics at different times of the day (Banerjee et al. 2013). Check-ins are made at business (venue) locations through the Foursquare application on a person's mobile device (i.e. smartphone). Those used here were retrieved from the broad 'Travel and Transport' category. In particular, we tracked consumers checking in at transport locations such as bus stations, train stations, hotels and piers, among many other transport-related locales, because these are major transportation destinations and origins within urban areas (Table 2). Reactions may differ across mass transit modes (e.g. bus vs. subway). However, previous research has found that transit is generally the least desirable travel mode (Schweitzer 2014), so we treated all transit check-ins as one mode in this study. The final transportation-related Tweets are listed in Table 2, and their frequencies are illustrated in Figure 2.

Socioeconomic, environmental and weather variables
A large body of research has established connections among SES, environment and travel behaviour (Cervero and Kockelman 1997; Saelens, Sallis, and Frank 2003). To remain in line with this work, several related factors were included in this research. We collected data from the 2010 US Census and the Environmental Protection Agency's (EPA) Smart Location Database (SLD). This data set comprises 90 SES and environmental variables aggregated to the Census block group (CBG) level for the entire nation (Ramsey and Bell 2014). The variables are measured and categorised in accordance with known travel behaviour and quality-of-life metrics such as development density, land-use diversity, network design, accessibility, demographics and employment. We considered all of these in this research.
Weather is a strong predictor of active travel behaviour, and air quality is an ever-growing global public health concern (Bocker, Dijst, and Prillwitz 2012; Rybarczyk and Gallagher 2014). Therefore, daily weather and air quality data were collected from two sources: the US Environmental Protection Agency's Air Quality System (AQS) (https://www3.epa.gov/airdata/index.html) and a commercial provider, Weather Underground (https://www.wunderground.com/). The variables integrated into this research were temperature, precipitation, wind speed, humidity and air quality (i.e. ozone). The AQS air quality data were originally collected from seven monitoring stations in Chicago and six in Washington DC. Additionally, we gathered precipitation, temperature, humidity and wind speed data from 19 Weather Underground stations in Chicago and 13 in Washington DC.
Variations in time can affect travel behaviour and mobility patterns because of workplace and life circumstances (Yuan, Raubal, and Liu 2012; Järv, Ahas, and Witlox 2014). To explore how time affected valence, we used the date and time fields of the Tweets for further categorisation and analysis. The groups consisted of hour ranges, am/pm, day, month and season. They were aggregated to ease interpretation of the relations among travel mode, valence and time.
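The temporal grouping described above can be sketched in Python. The function below derives hour range, am/pm, day, month and season from one Tweet timestamp. The paper does not report its bin boundaries, so the hour ranges (`HOUR_RANGES`) and the meteorological season mapping (`SEASONS`) are hypothetical. The timestamp is assumed to have already been converted to local time.

```python
from datetime import datetime

# Hypothetical hour bins (start inclusive, end exclusive); the paper does not give its ranges.
HOUR_RANGES = [
    (0, 6, "night"),
    (6, 10, "morning peak"),
    (10, 15, "midday"),
    (15, 19, "evening peak"),
    (19, 24, "evening"),
]

# Meteorological seasons for the Northern Hemisphere (an assumption).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def temporal_groups(ts: datetime) -> dict:
    """Return the time groupings for one Tweet timestamp (assumed to be local time)."""
    hour_range = next(label for start, end, label in HOUR_RANGES
                      if start <= ts.hour < end)
    return {
        "hour_range": hour_range,
        "am_pm": "am" if ts.hour < 12 else "pm",
        "day": ts.strftime("%A"),      # weekday name, e.g. "Wednesday"
        "month": ts.strftime("%B"),    # month name, e.g. "July"
        "season": SEASONS[ts.month],
    }


groups = temporal_groups(datetime(2012, 7, 4, 8, 30))
print(groups)
# {'hour_range': 'morning peak', 'am_pm': 'am', 'day': 'Wednesday', 'month': 'July', 'season': 'summer'}
```

Coarse groups such as these trade temporal detail for larger cell counts per category, which keeps the later comparisons across travel modes interpretable.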

Methods: a multistage procedure
The analysis required a series of sequential steps. The three main tasks outlined in Figure 3 were:
(1) mining and collecting the GSM posts for sentiment analysis;
(2) geolocating Foursquare check-ins and merging them with neighbourhood-level (i.e. CBG) factors using GIS;
(3) exploring the travel mode trends spatially using ESDA and implementing two types of regression models.
The following paragraphs outline the steps involved in each task.

Data mining
We mined Twitter data from consumers in the two study cities who checked in to Foursquare venues and then tweeted via the Foursquare application. We used this approach to reduce the typical 'noisiness' of Big Data, obtain a sufficient sample, and reveal important spatial patterns and associations with place. To illustrate how users were situated within specific travel modes, we explain the data-sharing process. When users check in to a Foursquare venue and tweet, the exact latitude and longitude of the venue are passed on to Twitter and are then accessible via its API. We collected Tweets from the API along with venue location details, including the address and the establishment category (e.g. house, coffee shop or movie theater). We also collected the Tweet ID, Tweet text, time and user ID. We accumulated Twitter check-in data (via Foursquare) within a 2.5 mile buffer around major locations such as Union Station in Chicago and DC (see Figure 1).
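The buffer step can be sketched as a great-circle (haversine) distance filter. The coordinates for the two Union Stations and the sample check-ins are approximate values chosen for illustration. The names `within_buffer`, `CENTRES` and the record fields are hypothetical. The study itself used GIS buffers, not this code.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius
BUFFER_MILES = 2.5            # buffer radius reported in the text

# Approximate (lat, lon) of the buffer centres; illustrative values only.
CENTRES = {
    "Chicago Union Station": (41.8787, -87.6403),
    "Washington Union Station": (38.8973, -77.0063),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_buffer(lat, lon, centre, radius=BUFFER_MILES):
    """True if the point lies within `radius` miles of `centre`."""
    return haversine_miles(lat, lon, *centre) <= radius


checkins = [
    {"tweet_id": 1, "lat": 41.8916, "lon": -87.6079},  # near Navy Pier (approximate)
    {"tweet_id": 2, "lat": 42.0451, "lon": -87.6877},  # Evanston, well outside the buffer
]
kept = [c for c in checkins
        if within_buffer(c["lat"], c["lon"], CENTRES["Chicago Union Station"])]
print([c["tweet_id"] for c in kept])  # [1]
```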
We considered three main types of Foursquare venues in this research:
- fixed establishments such as bus or train stations;
- transportation-related land uses such as roads or piers (e.g. Lake Shore Drive in Chicago);
- broad categories such as 'general travel' (Table 3).

The first type was sufficient to indicate the travel mode accessed by a user. For the second and third types, venue information was combined with Tweet content to identify the travel mode used or considered. For example, a person who checked in on an expressway often revealed they were in a taxi heading to the airport, and a person checking in at Navy Pier revealed they were having a nice walk. Accordingly, users were assigned to auto, bike, mass transit, pedestrian or water travel. Approximately 5% of the Tweets collected in the area surrounding each reported location were Foursquare check-ins. While this proportion is low, our sample included enough Tweets with valid geographic information and syntax to conduct statistical and spatial analysis.

The final quality assurance step consisted of manually verifying the Tweets to ensure that the check-ins correctly represented the travel mode. The process entailed comparing the check-in travel groups with the Tweet content. It also included a visual comparison against the most recent geospatial imagery and land-use data in a GIS. Approximately 1% of the Tweets were removed due to errors. We amassed total sample sizes of 4399 for Chicago and 3988 for DC. Table 3 shows how Foursquare venue categories were converted into travel modes: for each venue type, it gives the percentage of check-ins classified as each travel mode.
As explained above, venues indicating establishments, such as boat or ferry docks, bus and train stations, light rail and subway stations, each indicated a single travel mode (either water travel or mass transit). However, Pier, General Travel and Road check-ins could correspond to auto, bike, mass transit, pedestrian or water travel. For example, 87% of the people who checked in on a 'road' were travelling by auto and 13% were pedestrians. These travel modes were determined and verified from the Tweet content using the methodology described above.
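The two-stage assignment can be sketched as a minimal rule-based function:
- Unambiguous venue categories map straight to a mode.
- Ambiguous ones (Pier, General Travel, Road) are resolved from Tweet keywords.
- Anything unresolved is returned as `None` for manual review.

The category strings and keyword lists are hypothetical. The study's actual classification relied on reading the Tweet content and on manual verification.

```python
import re
from typing import Optional

# Venue categories that map to a single mode (cf. Table 3); labels are hypothetical.
FIXED_MODE = {
    "Boat or Ferry": "water",
    "Bus Station": "mass transit",
    "Train Station": "mass transit",
    "Light Rail": "mass transit",
    "Subway": "mass transit",
}

# Hypothetical keyword cues for ambiguous venues (Pier, General Travel, Road).
KEYWORDS = {
    "water": {"boat", "ferry", "cruise"},
    "bike": {"bike", "biking", "bicycle", "cycling"},
    "mass transit": {"bus", "train", "metro", "subway"},
    "pedestrian": {"walk", "walking", "stroll"},
    "auto": {"taxi", "cab", "driving", "car"},
}


def assign_mode(venue_category: str, tweet_text: str) -> Optional[str]:
    """Return a travel mode, or None if the Tweet must be reviewed manually."""
    if venue_category in FIXED_MODE:
        return FIXED_MODE[venue_category]
    # Whole-word matching avoids false hits such as "car" inside "scarf".
    tokens = set(re.findall(r"[a-z]+", tweet_text.lower()))
    matches = [mode for mode, cues in KEYWORDS.items() if tokens & cues]
    # Exactly one matching mode is accepted; zero or several go to manual review.
    return matches[0] if len(matches) == 1 else None


print(assign_mode("Subway", "Running late again"))                   # mass transit
print(assign_mode("Road", "Stuck in a taxi on the way to O'Hare"))   # auto
print(assign_mode("Pier", "Nice walk along Navy Pier"))              # pedestrian
print(assign_mode("General Travel", "Heading out"))                  # None
```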
The current research used the popular lexicon-based sentiment analysis instrument Affective Norms for English Words (ANEW), developed by Bradley and Lang (1999). The method applies a rating scale to all Tweets to derive an overall pleasure score for each check-in. ANEW assumes that some of the most common words in the English language elicit different levels of pleasure, arousal and dominance, and it rates each of those words with a specific level of emotion. To build the norms, individuals rated a list of 2,476 words on pleasure, arousal and dominance using a self-assessment manikin (i.e. a non-verbal pictorial scale). Applying these ratings produced a numerical index for each Tweet, based on the sum of its keyword scores. Tweets were not pre-processed prior to sentiment analysis. Mehrabian and Russell (1974) and Turley and Milliman (2000) suggest three dimensions of emotion that are relevant to the effects of the environment on human emotion and behaviour: pleasure, arousal and dominance. Pleasure is akin to valence and is similar to the notion of happiness experienced while immersed in a physical environment. We used valence in the current research to assess travel mode sentiment (i.e. happiness). Table 4 lists the valence summary statistics for all travel modes in each city.
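The summed index can be sketched as a dictionary lookup. The ratings in `VALENCE` are hypothetical placeholders on ANEW's 1 to 9 valence scale. They are not the published norms, which are distributed by the authors. Lower-casing and word splitting are the minimum any lookup needs. Otherwise the sketch follows the statement that Tweets were not pre-processed, so there is no stemming and no stop-word removal.

```python
import re

# Hypothetical placeholder ratings on ANEW's 1-9 valence scale (NOT the published norms).
VALENCE = {"happy": 8.0, "nice": 7.5, "late": 3.0, "traffic": 3.5}


def valence_index(tweet_text: str, lexicon: dict = VALENCE) -> float:
    """Sum the valence ratings of every lexicon word found in the Tweet.

    Only lower-casing and word splitting are applied. Words absent from the
    lexicon contribute nothing, so a Tweet with no matches scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", tweet_text.lower())
    return sum((lexicon[token] for token in tokens if token in lexicon), 0.0)


print(valence_index("Nice boat tour, happy!"))         # 15.5
print(valence_index("Stuck in traffic, late again"))   # 6.5
```

Because the index is a sum, longer Tweets containing more rated words can score higher than shorter ones with the same tone. This is one reason a variance-stabilising transform of the response can be useful before regression.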

Geoprocessing and data preparation
From the Foursquare latitude and longitude coordinates of each Tweet, we first created an XY event layer in GIS. We then exported it to a point shapefile using ESRI (Environmental Systems Research Institute) ArcGIS software, version 10.4. A point-in-polygon geoprocessing method was used to intersect the Foursquare check-ins and associated Tweet content with the US Census Block Group (CBG) polygons. Merging these two spatial databases allowed us to examine and account for underlying neighbourhood conditions. This GIS procedure is commonly used when area-level data need to be attached to point-level data (Cromley and McLafferty 2012). Approximately 99% of the check-ins were successfully mapped and joined to the CBGs; those that were spatially misplaced or erroneous were removed from the analysis. The previously discussed weather factors were joined to the individual Tweets using an attribute join in GIS.

In all modelling efforts, valence was the dependent variable. Prior to model development, we applied a square-root transformation to valence to bring its distribution closer to normality. The primary independent variable in each model was travel mode, which was dummy coded with automobile as the reference category. We examined a correlation matrix for statistical correlations between the dependent variable and all independent variables (i.e. SES, environment, weather and time). Factors that were statistically significant (p < 0.05) and not collinear were retained for model inclusion. Collinearity was assessed using the variance inflation factor (VIF), with a threshold of less than 10; values greater than this violate the parametric assumption of covariate independence (Shaker 2016). The same independent variables were used for each city during model construction. The final independent variables are displayed in Table 5.
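The dummy coding, square-root transformation and VIF screening described above can be sketched with NumPy. The function names and the synthetic variables `x1`, `x2` and `x3` are hypothetical. The example deliberately makes `x2` collinear with `x1`, so both exceed the VIF threshold of 10 while `x3` passes.

```python
import numpy as np


def dummy_code(modes, reference="auto"):
    """0/1 indicator columns for every travel mode except the reference."""
    levels = sorted(set(modes) - {reference})
    X = np.array([[1.0 if m == level else 0.0 for level in levels] for m in modes])
    return X, levels


def vif(X):
    """Variance inflation factor of each column of X (X excludes the intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(20, 5, n)
x2 = 0.8 * x1 + rng.normal(0, 1, n)    # deliberately collinear with x1
x3 = rng.normal(10, 3, n)
covariates = np.column_stack([x1, x2, x3])
v = vif(covariates)
retained = v < 10                        # VIF threshold used in the text
print(v.round(1), retained)              # x1 and x2 fail the threshold; x3 passes

modes = ["auto", "bike", "water", "pedestrian", "auto"]
X_mode, levels = dummy_code(modes)       # levels: ['bike', 'pedestrian', 'water']
valence = rng.uniform(1, 30, len(modes))
y = np.sqrt(valence)                     # square-root transform of the response
```

Dropping the reference column (automobile) avoids perfect collinearity with the intercept. Each mode coefficient is then read as a difference from auto travel.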

Data analysis: spatial distribution of travel mode Tweets
To understand the spatial patterns of travel mode check-ins (objective 'a'), we implemented a common ESDA technique, kernel density estimation (KDE), using ArcGIS software. KDE approximates the intensity of points per unit area by creating a smoothed surface from a kernel function and the distances from each point (Bailey and Gatrell 1995). The inputs are the point coordinates (x, y), with an optional weight (z). The output is a raster in which the cells represent the density of events, often labelled 'hot-spots'. The choice of kernel has been found to be less important than the bandwidth (i.e. window width) in determining the contribution of nearby points (Downs 2010). KDE has been suggested and used in several past studies visualising the spatial trends of GSM data. To remain in line with these works, KDE was used to transform travel mode check-ins into a continuous density surface. After multiple rounds of testing other radii, we selected a bandwidth of one kilometre and a cell size of 30 metres. Although several kernel functions are available, we used the quadratic kernel function described in Silverman (1986). The result is a raster surface depicting the spatial distribution of travel mode check-in densities in each city.
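The quadratic (quartic) kernel of Silverman (1986) gives each point a weight of 3/(πh²)·(1 − (d/h)²)² within bandwidth h and zero beyond it, so each point's contribution integrates to one. The NumPy sketch below evaluates that surface on a regular grid with the reported settings (h = 1000 m, 30 m cells). It assumes projected coordinates in metres and applies no edge correction. It is an illustration, not ArcGIS's implementation, and the function name and sample points are hypothetical.

```python
import numpy as np


def quartic_kde(points, xmin, ymin, xmax, ymax, cell=30.0, bandwidth=1000.0):
    """Quartic (Silverman biweight) kernel density on a regular grid.

    points: (n, 2) projected x, y coordinates in metres.
    Returns cell-centre coordinates and a (ny, nx) grid in events per square metre.
    No edge correction is applied.
    """
    xs = np.arange(xmin + cell / 2, xmax, cell)
    ys = np.arange(ymin + cell / 2, ymax, cell)
    gx, gy = np.meshgrid(xs, ys)
    density = np.zeros_like(gx)
    norm = 3.0 / (np.pi * bandwidth ** 2)   # makes each kernel integrate to 1
    for px, py in np.asarray(points, dtype=float):
        u2 = ((gx - px) ** 2 + (gy - py) ** 2) / bandwidth ** 2
        inside = u2 < 1.0                   # kernel is zero beyond the bandwidth
        density[inside] += norm * (1.0 - u2[inside]) ** 2
    return xs, ys, density


# Two hypothetical check-ins 400 m apart on a 3 km x 3 km window.
pts = np.array([[0.0, 0.0], [400.0, 0.0]])
xs, ys, dens = quartic_kde(pts, -1500, -1500, 1500, 1500)
print(dens.shape)                       # (100, 100)
print(round(dens.sum() * 30 * 30, 3))   # about 2, i.e. the number of points
```

Multiplying the summed density by the cell area recovers roughly the number of points. This gives a quick check that the bandwidth and cell size are in consistent units.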

Global and spatial regression model development
The inferential statistical analyses of this study followed three steps:
1. global multiple regression;
2. spatial multiple regression;
3. model accuracy assessment using a common autocorrelation index, Moran's I.

Two global ordinary least squares (OLS) regression models were first constructed in SPSS (IBM Inc.) version 22. They explained the relationship between valence and the five travel modes (i.e. auto, walking, bicycling, mass transit and water travel). All explanatory variables (see Table 5) were entered simultaneously into each model. The standardised residuals from both models were tested for spatial autocorrelation using Global Moran's I, because spatial autocorrelation violates the assumption of independent observations (Wagner and Fortin 2005). Moran's I typically ranges from about −1 to +1: statistically significant positive values indicate clustering and negative values indicate dispersion (Burt, Barber, and Rigby 2009).

A simultaneous autoregressive (SAR) modelling approach was employed to correct for spatial dependence among the response and predictor variables. This technique was chosen over other autoregressive methods, such as conditional autoregressive models, primarily because it has proven effective in other transport-related studies (Rybarczyk 2012). SAR uses a variance-covariance matrix based on the non-independence of spatial observations (Kissling and Carl 2008). The model addresses spatial autocorrelation by estimating how much the response or predictor variable at any one site reflects the response or predictor values at surrounding sites. It does so by adding a distance-weighted function of neighbouring response values to the model's explanatory variables (Dormann et al. 2007). The reader is directed to Anselin (2013) for further details on the SAR model.
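Global Moran's I for model residuals can be sketched as below, with a permutation test for significance. The paper does not state which spatial weights were used, so row-standardised inverse-distance weights are an assumption here. Check-ins that share identical venue coordinates receive zero weight in this sketch. A real analysis would need to handle such ties, for example with a small distance offset or k-nearest-neighbour weights. Function names and the gridded example residuals are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords):
    """Row-standardised inverse-distance weights; coincident points get weight 0."""
    xy = np.asarray(coords, dtype=float)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    row_sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)


def morans_i(values, w):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)


def morans_i_test(values, coords, n_perm=999, seed=0):
    """Observed I and a one-sided permutation p-value for positive autocorrelation."""
    w = inverse_distance_weights(coords)
    observed = morans_i(values, w)
    rng = np.random.default_rng(seed)
    sims = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical residuals on a 10 x 10 grid that trend from west to east.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
residuals = gx.ravel() + np.random.default_rng(1).normal(0, 0.5, 100)
I, p = morans_i_test(residuals, coords, n_perm=199)
print(round(I, 3), p)   # clearly positive I with a small p-value
```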
The publicly available software Spatial Analysis in Macroecology (SAM), version 4, was used for model development (Rangel, Diniz-Filho, and Bini 2010).
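For reference, the lag form of the SAR model described above can be written as follows, together with the error form sometimes used instead. Here **y** is the vector of square-root-transformed valences, **X** holds the travel mode dummies and covariates, and **W** is a spatial weights matrix. The text does not state which variant or which weights SAM was configured with, so this is a textbook statement rather than the study's exact specification.

```latex
\begin{aligned}
\text{Lag SAR:}\quad & \mathbf{y} = \rho\,\mathbf{W}\mathbf{y} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}\mathbf{I}) \\
\text{Error SAR:}\quad & \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u},
  \qquad \mathbf{u} = \lambda\,\mathbf{W}\mathbf{u} + \boldsymbol{\varepsilon}
\end{aligned}
```

In the lag form, ρ measures how strongly each site's valence depends on the distance-weighted valences of its neighbours. In the error form, λ captures spatial structure in the unexplained variation. With ρ = 0 or λ = 0, both reduce to the OLS model.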

Results and discussion
The spatial distribution of travel mode densities in each city was first examined and visualised using KDE and GIS. This is followed by the results of the global and spatial regression models, which were fitted independently for each city. The global models summarised the direction and degree of influence of travel mode on valence while adjusting for SES, time, environment and weather. The SAR models analysed the same relationships while also accounting for spatial autocorrelation.

Travel mode visualisations
The initial objective ('a') of this research was to visualise the spatial patterns of travel mode densities using KDE. In the maps, darker hues represent hot-spots of travel mode check-ins in each city.

Automobile and mass transit. The highest density of automobile check-ins was found near recreational and natural resources (i.e. parks) in Chicago (Figure 4(a)) and adjacent to waterways and the National Mall in DC (Figure 5(a)). As expected, the largest concentrations of mass transit check-ins stood out near the main transit hubs in Chicago and Washington DC (Figures 4(b) and 5(b)).

Water travel. Figures 4(c) and 5(c) depict the density of water travel check-ins in Chicago and DC, respectively; the clusters were found near waterways and recreational land uses.

Active modes. The densities of walking and bicycling in Chicago are shown in Figures 4(d) and 4(e). Pedestrian activity was highly concentrated near Navy Pier, which is logical considering that this area is largely designed for walking. Bicyclist activity, in contrast, was mainly situated downtown and adjacent to Millennium Park. In DC, a hot-spot of pedestrian activity was linearly diffused near the National Mall (Figure 5(d)). Figure 5(e) indicates a small but distinct hot-spot of bicycling activity in a neighbourhood north-east of Union Station. This area is adjacent to Dupont Circle, an area containing many bicycle and pedestrian facilities.

Overall, the visualisations depict nuanced travel mode activities in each city with high spatial granularity.

Influence of travel mode on valence: global and spatial model results
Our second objective ('b') was to measure the associations between transportation mode and valence while accounting for known predictors of travel. To do so, we fitted a global (OLS) model and a spatial (SAR) model for each city.

The F-statistics and p-values showed that each city's OLS model was statistically significant (p < 0.001). However, the explanatory power metrics (i.e. adjusted R² and AICc) showed that the models were weak (see Tables 6 and 7). With 12 independent variables, the global models explained 7% of the variance (adjusted R²) in Chicago and 4% in DC. Similarly weak model diagnostics have been reported in previous travel behaviour studies. St-Louis et al. (2014) examined travel mode satisfaction among 3377 commuters and found adjusted R² values between .13 and .25. In a similar study, Morris and Guerra (2015) found OLS R² values between .001 and .139.

The spatial autocorrelation of the standardised residuals from each model was analysed using Moran's I. Positive autocorrelation was found among the residuals for each city (Chicago Moran's I = .012, p < 0.001; DC Moran's I = .03, p < 0.001), indicating spatial dependence that the global models did not capture. Although this autocorrelation is marginal, it implies that the reliability of the OLS estimates is compromised. This can reduce our understanding of the links among valence, travel mode and the remaining explanatory factors.

Tables 6 and 7 display the goodness-of-fit measures (i.e. R² and AICc) for predicting valence in each city. The SAR models improved upon the OLS results and corroborated the initial findings. In Chicago, AICc decreased by approximately 1% and R² increased by 31% (Table 6). The SAR results from DC echoed Chicago's, with an approximate 1% reduction in AICc and a 45% increase in R² (Table 7).
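The fit statistics compared above can be sketched for an OLS model as follows. AICc here uses the Gaussian log-likelihood and counts the error variance as a parameter. Software packages differ by additive constants, so AICc values are only comparable within one package. The text does not say whether its R² changes are relative or in percentage points; `pct_change` computes a relative change. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def ols_fit_stats(X, y):
    """Adjusted R^2 and AICc for an OLS fit with intercept (Gaussian likelihood)."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    p = A.shape[1]                     # regression coefficients incl. intercept
    k = p + 1                          # plus the error variance
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    loglik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    return adj_r2, aicc


def pct_change(old, new):
    """Relative change from old to new, in per cent."""
    return 100.0 * (new - old) / abs(old)


rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
y = 1.0 + 0.3 * x[:, 0] + rng.normal(size=200)   # weak signal, like the valence models
adj_r2, aicc = ols_fit_stats(x, y)
print(round(adj_r2, 3), round(aicc, 1))
print(pct_change(100.0, 99.0))   # -1.0, i.e. a 1% decrease
```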
The comparison of the AICc and R² values suggests that the SAR models captured a spatial structure in the relationship between the dependent and independent variables that the global OLS models did not address. An ex post facto analysis of the standardised SAR residuals for each city using a Moran's I correlogram largely indicated spatial randomness (Figure 6). The reader is referred to Appendix 1 for the detailed non-graphical Moran's I output for the SAR residuals. In sum, the SAR models showed moderate improvement over the OLS diagnostics, and the direction and statistical significance of the covariates remained consistent.
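A Moran's I correlogram recomputes the statistic for successive distance bands. This separates residual clustering at short range from randomness at longer range. The sketch below is a minimal illustration, not the software the authors used. It assumes binary distance-band weights, projected coordinates and hypothetical band edges, and it omits the permutation or analytical significance test that a full correlogram such as Figure 6 would report.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Global Moran's I with binary weights: w_ij = 1 if d_min < dist(i, j) <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    s0 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d_min < d <= d_max:
                num += dev[i] * dev[j]
                s0 += 1.0
    denom = sum(x * x for x in dev)
    if s0 == 0 or denom == 0:
        return float("nan")  # no neighbour pairs in this band, or constant values
    return (n / s0) * (num / denom)


def correlogram(values, coords, band_edges):
    """Moran's I for each consecutive pair of band edges, e.g. [0, 500, 1000]."""
    return [
        (lo, hi, morans_i(values, coords, lo, hi))
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


# Hypothetical residuals along a straight transect, 100 m apart, with signs alternating in pairs.
coords = [(float(x), 0.0) for x in range(0, 800, 100)]
residuals = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
for lo, hi, i_val in correlogram(residuals, coords, [0, 100, 200]):
    # First band: I is about +0.143; second band: I = -1.000.
    print(f"{lo}-{hi} m: I = {i_val:+.3f}")
```

Values near zero (more precisely, near the expectation of -1/(n-1)) across bands are what "spatial randomness" of the SAR residuals refers to.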
The coefficients and significance levels for all four model estimates are displayed in Tables 6 and 7. Because the SAR models were comparatively stronger, we focus on their coefficients. Across all models, water travel had the strongest positive influence on valence. The greatest effect was found in Chicago (SAR std. coeff. = 0.275, p < 0.001), with a smaller effect in DC (SAR std. coeff. = 0.085, p = 0.002). This result suggests that water trips were likely made for recreational purposes, and scanning the database supported this: many Tweets mentioned a 'boat tour'. The elevated valences for this mode are plausible and consistent with prior research that has found mental health benefits associated with exposure to nature in urban areas (Bakolis et al. 2018). Pedestrian travel was positively associated with valence in both cities. The coefficient was larger in Chicago (Table 6) but only bordered statistical significance (SAR std. coeff. = 0.075, p = 0.094); in DC (Table 7) the relationship was also positive and statistically significant (SAR std. coeff. = 0.062, p < 0.001). The positive relationship between walking and valence is aligned with prior work. For instance, Gatersleben and Uzzell (2007) found that walkers considered their commute relaxing, and later research by Duarte et al. (2010) posited that the exercise benefits of walking and bicycling outweighed the benefits of using the automobile. Table 7 shows that bicycling had a marginally positive effect on valence in DC (SAR std. coeff. = .047, p = 0.066). This finding is consistent with previous literature and may also reflect the city's recent national 'bicycle-friendliness' ranking (St-Louis et al. 2014). Although bicycling was also positively associated with valence in Chicago, the effect was not statistically significant (p = 0.894) (Table 6). Collectively, these findings are encouraging on a broader scale.
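The 'std. coeff.' values above are standardised coefficients. They express each effect in standard-deviation units so that modes and covariates can be compared on one scale. One common way to obtain them is to z-score the outcome and every predictor before fitting. The sketch below does this for ordinary least squares on hypothetical data. It does not fit the SAR model, which also needs a spatial weights matrix and maximum-likelihood estimation, so it illustrates the scaling only. The helper names are illustrative.

```python
import statistics


def zscore(xs):
    """Centre a variable and scale it to unit (sample) standard deviation."""
    mu = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardised_coefficients(y, predictors):
    """OLS on z-scored data; no intercept is needed because every column has mean 0."""
    zy = zscore(y)
    zx = [zscore(col) for col in predictors]
    k = len(zx)
    xtx = [[sum(a * b for a, b in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: valence rises with a 'walk' indicator and falls with temperature.
walk = [1, 0, 1, 0, 1, 0, 1, 0]
temp = [20, 22, 25, 27, 30, 31, 33, 35]
valence = [5.0 + 0.8 * w - 0.05 * t for w, t in zip(walk, temp)]
print(standardised_coefficients(valence, [walk, temp]))
```

Each standardised coefficient equals the raw coefficient multiplied by the ratio of the predictor's standard deviation to the outcome's. This is why a binary mode indicator and a continuous variable such as temperature can be ranked against each other.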
Given that walkable and bikeable neighbourhoods promote social capital and improve health (Rogers et al. 2011; Celis-Morales et al. 2017), practitioners should be encouraged to use GSM data to promote these modes, especially in marginalised communities. For instance, government officials may be able to record bicycling and walking satisfaction in real time to gauge travel and neighbourhood happiness, a suggestion that has already been advanced for elevating transit service quality (Collins, Hasan, and Ukkusuri 2013). Most of the SES, environmental, weather and temporal factors were negatively associated with valence in each city (Tables 6 and 7). In both SAR models, weather was negatively associated with valence. In Chicago, temperature was statistically significant and had a small negative effect on valence (SAR std. coeff. = −0.056, p = 0.003); humidity had a slightly weaker effect that was only marginally significant (SAR std. coeff. = −0.028, p = 0.094). In DC, only temperature (SAR std. coeff. = −0.087, p < 0.001) was significant. These findings echo past research emphasising the links among temperature, humidity, travel behaviour and sentiment (Miranda-Moreno and Nosal 2011; Hannak et al. 2012; Saneinejad, Roorda, and Kennedy 2012). The results are also noteworthy because they can help stakeholders make better decisions on how to promote active transportation modes while accounting for weather. For instance, providing air conditioning on mass transit trains or buses may incentivise their use during extreme heat events, and incorporating weather factors into travel-demand models may better predict city-wide walking, bicycling and mass transit rates at fine temporal resolutions (e.g. hourly or daily). In Chicago, the morning time-of-day variable had a small negative association with valence that bordered statistical significance (SAR std. coeff. = −0.028, p = 0.062) (Table 6).
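Weather and time-of-day covariates enter the models as attributes of each Tweet. This section does not describe how they were attached, so the sketch below is a hypothetical illustration only. It matches each Tweet timestamp to the nearest hourly weather observation and flags a morning window. The window bounds (06:00 to 09:59), the station readings and the dates are all assumptions, not the authors' definitions or data.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_observation(obs_times, t):
    """Index of the observation time closest to t (obs_times must be sorted)."""
    i = bisect_left(obs_times, t)
    if i == 0:
        return 0
    if i == len(obs_times):
        return len(obs_times) - 1
    before, after = obs_times[i - 1], obs_times[i]
    return i - 1 if (t - before) <= (after - t) else i


def tweet_covariates(tweet_time, obs_times, temps, humidity, morning=(6, 10)):
    """Attach temperature, humidity and a morning flag to one Tweet.

    morning: hypothetical [start, end) hour window for the morning indicator.
    """
    k = nearest_observation(obs_times, tweet_time)
    return {
        "temperature": temps[k],
        "humidity": humidity[k],
        "morning": int(morning[0] <= tweet_time.hour < morning[1]),
    }


# Hypothetical hourly station readings from 06:00 to 11:00.
obs = [datetime(2016, 7, 1, h) for h in range(6, 12)]
temps = [22.0, 23.5, 25.0, 26.5, 28.0, 29.5]
hum = [80, 78, 74, 70, 65, 60]
print(tweet_covariates(datetime(2016, 7, 1, 8, 40), obs, temps, hum))
```

Here a Tweet sent at 08:40 is matched to the 09:00 reading and falls inside the assumed morning window, so it would be coded as a morning Tweet.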
We infer that this morning period largely reflects the commute, which, unsurprisingly, has previously been associated with elevated stress (Legrain, Eluru, and El-Geneidy 2015). Therefore, interventions such as congestion pricing, or educational campaigns aimed at shifting trips to bicycling or walking, should be considered as ways to improve sentiment during this period. The density of street intersections had a statistically significant, yet marginal, negative effect on valence in DC (SAR std. coeff. = −0.083, p = 0.034) (Table 7). Taking intersection density as a proxy for urban density, this relationship suggests that higher urban density is associated with lower valence (i.e. sentiment). This result is corroborated by past work; for instance, Melis et al. (2015) reported a positive correlation between urban density and antidepressant drug use, and Rocha et al. (2012) found that stress and aggression were linked to traffic noise and pollution. Our results add to this work and suggest that GSM data could help ameliorate these effects by providing real-time, geolocated sentiment on where, and to what extent, elevated congestion and noise levels are occurring, enabling timely mitigation. Minimising these externalities could also help increase walking and bicycling, as recent research has posited that these modes are used less when congestion-related air pollution is elevated (Li and Kamargianni 2017).
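Street intersection density is typically computed as the number of intersections inside an areal unit divided by that unit's area. The areal unit and exact formula are not stated in this section, so the following is a sketch under assumptions. It uses a hypothetical census-tract-like polygon in projected metres, intersections represented as points, a ray-casting point-in-polygon test and the shoelace formula for area.

```python
def polygon_area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices in metres."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def point_in_polygon(pt, poly):
    """Ray casting: toggle 'inside' each time a rightward ray from pt crosses an edge."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def intersection_density(intersections, poly):
    """Intersections per square kilometre within the polygon."""
    count = sum(point_in_polygon(p, poly) for p in intersections)
    return count / (polygon_area(poly) / 1_000_000)


# Hypothetical 1 km x 2 km tract with a 200 m street grid inside it, plus one point outside.
tract = [(0, 0), (1000, 0), (1000, 2000), (0, 2000)]
grid = [(x, y) for x in range(100, 1000, 200) for y in range(100, 2000, 200)]
print(intersection_density(grid + [(1500, 500)], tract))  # 25.0
```

Points lying exactly on a tract boundary are assigned inconsistently by simple ray casting. A production workflow would normally rely on a GIS overlay with explicit boundary rules.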

Conclusion
The proliferation of geolocated social media data has provided additional channels for examining human travel behaviour. In this paper, we made two main contributions. First, we used exploratory spatial data analysis (i.e. hot-spot mapping) to survey the spatial patterns of travel mode activity in two major US cities. Second, we combined GSM data with neighbourhood indicators and extracted sentiment from users' posts to gauge their state of mind while traveling, using global and spatial models. Together, these contributions provide actionable insights and tools for policy-makers to assess and enact efficient plans and policies aimed at reducing commute stress and promoting active transportation modes.
The ESDA results illustrated distinct spatial patterns of travel mode densities in each city using a common visualisation technique, KDE (objective 'a'). We found interesting relationships between the water and automobile travel modes and their proximity to natural resources. Noteworthy patterns were also observed for walking and bicycling in each city. The spatial patterns of the mode check-ins indicated that these activities were occurring near tourist destinations, parks, and pedestrian- and bicycle-friendly neighbourhoods. These outputs lend credence to the utility of GSM data for understanding transportation phenomena at high spatial resolution. Policy-makers and planners could benefit from the results by inventorying the conditions in these areas and developing context-sensitive strategies to encourage a shift away from the automobile in neighbourhoods where bicycling and walking rates are low.
Furthermore, in addressing our second objective ('b'), the current study found significant inferential relationships between travel mode and valence. Interesting associations were also observed between valence and environmental and temporal attributes. We used the global OLS model as a base model; the SAR model proved superior in both cities and produced noteworthy coefficients. An important finding was that elevated valence scores were associated with water and pedestrian travel in both cities, and with bicycling, albeit only in DC. Our findings lend additional evidence that travel may be valued in and of itself rather than being purely a derived demand (Mokhtarian and Salomon 2001). Additionally, if a low-stress mode is used (in this study, water travel, walking and bicycling), it may have a higher probability of being used again, thus reducing automobile usage and its associated externalities. Understanding when, how and why stress is reduced is therefore essential for encouraging such mode choices in other areas. However, the evidence from this research shows that this is not a straightforward task. The results point to the degree to which valence was affected by time (i.e. the morning commute), weather (i.e. temperature and humidity) and environmental variables (i.e. urban density). From a policy perspective, these factors need to be accounted for, especially if GSM data are to be used to quantify travel mode satisfaction in real time. Overall, the outcomes represent a broader call for municipalities to use GSM data to monitor and address public concerns in a timely manner through short-term and long-term planning strategies. For example, real-time tracking of sentiment on Twitter could be used to proactively monitor mass transit service provision or weather-induced stress. While such strategies deserve attention, the use of GSM data requires acknowledging certain limitations.
The use of GSM data raises two main concerns. The first is selective sampling: Smith and Brenner (2012) showed that only 15% of online adults use Twitter, and that Twitter users are disproportionately young adults, African Americans, urban and suburban residents, and smartphone owners. The second is spatial mismatch. Despite past research linking environment to place (Bargh and Chartrand 1999), we must be cognizant that check-in locations and Tweet content may not always coincide. In other words, people may Tweet about an experience much later, or much farther away, than when and where the thoughts arose. In this study, we merged the Tweets with area-level census data based on where they were sent. Caution is therefore warranted when drawing conclusions about the links between Tweets and underlying factors. A final limitation of the current research is that other factors not captured here likely contribute to valence fluctuations. For instance, travel purpose (e.g. recreation versus work) and the level of social interaction were not measured and may influence travel mode sentiment (Olsson et al. 2013; van den Berg, Sharmeen, and Weijs-Perrée 2017).
These limitations notwithstanding, the overall findings of this work make a convincing case for integrating GSM data into the study of dynamic travel mode sentiment, with the aim of reducing commute stress and increasing the pleasure of traveling, especially by active transportation modes. Given the proliferation of these data, understanding how pleasure or displeasure (i.e. valence) affects travel behaviour remains a fast-growing research area. A follow-up study using more advanced sentiment methods, such as LabMT or Probabilistic Latent Semantic Analysis (PLSA), will therefore be considered as the next step in this research. In addition, a stronger time-series approach, such as Granger causality analysis, will be considered in a forthcoming study. We hypothesise that analytical frameworks such as these will provide more robust predictive capabilities and better temporal accountability when GSM data are used. Ultimately, the present study provides a first step in understanding the intimate geographical connection between travel mode and sentiment, using easily obtainable GSM data and a transferable methodology that we hope others will adopt and bring into practice.
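LabMT-style scoring, mentioned above as a candidate for follow-up work, assigns each Tweet the frequency-weighted average score of the words it contains that appear in a scored word list. Near-neutral words can optionally be ignored. The sketch below assumes a tiny hypothetical lexicon on a 1 to 9 scale. Its scores are placeholders, not values from the actual LabMT word list, and the neutral band (4 to 6) is an adjustable assumption.

```python
import re
from collections import Counter

# Hypothetical word scores on a 1 (unhappy) to 9 (happy) scale; NOT real LabMT values.
LEXICON = {
    "boat": 7.0, "tour": 6.8, "beautiful": 8.0, "walk": 6.5,
    "traffic": 3.0, "late": 3.2, "stuck": 2.8, "train": 5.5,
}


def tweet_valence(text, lexicon=LEXICON, neutral=(4.0, 6.0)):
    """Frequency-weighted mean score of lexicon words found in text.

    Words whose score lies strictly inside the neutral band are ignored.
    Returns None when no scored word remains.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total, n = 0.0, 0
    for word, freq in counts.items():
        score = lexicon.get(word)
        if score is None or neutral[0] < score < neutral[1]:
            continue
        total += score * freq
        n += freq
    return total / n if n else None


print(tweet_valence("Beautiful boat tour on the river"))  # mean of 8.0, 7.0 and 6.8
print(tweet_valence("Stuck in traffic, late again"))      # mean of 2.8, 3.0 and 3.2
print(tweet_valence("On the train"))                      # None: only a neutral-band word
```

Excluding near-neutral words makes scores more sensitive to emotionally loaded vocabulary, at the cost of leaving more short Tweets unscored.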

Disclosure statement
No potential conflict of interest was reported by the authors.