A parallel set-based model on the shortest travel time in long-distance transportation systems

Abstract The shortest travel time in long-distance transportation systems (LDTS) is a key indicator for measuring regional connectivity and mapping accessibility at national and global levels. Traditional methods struggle to calculate the shortest travel time for millions of origin-destination pairs and to evaluate overall performance. To fill this technical gap, this study proposed a novel model that obtains all the trips with different numbers of transfers through a series of set-based methods and then calculates the shortest travel times between stops. The set-based model was tested by calculating the shortest travel time in the LDTS of China. Using the optimization algorithm, we can significantly reduce the number of transfer trips that need to be calculated. For instance, when the number of transfers was two, the number of transfer trips decreased by 96.17%, 98.15%, and 79.02% for conventional railway, high-speed railway, and air transportation, respectively. The set-based model can be extended to calculate the door-to-door travel time between places, for instance, to map fine-scale accessibility at the national level. Furthermore, we demonstrated that the set-based model can be parallelized and applied to any other LDTS in the General Transit Feed Specification format.


Introduction
Calculating the shortest travel time is a necessary step in spatial analysis and modeling, which usually quantifies the travel costs (Brainard 1999), intercity connectivity (Yue et al. 2019), accessibility (Zhang et al. 2018), etc. With the development of long-distance transportation systems (LDTS), more and more people can access services (e.g. tourism) and job opportunities across regions. Therefore, numerous studies attempted to compute the shortest travel time among regions, assessing the spatial structures of socioeconomic activities (Duan et al. 2020, Cheng and Chen 2021) and their impacts on human behaviours (Kwan et al. 2018, Wang et al. 2020a).
Many studies have focused on the shortest travel time between cities (Yue et al. 2019, Duan et al. 2020), but there is currently a growing need to calculate travel time at a more detailed scale (Barbieri and Jorm 2019, Masiano et al. 2019, Zhang et al. 2023). This can be largely attributed to the inequality of the urban transportation infrastructure. Therefore, many studies chose to use the door-to-door travel time, which includes the travel time between transportation stops and the travel time between the transportation stops and the residential settlements (or the points of interest). For instance, Zhao and Yu (2018) investigated the accessibility advantages of the Dallas-Houston HSR route at the sub-city level under the competition of existing travel modes, revealing the spatial pattern of accessibility variation across the city. Nevertheless, most of these studies are limited to the city and the regional scales (Lin et al. 2019, Wang et al. 2020, Tian et al. 2022). At larger scales, such as the national and the global scales, some studies preferred to abstract the LDTS into a graph model, obtaining the shortest travel time through shortest path algorithms (Wandelt et al. 2017, Sun et al. 2018, Weiss et al. 2018, Yue et al. 2019). However, this simplified model could overlook the quality of LDTS. The services of LDTS differ from those of urban transportation systems in their lower frequency and less convenient transfers (Moyano et al. 2018). Consequently, the graph model could overestimate the shortest travel time, potentially exacerbating the concealment of spatial inequality in accessibility. Hence, there is a need for more accurate and fine-scale approaches to the shortest travel time in LDTS to deepen the understanding of spatial evolutions, such as individual activities and industrial layouts.
This study proposed an innovative model that employs a series of set-based methods to calculate the shortest travel time among the LDTS with the standard General Transit Feed Specification (GTFS) timetable information. After that, the set-based model was extended to compute the door-to-door travel time by integrating the travel time between the stops (e.g. stations and airports) and the places (e.g. residential settlements and the points of interest). Given that the set-based model is designed to deal with independent trips, it could easily distribute independent trips among multiple central processing unit (CPU) cores for parallel processing. The following section is the literature review. The third section describes the principles and algorithms of the set-based model. The fourth section shows the experimental procedures on three LDTS systems in China (i.e. conventional railway, high-speed railway, and air transportation). In the fifth section, the set-based model was applied to map the national tourism accessibility at the town level. The sixth and seventh sections are dedicated to the discussion and conclusion, respectively.

Literature review
Previous studies have sought to build graph-based models of the physical LDTS network and determine the shortest travel time by setting operating speeds and applying shortest path algorithms (Wandelt et al. 2017, Sun et al. 2018, Weiss et al. 2018). For example, Weiss et al. (2018) applied fixed speeds to rasterize the road and railway networks and utilized the Dijkstra algorithm to measure the travel time to cities on a global scale. Sun et al. (2018) incorporated different speed profiles for routing vehicles from the project Open Source Routing Machine (OSRM) to assess the potential competitiveness of on-demand air taxis in Europe. However, these studies did not account for the temporal attributes and transfer processes of LDTS services (Moyano et al. 2018), which may lead to an overestimation of travel time. Some other researchers established the time-expanded (Pyrga et al. 2008) and the time-dependent (Stølting Brodal and Jacob 2004) models to include the time information in the graph-based model. However, this would significantly increase the size of the graph and the complexity of the model. The time-expanded model introduces all the events in the trips, which increases the number of vertices and edges of the graph (Pyrga et al. 2008). Although the time-dependent model can reduce the number of vertices by grouping several trips along edges with time-dependent functions, it requires complex query algorithms to process transfers (Delling et al. 2015).
To better accommodate the graph-based model, researchers have attempted to utilize nonrelational databases, such as graph databases, to enhance operational efficiency (Fortin et al. 2016). As graph databases (e.g. Neo4j) are not routing engines, they could be more suitable for local traversal queries (e.g. in social networks) than for full graph traversal queries in larger transportation networks (Miler et al. 2014). For instance, Vágner (2021) changed the data structure to support the k-shortest paths but found that it could cause out-of-memory errors in some cases. Consequently, existing graph-database studies are mainly focused on the feasibility of graph databases and their application in small regions (Maduako et al. 2019, Seula and Tao 2023).
Besides, the number of transfers is also an important criterion (Delling et al. 2015, Kujala et al. 2018). Many studies focused on direct trips (Cao et al. 2013, Wang et al. 2013) or direct and single-transfer trips (Duan et al. 2020). While some enhanced Dijkstra algorithms make impressive progress in multi-criteria search for the road network, they require complex speed-up algorithms and pre-processing time when it comes to real timetable information systems (Pyrga et al. 2008, Annabell Berger et al. 2009). Since the Dijkstra algorithm is sequential and needs to consider the impact of the new vertex on the overall sequence in each iteration (Jasika et al. 2012), it also may not be suitable for parallel computing. Therefore, for the travel time of millions of origin-destination (O-D) pairs, the Dijkstra algorithm requires complex computation or a large amount of query time (Alves, Krishnakumar, and Garg 2020, Meyer and Sanders 2003, Jasika et al. 2012).
Instead, Delling et al. (2015) proposed a different model, named the Round-Based Public Transit Optimized Router (RAPTOR). The basic idea of the RAPTOR model is to scan trips according to the number of rounds and then traverse each scanned trip. While the RAPTOR model can provide one-to-many results that increase computational efficiency (Conway et al. 2017), it focused on the earliest arrival problem with different rounds. However, the setting of trips in LDTS is sometimes different on the same route and the travel time of the later train/flight may be shorter than the former one. Therefore, the earliest arrival time may not be equal to the shortest travel time.
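The distinction between earliest arrival and shortest travel time can be illustrated with a minimal sketch. The two trips and their times below are hypothetical values chosen for illustration, not data from this study.

```python
# Two hypothetical trips on the same route; times are minutes after midnight.
trips = [
    {"trip_id": "A", "dep_time": 8 * 60, "arr_time": 14 * 60},   # 08:00 -> 14:00
    {"trip_id": "B", "dep_time": 10 * 60, "arr_time": 15 * 60},  # 10:00 -> 15:00
]


def travel_time(trip):
    """In-vehicle travel time of a trip, in minutes."""
    return trip["arr_time"] - trip["dep_time"]


# The earliest-arrival criterion picks the trip that reaches the destination first.
earliest_arrival = min(trips, key=lambda t: t["arr_time"])
# The shortest-travel-time criterion picks the trip with the smallest duration.
shortest_travel = min(trips, key=travel_time)

print(earliest_arrival["trip_id"], travel_time(earliest_arrival))
print(shortest_travel["trip_id"], travel_time(shortest_travel))
```

Here the earlier trip arrives first but takes longer, so the two criteria select different trips.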
Apart from abstracting LDTS as graphs, LDTS can also be viewed as a set of different trips, each of which can be regarded as a set of stops (Delling, Pajor, and Werneck 2015). Therefore, the travel time between all the stops can be obtained by calculating the set of pairwise combinations of stops. This study provided a series of set-based methods to get the transfer trips with different numbers of transfers, and then the set of travel times between stops of each trip. For instance, the transfer trip can be expressed as the union of the subsets of two direct trips with different numbers of transfer stops, and the set of pairs of stops in a trip can be generated using the Cartesian product. While there are numerous transfer trips due to the increase in the number of transfers, the set-based model employed an optimization algorithm to reduce the number of transfer trips that need to be calculated. In addition, the calculation of door-to-door travel time can be seen as the step-by-step Cartesian products of the set of travel times between stops and places and the set of the shortest travel times between stops.

A set-based travel time model
The set-based model works on a timetable with trips. Each trip $T_u$ has a unique identifier (trip_id) $u$. $S_{i,u}$ represents the stop $S_i$ of the trip $T_u$, where $i$ is the identifier (stop_id). $q_{i,u}$ (stop_seq) represents the sequence of $S_i$ in $T_u$; for example, the $q_{i,u}$ of the initial stop is 1. Each $S_{i,u}$ has an arrival time (arr_time) and a departure time (dep_time). For the initial stop and the final stop, the arrival time is equal to the departure time. A transfer trip indicates that travelers can transfer from one trip to another, only when the time interval between the arrival time of the previous trip and the departure time of the latter one is greater than the minimum transfer time (see Supplemental material for detailed examples).
The basic idea of the set-based model comes from the calculation of the travel time between all the stops in a trip. As mentioned before, the general practice is to iterate over stops from $S_{1,u}$ to $S_{n-1,u}$ and calculate the travel time from each stop to its subsequent stops. This requires $n-1$ iterations. However, from the set perspective (Pinter 2014), the trip can also be regarded as a set of stops (Eq. 1), and thus, the Cartesian product of the trip $T_u$ and itself is the set of all ordered pairs of stops (Eq. 2).
Here, $U_{direct}$ is the set of travel times between all the stops in the direct trip. Due to the direction of the trip, the travel time between stops is a subset of the Cartesian product (i.e. $q_{i,u} < q_{j,u}$). Figure 1 shows that our set-based model generates many-to-many pairs of stops without an iterative process, thus allowing us to calculate all the travel times between stops in a trip. That is to say, if we can get all the direct trips and transfer trips, we can calculate the travel time between all the stops.
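The idea of taking the Cartesian product of a trip with itself and keeping only the pairs with $q_{i,u} < q_{j,u}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trip data, the tuple layout, and the function name `direct_travel_times` are hypothetical, and travel time is assumed to be the arrival time at the destination minus the departure time at the origin.

```python
from itertools import product

# Hypothetical direct trip: (stop_id, stop_seq, arr_time, dep_time), in minutes.
trip = [
    ("S1", 1, 0, 0),
    ("S2", 2, 50, 55),
    ("S3", 3, 120, 125),
    ("S4", 4, 200, 200),
]


def direct_travel_times(stops):
    """Return {(origin, destination): minutes} for all forward stop pairs."""
    return {
        (a[0], b[0]): b[2] - a[3]          # arrival at b minus departure from a
        for a, b in product(stops, stops)  # Cartesian product of the trip with itself
        if a[1] < b[1]                     # keep only pairs that follow the trip direction
    }


u_direct = direct_travel_times(trip)
print(len(u_direct), u_direct[("S1", "S4")])
```

A trip with n stops yields n(n-1)/2 forward pairs; in a table-oriented library the same product would typically be expressed as a cross join followed by a filter.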

Calculation of transfer trips
Besides the travel time of direct trips, the complexity of transportation systems is rooted in the numerous transfer trips, which depend on different transfer links. The transfer link is defined as a sequence of transfer stops (Eq. 3), and the transfer trip $T_v$ can be regarded as the union of the subsets of two direct trips and the transfer link (Eq. 4). Therefore, the travel time in the transfer trip can be denoted as the set of all ordered pairs of stops (Eq. 5).
…, $m)$, $q_{i,v} < q_{i_1,v}$, and $q_{j,v} > q_{i_m,v}\}$ (5)
where $m$ is the number of transfers, $S_{i_1,u_1}$ is the first stop of the transfer link, $S_{i_m,u_{m+1}}$ is the last stop of the transfer link, and $T_v$ represents the transfer from $T_{u_1}$ to $T_{u_{m+1}}$. Figure 2(a) and (b) describe the generation of transfer trips when the number of transfers equals one and two, respectively. In Figure 2(a), trip $T_1$ can transfer with trip $T_2$ via the transfer link.
In this study, we defined the transfer links between direct trips as the transfer index (Eq. 6), which documents the transfer links between each direct trip and other direct trips. Therefore, the single-transfer trips can be directly calculated by the transfer index, and the transfer links of double-transfer trips can be calculated by the Cartesian product of the transfer index and itself (Eq. 7), and so on. In this way, we obtained all the transfer trips with different numbers of transfers.
For the transfer index, our study also provides a set-based method. Specifically, we queried all the arrival times and departure times at each stop. For example, suppose three trips pass the same stop. The transfer stop $S_{i,u}$ represents the stop $S_i$ in the direct trip $T_u$, and the set of transfer stops of the stop is $TS_i = \{S_{i,u} \mid u = 1, \dots, r\}$. Using the Cartesian product of $TS_i$ and itself, we can get the set of transfer links of direct trips (Eq. 8), where $TS_i$ is the set of transfer stops of the stop $S_i$, $S_{i,u_1}$ is the stop $S_i$ in trip $T_{u_1}$, $S_{i,u_2}$ is the stop $S_i$ in trip $T_{u_2}$, and $r$ is the number of trips. In this study, we focused on the transfers on the same day and excluded the transfer links with time intervals less than the minimum transfer time. The time interval is equal to the difference between the departure time of the last transfer stop (e.g. $S_{2,2}$ in $L(S_{2,1}, S_{2,2}, 1)$) in the transfer link and the arrival time of the first transfer stop (e.g. $S_{2,1}$ in $L(S_{2,1}, S_{2,2}, 1)$). The minimum transfer time should include the drop-off time and the walking time at the transfer stop. While many researchers used the walking time as the transfer time between different stops in intra-urban transportation systems (Aleta et al. 2017, Kujala et al. 2018), this study focused on transfers within the same stop, and the minimum transfer time of conventional railway, high-speed railway, and air transportation is 30 minutes, 30 minutes (Wang et al. 2016), and 60 minutes (Liao et al. 2022), respectively. This is because, in general, the stops of LDTS are larger than those of intra-urban transportation systems and the distance between stops is farther. Meanwhile, the initial stop of a trip (e.g. $S_{1,u}$) cannot be regarded as the first transfer stop of a transfer link, and the final stop of a trip (e.g. $S_{n,u}$) cannot be regarded as the last transfer stop of a transfer link. Finally, the transfer index of each direct trip can be separated into independent files, each of which documents the transfer links with other direct trips.
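The construction of transfer links at a single stop can be sketched as below. This is a minimal sketch under stated assumptions rather than the study's code: the three visits, the tuple layout, and the function name `transfer_links` are hypothetical, all times are assumed to fall on the same day, and links whose interval is below the minimum transfer time are dropped, following the exclusion rule described above.

```python
from itertools import product

MIN_TRANSFER = 30  # minutes; the value the text uses for railway stops

# Hypothetical visits of three trips to the same stop S2:
# (trip_id, stop_seq, n_stops_in_trip, arr_time, dep_time), minutes after midnight.
visits = [
    ("T1", 2, 4, 600, 605),
    ("T2", 1, 3, 650, 650),  # S2 is the initial stop of T2
    ("T3", 3, 3, 560, 560),  # S2 is the final stop of T3
]


def transfer_links(visits, min_transfer=MIN_TRANSFER):
    """Return (from_trip, to_trip, interval) for feasible same-stop transfers."""
    links = []
    for a, b in product(visits, visits):  # Cartesian product of the stop's visits
        if a[0] == b[0]:
            continue  # a trip does not transfer onto itself
        if a[1] == 1:
            continue  # an initial stop cannot be the first transfer stop
        if b[1] == b[2]:
            continue  # a final stop cannot be the last transfer stop
        interval = b[4] - a[3]  # departure of the next trip minus arrival of this one
        if interval >= min_transfer:
            links.append((a[0], b[0], interval))
    return links


links = transfer_links(visits)
print(links)
```

Because each stop is processed independently, the links of different stops could be computed and stored separately, in line with the per-trip index files mentioned in the text.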

Optimization of transfer links
The increase in the number of transfers increases not only the number of transfer links but also the length of transfer links. More precisely, the length of the transfer link is twice the number of transfers. We have noticed that when the number of transfers is greater than one, many different transfer links lead to the same transfer trips. Figure 3 shows three types of transfer links as follows (see Supplemental material for the pseudo-code):
1. The transfer links have the same first and last transfer stops. In Figure 3(a), for instance, the transfer trips are the same with the transfer links L1 and L2. Therefore, we removed the duplicated transfer links with the same first and last transfer stops.
2. The transfer links have the same last transfer stop and their first transfer stops are from the same trip. In Figure 3(b), for instance, the travel time of the transfer trip with transfer link L5 includes those of transfer trips with L3 and L4. Therefore, our optimization algorithm selected the transfer link with the highest sequence of the first transfer stop.
3. The transfer links have the same first transfer stop and their last transfer stops are from the same trip. In Figure 3(c), for instance, the travel time of the transfer trip with transfer link L6 includes those of transfer trips with L7 and L8. Therefore, our optimization algorithm selected the transfer link with the lowest sequence of the last transfer stop.
Such improvements can efficiently reduce the number of transfer links; however, it should be noted that overnight trips could affect the transfer trips. For instance, in Figure 3(c), if S14 is on the first day of Trip 5 while S16 is on the second day, the journey from S1 to S25 in the transfer trip with L6 will span two days, whereas the same journey in the transfer trip with L8 may not. Therefore, in this case, excluding transfer link L8 could overestimate the travel time from S1 to S25. That is to say, we need to select the transfer links on the same day before the improvement process.
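The three pruning rules can be sketched as successive group-and-keep steps. This is a hedged illustration only; the study's actual pseudo-code is in its Supplemental material. The names `TransferStop`, `Link`, and `optimize_links`, the field layout, and the sample links are hypothetical, the rules are assumed to be applied in the order listed, and the input is assumed to contain only same-day transfer links.

```python
from typing import NamedTuple


class TransferStop(NamedTuple):
    trip_id: str
    stop_id: str
    stop_seq: int


class Link(NamedTuple):
    first: TransferStop  # first transfer stop (in the first trip)
    last: TransferStop   # last transfer stop (in the last trip)
    via: tuple           # intermediate trip ids


def optimize_links(links):
    # Rule 1: keep one link per identical (first, last) transfer-stop pair.
    unique = {}
    for link in links:
        unique.setdefault((link.first, link.last), link)
    # Rule 2: same last stop, first stops from the same trip:
    # keep the link whose first transfer stop has the highest sequence.
    by_last = {}
    for link in unique.values():
        key = (link.first.trip_id, link.last)
        if key not in by_last or link.first.stop_seq > by_last[key].first.stop_seq:
            by_last[key] = link
    # Rule 3: same first stop, last stops from the same trip:
    # keep the link whose last transfer stop has the lowest sequence.
    by_first = {}
    for link in by_last.values():
        key = (link.first, link.last.trip_id)
        if key not in by_first or link.last.stop_seq < by_first[key].last.stop_seq:
            by_first[key] = link
    return list(by_first.values())


a = Link(TransferStop("T1", "S3", 3), TransferStop("T5", "S14", 2), ("T2",))
b = Link(TransferStop("T1", "S3", 3), TransferStop("T5", "S14", 2), ("T3",))  # rule 1
c = Link(TransferStop("T1", "S2", 2), TransferStop("T5", "S14", 2), ("T4",))  # rule 2
d = Link(TransferStop("T1", "S3", 3), TransferStop("T5", "S16", 4), ("T6",))  # rule 3
e = Link(TransferStop("T7", "S9", 2), TransferStop("T8", "S9", 1), ("T9",))   # unrelated

kept = optimize_links([a, b, c, d, e])
print(kept)
```

In this example four links between trips T1 and T5 collapse into one, while the unrelated link is left untouched.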
This study focused on transfer trips with fewer than three transfers. This is because, firstly, direct and single-transfer trips constitute a higher proportion of LDTS. Therefore, the shortest travel time of these trips could exhibit a significant impact on socio-economic activities (Cao et al. 2013, Wang et al. 2013, Duan et al. 2020). Secondly, the increase in the number of transfers will degrade the travel experience of passengers (Liu et al. 2021). Therefore, the main objective of route planning is to minimize both travel time and the number of transfers, and some studies restricted the maximum number of transfers to two (Fu et al. 2015, Wang et al. 2020b).

Shortest travel time
This study calculated the travel time between stops in all the direct trips, single-transfer trips, and double-transfer trips. For the transfer trips, we calculated the travel time from the stops of the first trip to those of the last trip in the transfer link. To calculate the shortest travel time, all the travel times were sorted from low to high, and the duplicate travel times with the same identity field (i.e. the 'departure stop-arrival stop') were eliminated. The above process can be programmed through the sorting algorithm and the de-duplication algorithm (sort_values and drop_duplicates) from the Pandas library.
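The sort-then-deduplicate step can be sketched with the two Pandas functions named above. The column names and the sample rows are hypothetical, and breaking ties by the number of transfers is an assumption added here for determinism, not something the text specifies.

```python
import pandas as pd

# Hypothetical candidate travel times (minutes) from direct and transfer trips.
times = pd.DataFrame({
    "dep_stop": ["S1", "S1", "S1", "S2"],
    "arr_stop": ["S3", "S3", "S3", "S3"],
    "n_transfers": [0, 1, 2, 0],
    "travel_time": [240, 180, 180, 95],
})

shortest = (
    times.sort_values(["travel_time", "n_transfers"])  # low to high; ties: fewer transfers
         .drop_duplicates(subset=["dep_stop", "arr_stop"], keep="first")
         .reset_index(drop=True)
)
print(shortest)
```

After sorting, the first row of every departure-arrival pair is the shortest one, so keeping the first duplicate yields one shortest travel time per pair.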

Door-to-door travel time
The set-based method can be used not only for the travel time calculation in LDTS but also for the door-to-door travel time. The door-to-door travel time is defined as the sum of the travel time between stops and the travel time between stops and places (Wang et al. 2016; Wang and Duan 2018) (Eq. 9). From the set perspective, the door-to-door travel time can be expressed as follows (Eqs. 10-13).
Here, $t_{x,s_i}$ is the travel time between the place of origin and the initial stop, $t_{s_j,y}$ is the travel time between the final stop and the place of destination, $t_{s_i,s_j}$ is the shortest travel time between stops, $C_x$ and $C_y$ are the catchments of places, $Shortest_{ij}$ is the set of shortest travel times in LDTS, and $p$ is the number of places. In this study, we requested the shortest driving time from the Amap services to get $t_{x,s_i}$ and $t_{s_j,y}$. Amap, launched by Gaode Maps, is a leading provider of digital map content, navigation, and location-based solutions in China. Given that one place could be served by multiple stops and given the limitation of the Amap services, we first created a catchment for each stop with a radius of 200 km (Mao et al. 2015) and then selected the driving times of less than 2 hours. While the check-in time ($t_{in}$) and the check-out time ($t_{out}$) are largely dependent on the physical sizes and traffic volumes of the facilities, most airlines require passengers to arrive at least 60 minutes before departure for domestic flights, and high-speed trains usually require passengers to arrive 30 minutes in advance at train stations (Zhao and Yu 2018). This study set the check-in time to 30 minutes for stations and 60 minutes for airports (Zhao and Yu 2018). For the check-out time, we set the same values due to the large physical sizes and high traffic volumes of the facilities in China. Additionally, exiting the flight and claiming the luggage takes a long time.
Figure 4 describes the set-based method for the shortest door-to-door travel time. The catchment file records the travel time between the stops and places as well as the check-in and check-out time. In Figure 4, for instance, the origin catchment indicates that there are two stops (S1 and S2) near place P1, and the shortest travel time documents the travel time between stops from the timetable, e.g. the travel time from S1 to S3 and S4. Therefore, the model can first calculate the travel time from P1 to S3, S4, and S5 by merging the origin catchment and the shortest travel time. Secondly, the door-to-door travel time from P1 to P2 and P3 can be obtained by merging with the destination catchment. Finally, using the sorting algorithm and the de-duplication algorithm, our model outputs the shortest door-to-door travel time.
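The two merge steps of Figure 4 can be sketched with Pandas joins. This is a minimal sketch under stated assumptions: the column names, the function name `door_to_door`, and all numeric values are hypothetical, and the door-to-door total is assumed to be the sum of access time, check-in time, stop-to-stop time, check-out time, and egress time.

```python
import pandas as pd

# Hypothetical origin catchment: driving time from place to stop, plus check-in (minutes).
origin = pd.DataFrame({
    "origin": ["P1", "P1"],
    "dep_stop": ["S1", "S2"],
    "access": [40, 70],
    "check_in": [30, 60],
})
# Hypothetical shortest stop-to-stop travel times from the timetable.
shortest = pd.DataFrame({
    "dep_stop": ["S1", "S1", "S2"],
    "arr_stop": ["S3", "S4", "S5"],
    "travel_time": [300, 420, 150],
})
# Hypothetical destination catchment: driving time from stop to place, plus check-out.
dest = pd.DataFrame({
    "arr_stop": ["S3", "S4", "S5", "S5"],
    "destination": ["P2", "P3", "P2", "P3"],
    "egress": [50, 20, 90, 60],
    "check_out": [30, 30, 60, 60],
})


def door_to_door(origin, shortest, dest):
    """Merge catchments with stop-to-stop times and keep the shortest total per pair."""
    legs = origin.merge(shortest, on="dep_stop").merge(dest, on="arr_stop")
    parts = ["access", "check_in", "travel_time", "check_out", "egress"]
    legs["total"] = legs[parts].sum(axis=1)
    return (
        legs.sort_values("total")
            .drop_duplicates(subset=["origin", "destination"], keep="first")
            [["origin", "destination", "total"]]
            .reset_index(drop=True)
    )


result = door_to_door(origin, shortest, dest)
print(result)
```

Each merge plays the role of one Cartesian-product step restricted to matching stops, and the final sort and de-duplication mirror the selection of the shortest stop-to-stop travel time.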

Parallelization
Given that most of the work is spent dealing with independent trips, the set-based model can be implemented in parallel. Our study provides a suite of parallel tools, including the travel time calculation in trips, the calculation of transfer links, the selection of the shortest travel time, etc. In this study, all experiments were done on a dual 26-core Intel Platinum 8169 machine (52 cores total) clocked at 2.5 GHz, with 512 GB of DDR4 random access memory (16 × 32 GB, 2666 MHz). All the algorithms are implemented in Python (with the Parallel Python library for parallelization).
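Because each trip can be processed independently, the per-trip calculation maps naturally onto a worker pool. The sketch below uses the standard-library `concurrent.futures` module in place of the Parallel Python library named above; this substitution, the function names `trip_travel_times` and `run_parallel`, and the sample trips are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def trip_travel_times(trip):
    """Travel times between all forward stop pairs of one trip."""
    trip_id, stops = trip  # stops: list of (stop_id, arr_time, dep_time) in stop order
    return [
        (trip_id, a[0], b[0], b[1] - a[2])  # arrival at b minus departure from a
        for a, b in combinations(stops, 2)  # pairs keep the stop order
    ]


def run_parallel(trips, workers=4, executor_cls=ThreadPoolExecutor):
    """Distribute independent trips over a pool and flatten the results."""
    with executor_cls(max_workers=workers) as pool:
        chunks = pool.map(trip_travel_times, trips)
        return [row for chunk in chunks for row in chunk]


# Hypothetical trips; times in minutes.
trips = [
    ("T1", [("S1", 0, 0), ("S2", 50, 55), ("S3", 120, 120)]),
    ("T2", [("S2", 0, 0), ("S4", 90, 90)]),
]
rows = run_parallel(trips, workers=2)
print(rows)
```

A thread pool keeps the sketch easy to run anywhere; for CPU-bound work on many cores, `ProcessPoolExecutor` from the same module exposes the same interface and could be passed as `executor_cls`.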

Experiments and comparisons
In this section, we applied the set-based model to compute all the shortest travel times within three different LDTS systems in China (conventional railway, high-speed railway, and air transportation) and compared it with the time-expanded model. Unlike many studies that focused on comparing the efficiency of specific queries (e.g. 1000 queries), this study concentrated on the travel time for the entire LDTS, involving millions of paths. We selected the time-expanded model for comparison because, on one hand, it is more robust than other models in both modeling and results (Pyrga et al. 2008, Vágner 2021); on the other hand, it presents a more versatile structure to integrate GTFS data (Fortin et al. 2016). Specifically, this section first introduces the GTFS-based datasets for the LDTS. Then, this study evaluated the efficiency of the optimization algorithm and the parallel computing. Finally, it presents comparisons of the shortest travel time and the runtime between the set-based model and the time-expanded model.

GTFS-based dataset
Given that the schedules of LDTS vary according to their different sources, our model is developed according to the GTFS-based dataset. The GTFS is a universal data specification that is widely used for transportation science and applications (Kim and Lee 2019, Pasha et al. 2020). A GTFS dataset generally includes seventeen CSV files to document the general information about transit agencies, stops, routes, schedules, etc. (https://developers.google.com/transit/gtfs). Here, we concentrated on four objects as follows:
• Stops: stops document the identifiers and the geographic locations of stations and airports.
• Trips: trips document the identifiers of trips.
• Stop times: stop times are associated with a trip and document the run-time information of each stop in a trip. In LDTS, there commonly exist overnight trips (e.g. a train departs at 19:00 and arrives at 08:00 on the second day) that could lead to miscalculating the travel time. Therefore, our study also calculated the cumulated arrival time and cumulated departure time from the initial stop of the trip.
• Transfers: transfers record the minimum transfer time between stops.
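The cumulated arrival and departure times for an overnight trip can be sketched as follows. This is a hedged illustration: the function name `cumulated_times` and the sample timetable are hypothetical, clock times are assumed to be given as 'HH:MM' strings, and no single leg or dwell is assumed to exceed 24 hours. GTFS feeds themselves may encode times past midnight with hour values above 24, so a helper like this mainly matters for clock-time sources such as crawled timetables.

```python
def cumulated_times(stop_times):
    """Convert clock times to minutes elapsed since the trip's first departure.

    stop_times: list of (arr 'HH:MM', dep 'HH:MM') in stop order.
    Returns a list of (cumulated_arrival, cumulated_departure) in minutes.
    """
    def minutes(hhmm):
        hours, mins = hhmm.split(":")
        return int(hours) * 60 + int(mins)

    start = minutes(stop_times[0][1])
    day_offset, previous = 0, start
    result = []
    for arr, dep in stop_times:
        row = []
        for clock in (minutes(arr), minutes(dep)):
            if clock + day_offset < previous:  # the clock went backwards: midnight passed
                day_offset += 1440
            previous = clock + day_offset
            row.append(previous - start)
        result.append(tuple(row))
    return result


# Hypothetical overnight train: departs 19:00, arrives 08:00 on the second day.
timetable = [("19:00", "19:00"), ("23:50", "00:05"), ("08:00", "08:00")]
cumulated = cumulated_times(timetable)
print(cumulated)
```

With cumulated times, the travel time between any two stops of the trip is a plain subtraction, even when the trip crosses midnight.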
In this study, we crawled all the schedules of conventional railway, high-speed railway, and air transportation from the Internet in 2020. The schedules of conventional railway and high-speed railway were obtained from a public website (http://www.huochepiao.com) and the schedules of air transportation were obtained from another website (http://www.ctrip.com). Each schedule was documented in a single HTML file and then parsed into a timetable. There are eight types of passenger train services, of which the 'C', 'D', and 'G' are high-speed trains. In total, we collected 2,660 schedules of conventional trains and 7,155 schedules of high-speed trains. For air transportation, there are 26,123 schedules of domestic airlines. All these schedules include 3,526 stations and 243 airports, and the coordinates of these stops are obtained by the Amap geocoding services (https://lbs.amap.com/api/webservice/guide/api/georegeo). Since there could be two or more names for the same stop, we recorded each of these names under the same unique identifier 'stop_id'.

Optimization of transfer links
While the number of transfer links increases as the number of transfers increases, our optimization algorithm can significantly reduce the number of transfer links, and thus the number of transfer trips that need to be calculated (Figure 5). When the number of transfers is two, the number of transfer links with optimization is 96.17%, 98.15%, and 79.02% lower than that without optimization for conventional railway, high-speed railway, and air transportation, respectively. This significantly reduces the overall running time; for instance, the running time with optimization for conventional railway is 85 minutes, whereas the running time without optimization is more than 11 days. Meanwhile, we also compared the shortest travel time before and after optimization, and the results were consistent. This further validated our model. It should also be noted that there are some differences in the selection of the shortest trips, as different trains/flights may result in the same travel time.

Parallel computing of the set-based model
We also compared the computational efficiency of the parallel set-based model. Given that the single-core set-based model requires a lot of running time, our study started with the number of CPU cores set to 13. Figure 6 shows that the increase in computational efficiency is lower than the increase in CPU cores. From 13 cores to 52 cores, the running time of conventional railway, air transportation, and high-speed railway decreased by 53.80%, 19.59%, and 54.43%, respectively. Although conventional railway has the greatest number of stops, its efficiency is the highest, possibly due to fewer trips. In addition, transfers can also affect efficiency. This can be seen by comparing high-speed railway and air transportation; transfers in high-speed railway could be more complicated than those in air transportation.
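The reported reductions in running time can be converted into speedup factors with a short calculation. The conversion below is a standard one (speedup equals one divided by the remaining fraction of running time), and the relative efficiency, defined here as the speedup divided by the fourfold increase in cores, is a derived quantity introduced for illustration rather than a figure reported in the study.

```python
# Running-time reductions from 13 to 52 cores, as reported in the text.
reductions = {
    "conventional railway": 0.5380,
    "air transportation": 0.1959,
    "high-speed railway": 0.5443,
}
core_ratio = 52 / 13  # four times as many cores

summary = {}
for name, reduction in reductions.items():
    speedup = 1 / (1 - reduction)      # old running time divided by new running time
    efficiency = speedup / core_ratio  # share of the ideal fourfold speedup achieved
    summary[name] = (speedup, efficiency)
    print(f"{name}: speedup {speedup:.2f}x, relative efficiency {efficiency:.0%}")
```

All three speedups fall well below the fourfold increase in cores, which is the sub-linear scaling the paragraph describes.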

Comparison with the shortest travel time in the time-expanded model
We attempted to contrast our results with the shortest travel time of the time-expanded model (Pyrga et al. 2008). The time-expanded model includes three types of nodes (i.e. arrival nodes, departure nodes, and transfer nodes), and the shortest travel time between nodes can be calculated by the Dijkstra algorithm. As mentioned before, the Dijkstra algorithm requires complex speed-up algorithms and pre-processing time to calculate the shortest travel time taking into account the number of transfers. Instead, this study attempted to compute all the paths between stops and selected the shortest travel time with different numbers of transfers. Unfortunately, due to the extensive number of paths in the real and lattice-like LDTS, calculating all paths could lead to memory leaks. As an alternative, we compared the shortest travel time when the number of transfers is less than three; the proportion of such paths in conventional railway, high-speed railway, and air transportation is 46.24%, 72.39%, and 93.85%, respectively.
To quantify the difference in the results of the two models, our study employed two metrics, namely, the mean absolute error (MAE) (Weiss et al. 2018) and the root mean squared error (RMSE). The MAE is a linear score, meaning that all individual differences are weighted equally in the average; however, the RMSE gives a relatively high weight to large differences. Figure 7 shows that the MAE and the RMSE of high-speed railway are the lowest, followed by air transportation and conventional railway. We found that when the number of transfers is less than two, the results of the two models are the same; however, when the number of transfers is equal to two, the travel time of the set-based model is higher than that of the time-expanded model. This is because our model mainly considers transfers within the same day, while the transfer trips of the time-expanded model may span multiple days, i.e. the transfers happen on different days. We examined the paths in which the travel time in the set-based model is greater than that in the time-expanded model. The results show that almost all the transfers happen in two consecutive days, with only 1.63% of those paths spanning more than two days in conventional railway. For high-speed railway and air transportation, all the intervals between transfer stops are less than one day (1440 minutes); however, 38.76% of transfer trips in conventional railway have intervals exceeding one day.
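The two metrics can be written out as a short sketch. The sample travel times are hypothetical and chosen so that a single large difference shows how the RMSE weights it more heavily than the MAE does.

```python
from math import sqrt


def mae(a, b):
    """Mean absolute error: every difference is weighted equally."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean squared error: large differences weigh more."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical shortest travel times (minutes) for four origin-destination pairs.
set_based = [120, 300, 450, 900]
time_expanded = [120, 300, 450, 700]

print(mae(set_based, time_expanded), rmse(set_based, time_expanded))
```

Three of the four pairs agree exactly, yet the one 200-minute gap produces an RMSE twice as large as the MAE.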
Furthermore, we attempted to evaluate the travel time when the transfer interval is less than or equal to 1440 minutes. Figures 7(d-f) demonstrate that both the MAE and the RMSE significantly decreased, and there is no difference between the set-based model and the time-expanded model in high-speed railway and air transportation. For the conventional railway, most of the overestimated trips in the set-based model have a travel time greater than 1440 minutes. Only one transfer trip has a travel time of less than 1440 minutes. This trip is a circular trip that transfers from 'K1315/K1318' (Table S7 in Supplemental Material) to 'Z126/Z127' (Table S8 in Supplemental Material) and then transfers back to 'K1315/K1318' again. In Table 1 and Table S7, passengers cannot travel directly from S1905 to S1641 by train 'K1315/K1318'; however, they can transfer through train 'Z126/Z127' to reach S1644 and then proceed to S1641 by another train 'K1315/K1318' on the next day. This implies that under the condition of a one-day timetable, the time-expanded model could include partial results of a two-day timetable, whereas the set-based model strictly adheres to this condition.

Comparison with the running time in the time-expanded model
Considering that the time-expanded model may lead to memory leaks and requires complex algorithms, this study attempted to compare the running time to calculate the travel time of direct trips, that is, with no transfer edges in the time-expanded model. Figure 8 demonstrates that, for conventional railway and high-speed railway, the running time of the set-based model is shorter than that of the time-expanded model, while the opposite is true for air transportation. This is primarily because air transportation has fewer stops but a higher number of flights (i.e. direct trips). Nonetheless, the time-expanded model requires pre-processing for network generation. The pre-processing times for conventional railway, high-speed railway, and air transportation are 38 minutes, 111 minutes, and 41 minutes, respectively. It should also be noted that we achieved parallel processing for the time-expanded model through a multitasking approach, with each core storing the entire network. This implies that the time-expanded model would occupy more memory.

Applications
Existing literature generally measured tourism accessibility by calculating the travel time between cities (Weng et al. 2020, Wang and Lu 2022). It is still difficult to obtain finer-scale accessibility, for instance, the travel time from each residential settlement to all the scenery sites (i.e. AAA, AAAA, and AAAAA scenery sites in China). In this study, our method calculated 345,808,041 shortest door-to-door travel times from 41,260 town-level divisions (hereafter referred to as reference divisions) to 9,339 scenery sites through LDTS. Specifically, the travel time between divisions/scenery sites and stops was obtained from the Amap services, and the shortest travel time between stops includes conventional railway, high-speed railway, and air transportation. We found that 69.69% of the travel times are less than 12 hours. Figure 9(a) shows the spatial distribution of the number of scenery sites at the town level, and Figure 9(b) shows the market size of the scenery sites. The market size here is simply defined as the total population covered by the scenery site within 12 hours. We found that the accessibility of the eastern region is significantly higher than that of the western and north-eastern regions, and that the tourism market in the eastern region is the largest. However, there are still some divisions with low accessibility and scenery sites with low market potential. This implies that inconvenient transportation could hinder the development of these divisions and scenery sites. Our maps not only reveal the fine-scale spatial pattern of tourism accessibility but could also be applied to micro-level studies, such as a questionnaire study at a particular scenery site.
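The door-to-door calculation and the market size defined above can be sketched as follows. This is a minimal illustration under stated assumptions: the door-to-door time is taken as the minimum, over candidate stop pairs, of the access time, the shortest stop-to-stop time, and the egress time; all function names, dictionary layouts, stop identifiers, and numbers are hypothetical and do not reproduce the study's data or code.

```python
def door_to_door(access, stop_to_stop, egress):
    """Shortest door-to-door minutes, or None if no stop pair connects the two places.

    access:       {origin_stop: minutes from the division to that stop}
    stop_to_stop: {(origin_stop, destination_stop): shortest minutes between stops}
    egress:       {destination_stop: minutes from that stop to the scenery site}
    """
    candidates = [
        access[o] + t + egress[d]
        for (o, d), t in stop_to_stop.items()
        if o in access and d in egress
    ]
    return min(candidates) if candidates else None

def market_size(site_times, population, limit=720):
    """Total population of the divisions that reach the site within `limit` minutes."""
    return sum(
        population[div]
        for div, t in site_times.items()
        if t is not None and t <= limit
    )

# Hypothetical toy inputs: two divisions (A, B), four stops, one scenery site.
stop_to_stop = {("S1", "S3"): 300, ("S2", "S3"): 200, ("S2", "S4"): 600}
access = {"A": {"S1": 30, "S2": 90}, "B": {"S2": 500}}
egress = {"S3": 45, "S4": 20}
population = {"A": 50000, "B": 20000}

site_times = {div: door_to_door(acc, stop_to_stop, egress) for div, acc in access.items()}
print(site_times)                           # {'A': 335, 'B': 745}
print(market_size(site_times, population))  # 50000
```

Division B exceeds the 12-hour (720-minute) limit in this toy case, so only the population of division A counts towards the market size of the site.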

Discussion
This study proposed a set-based model to calculate all the shortest travel time between stops/places. Compared to the one-to-one models, our study provides a many-to-many solution that avoids iterating over the stops and places. Therefore, our set-based model focuses on the overall computational efficiency (e.g. the running time of all the shortest travel time between stops) rather than the computational efficiency of one origin-destination pair (e.g. the average running time of 1,000 requests). While there are numerous transfer trips due to the increase in the number of transfers, our results suggest that the number of transfer trips will significantly decrease when using the optimization algorithm. Therefore, the set-based model provides a new way for the overall calculation of LDTS, which can also help to conduct multiple simulation experiments (Sun et al. 2014) and support the optimization and planning of transportation systems.
Another important application of our study is to map fine-scale accessibility at the national and global levels. While Weiss et al. (2018) established the global 'friction surface' to calculate the shortest travel time at a grid scale, they ignored the impact of LDTS. As a complement, our study provides all the travel time between stops, which can be extended to door-to-door travel time. Compared to routing engines (e.g. R5) that integrate all the transport modes (e.g. buses and subways) (Pereira et al. 2021), our study takes a different approach, which is to calculate the intra-urban and inter-urban travel time separately. This is because the transportation services for long-distance travel are different from those for short-distance travel. Meanwhile, our solution can also reduce the size of the input dataset and enable parallel processing, whereas the routing engines include the intra-urban transportation systems and road transportation (e.g. the OpenStreetMap). Given that this study calculated tourism accessibility at the town level, future studies will focus on grid-scale accessibility at national and global levels. That will be applicable in a broad range of disciplines, such as regional (Gong et al. 2021, Sun et al. 2021), transportation (Guo et al. 2021), health (Weiss et al. 2020), and tourism (Jin et al. 2020) science.
Although our study provides an efficient model for the shortest travel time in LDTS, it has a few limitations. First, the generation of the transfer index does not include the delay time of trains and flights. However, delays in LDTS are common. Meanwhile, travel time between stops and places can also be affected by urban congestion. Therefore, a promising avenue for future research is to improve the minimum transfer time by accounting for travel time uncertainty (Chen et al. 2013, 2017). Second, our study focused on transfer trips within the same day, which could lead to overestimating the travel time when the number of transfers is greater than one. Nevertheless, multi-day transfers continue to pose a problem. This is because most models (e.g. the graph-based model and the set-based model) focus on a one-day timetable and assume that the transfer time is sequential, i.e. the later trip cannot transfer to the earlier one. If transfers on different days are allowed, travelers could stay one night and select the shortest trip on the next day. This would require a two-day timetable. Besides, multi-day transfers may also lead to excessive pruning of transfer links by the optimization algorithm, thus overestimating the travel time. Another task could be to deal with multi-day transfers in the transfer link. Third, while we have removed three types of duplicated transfer links, there are still many more transfer links when the number of transfers is greater than two. Therefore, in the future, it will be necessary to identify additional types of duplicated transfer links to enhance optimization efficiency.
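The difference between a same-day (sequential) transfer and an overnight transfer discussed in the second limitation can be made concrete with a small sketch. It is illustrative only: the function name, the 30-minute minimum transfer time, and the assumption that the same departure repeats every 1440 minutes are hypothetical and are not part of the model described in this study.

```python
DAY = 1440  # minutes in one day

def transfer_wait(arrival, departure, min_transfer=30, allow_next_day=False):
    """Waiting minutes at a transfer stop, or None if the transfer is infeasible.

    arrival, departure: minutes after midnight in a one-day timetable.
    min_transfer:       assumed minimum transfer time in minutes (hypothetical value).
    allow_next_day:     if True, assume the same departure repeats every day.
    """
    wait = departure - arrival
    if wait >= min_transfer:
        return wait  # sequential same-day transfer
    if not allow_next_day:
        return None  # the later trip cannot transfer to the earlier one
    while wait < min_transfer:
        wait += DAY  # stay overnight and take the departure on a following day
    return wait

# Arrive at 21:40 (1300 min); the connecting trip departs at 08:20 (500 min).
print(transfer_wait(1300, 500))                       # None
print(transfer_wait(1300, 500, allow_next_day=True))  # 640
```

Under a one-day timetable the connection is rejected, whereas allowing the next-day departure yields a wait of 640 minutes, which is the kind of link a two-day timetable would need to represent.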

Conclusion
The development of LDTS greatly promotes inter-regional human activities and interactions, which are crucial for sustainable development and social equity (Kim and Sultana 2015, Bosker et al. 2018). However, the transfers in LDTS are not as convenient as those in intra-urban transportation systems, which severely affects the calculation of travel time. While existing models provide the shortest travel time from one stop/place, it is still a big challenge to calculate the travel time of millions of origin-destination pairs and evaluate the overall performance of LDTS.
This study proposed a set-based model to calculate the shortest travel time in LDTS with different numbers of transfers. The set-based model employed the Cartesian product to get the many-to-many travel time between stops of trips and the transfer links. Through the optimization algorithm, the set-based model can significantly reduce the number of transfer trips that need to be calculated. For instance, the number of transfer trips decreased by 96.17%, 98.15%, and 79.02%, for conventional railway, high-speed railway, and air transportation, respectively, when the number of transfers was two.
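The use of the Cartesian product summarized above can be illustrated with a minimal sketch. It assumes that a trip is an ordered list of (stop, arrival, departure) tuples in minutes within one day, that transfers occur at a shared stop, and that a 30-minute minimum transfer time applies; the function names, the data layout, and the toy timetable are hypothetical, and the sketch omits the optimization of transfer links.

```python
from itertools import combinations, product

def direct_pairs(trip):
    """All ordered stop pairs served by one trip, with their travel minutes."""
    return {
        (a[0], b[0]): b[1] - a[2]  # arrival at b minus departure from a
        for a, b in combinations(trip, 2)
    }

def one_transfer_pairs(first, second, min_transfer=30):
    """Stop pairs reachable by riding `first`, then `second`, changing at a shared stop."""
    best = {}
    for i, j in product(range(1, len(first)), range(len(second) - 1)):
        x, arr_x, _ = first[i]
        y, _, dep_y = second[j]
        # Same-day, sequential transfer at a common stop with enough slack.
        if x != y or dep_y - arr_x < min_transfer:
            continue
        # Cartesian product: stops before the transfer times stops after it.
        for o, d in product(first[:i], second[j + 1:]):
            if o[0] == d[0]:
                continue  # skip pairs that return to the origin stop
            total = d[1] - o[2]
            key = (o[0], d[0])
            if key not in best or total < best[key]:
                best[key] = total
    return best

# Hypothetical toy timetable (minutes after midnight).
trip_a = [("S1", 480, 480), ("S2", 540, 545), ("S3", 600, 600)]
trip_b = [("S2", 590, 600), ("S4", 700, 700)]

print(direct_pairs(trip_a))
# {('S1', 'S2'): 60, ('S1', 'S3'): 120, ('S2', 'S3'): 55}
print(one_transfer_pairs(trip_a, trip_b))
# {('S1', 'S4'): 220}
```

Because each trip, and each pair of trips, is handled without reference to the others, calls such as these could be distributed across CPU cores and their results merged by keeping the minimum travel time per stop pair.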
Compared to the graph-based model, i.e. the time-expanded model, the set-based model can be easily parallelized, since each trip was processed independently.Also, the set-based model can strictly adhere to the condition of a one-day timetable, whereas the time-expanded model could include partial results of a two-day timetable.
In addition, the set-based model can be extended to calculate the door-to-door travel time between places by integrating the travel time between the stops and the places, which can support mapping the fine-scale accessibility at the national level. Future studies will continue to apply the set-based model to other GTFS data and improve the capability for multi-day transfers and multi-modal trips.

Figure 1. The generation process of pairs of stops in a trip.

Figure 2. The generation process of the transfer trips. (a): the number of transfers equals one and (b): the number of transfers equals two.

Figure 3. The optimization of transfer links. Preserve the transfer links in red.

Figure 4. The set-based model for the shortest door-to-door travel time.

Figure 5. The impact of the optimization algorithm on the number of transfer links.

Figure 6. The running time of the parallel set-based model for the shortest travel time between all the stops as the number of CPU cores increased.

Figure 7. The comparison between the set-based model and the time-expanded model. The grey lines are y = 1440 and x = 1440.

Figure 8. The running time for direct trips.

Figure 9. The town-level tourism accessibility (a) and the spatial patterns of market size of scenery sites (b). (© IJGIS remains strictly neutral with respect to jurisdictional claims on disputed territories and the naming conventions used in the maps included in the figure.)

Table 1. The timetable of the circular transfer trip.