VIS-MM: a novel map-matching algorithm with semantic fusion from vehicle-borne images

Abstract Conventional map-matching (MM) algorithms overlook the complexity of realistic traffic conditions and are therefore significantly limited in distinguishing the detailed driving paths of vehicles within complex urban road networks. The popularity of vehicle-borne cameras and advances in image recognition technologies provide an opportunity to close this gap by integrating vehicle-borne image semantic information with MM algorithms. Following this logic, this article proposes a novel MM algorithm with semantic fusion from vehicle-borne images (VIS-MM) suited to parallel road scenes. First, a multipath output algorithm is developed using the hidden Markov model to obtain candidate paths. Second, image recognition techniques are employed to extract vehicle-borne image semantics. Finally, the entropy weight method is applied to determine the most promising driving path among the candidate paths. The experimental results show that semantic fusion from vehicle-borne images improves accuracy significantly, from 66.18% to 99.88%, in parallel road scenes. The proposed map-matching algorithm can be applied in the fields of unmanned autonomous navigation and crowdsourced updating of high-definition maps.


Introduction
The map-matching (MM) algorithm is an effective approach to correcting the positions of trajectory data (Yang et al. 2011) and has been commonly used in online vehicle navigation and offline trajectory data mining (Rehrl et al. 2018). As one of the core algorithms for unmanned autonomous navigation and crowdsourced updating of high-definition maps, the performance of the MM algorithm is of great significance (Schreiber et al. 2013, Kim et al. 2018, Asghar et al. 2020, Meng et al. 2020). Moreover, intelligent transportation applications, such as the intelligent supervision of traffic flows, also demand highly accurate MM outcomes (Shen et al. 2016). In recent years, scholars have demonstrated a wide spectrum of methods to improve the accuracy of MM algorithms, such as conditional random fields (CRFs) (Liu et al. 2017, Xu et al. 2015), Kalman filters (Xu et al. 2010, Pink and Hummel 2008), hidden Markov models (HMMs) (Cui et al. 2021, Hu and Lu 2019, Hsueh and Chen 2018, Song and Yan 2018, Zhang and He 2018), genetic algorithms (Singh et al. 2020, Nikolić and Jović 2017), ant colony algorithms (Gong et al. 2018) and deep learning algorithms (Liu et al. 2020, Feng et al. 2022). Among them, HMM-based MM (HMM-MM) algorithms have received much attention because of their strong ability to handle the uncertainties in sampling frequency and GPS accuracy of trajectory data (He et al. 2019, Jiang et al. 2022, Zhang et al. 2021a).
Besides the quality of the trajectory data, the accuracy of HMM-MM algorithms depends heavily on the quality of the road network data (Li et al. 2021). In particular, the degree to which the actual road network is generalized sits at the heart of the challenge. For example, the driving direction of vehicles on a planar road may deviate significantly from the direction of the simplified linear representation of the road in the database (Figure 1(b)). Likewise, vehicles can make continuous direction changes at intersections without lane restrictions, but an intersection in urban road network data is usually abstracted as an intersection point, which can lead to a large direction deviation between vehicles and road network data (Figure 1(c)). The complexity of realistic traffic conditions is typically embodied in parallel road scenes, characterized by main and auxiliary roads in the horizontal dimension and viaducts, overpasses and tunnels in the vertical dimension. In parallel road scenes, vehicle driving behaviours are quite complicated (Kong and Yang 2019) since drivers have more flexibility to choose and change driving paths. For example, vehicles can switch between main and auxiliary roads to quickly extricate themselves from congestion. However, the trajectory data cannot reflect such a complex driving process (Figure 1(a)). In the absence of auxiliary information about the actual driving behaviour and vehicle driving paths, HMM-MM algorithms have no choice but to assume that vehicles travel along the shortest or optimal paths (Wu et al. 2020, Zhang et al. 2021b), thereby introducing noteworthy errors and uncertainties.
In the era of intelligent transportation, vehicle-borne cameras have attracted the attention of scholars given their high penetration rates among urban residents and the resulting low cost of crowdsourced data collection (Wang 2021, Görmer et al. 2009). With the aid of image recognition technologies, vehicle-borne images have been successfully applied in a variety of fields, including lane detection and lane keeping (Kuo et al. 2019), road sign detection and recognition (Du et al. 2019, Kuo and Lin 2007), obstacle detection and recognition (Yang et al. 2008), vehicle detection and tracking (Aytekin and Altuğ 2010), pedestrian and cyclist detection and recognition (Li et al. 2017), visual ranging and assisted positioning (Du et al. 2021, Min et al. 2019), simultaneous localization and mapping (Mur-Artal and Tardos 2017), 3D scene reconstruction (Christie et al. 2016), and point-of-interest creation (Liao et al. 2018). Considering that vehicle-borne images can capture the actual driving behaviour and vehicle driving paths (Wen et al. 2022, Xiao et al. 2018), integrating vehicle-borne image semantic information with HMM-MM algorithms should be a promising approach to enhancing the accuracy of HMM-MM in parallel road scenes. Unfortunately, no studies have been conducted toward such an endeavour. To address this research gap, this paper proposes a novel MM algorithm that fuses vehicle-borne image semantics (VIS-MM). Specifically, the key issues to be solved in this paper are how to effectively extract the vehicle-borne image semantics and how to integrate this semantic information into MM algorithms in parallel road scenes. The methodological innovations are achieved through three sequential steps (Figure 2):
1. First, a multipath output algorithm based on the modified HMM-MM algorithm is proposed to obtain multiple similar candidate paths and ensure that the actual driving path is included in the candidate paths.
2. Second, several semantics representative of the vertical and horizontal parallel roads are extracted from vehicle-borne images, and corresponding quantification methods are proposed.
3. Third, an entropy weight method is used to select the most promising path from the candidate paths, and the effectiveness of vehicle-borne image semantics in improving MM algorithm accuracy is verified by comparing the obtained path with the actual driving path and the result of a conventional HMM-MM algorithm.

Definitions
A schematic of the road elements is shown in Figure 3(a).
Definition 1. Road: A road is defined as a sequence of road segments.
Definition 2. Road segment: A road segment is made up of one segment or a sequence of segments; road segments are denoted by $R = [r_1, \ldots, r_a]$. Road segments connect end to end, and there is no case in which only two road segments are connected through a node.
Definition 3. Node: Nodes are the connection points between road segments, as denoted by $N = [n_1, \ldots, n_b]$. Nodes and road segments are mainly used in the shortest path algorithm.
Definition 4. Geometry point: Geometry points are the turning points on polyline road segments that represent the geometric shape of the road segments.
Definition 5. Segment: A segment is composed of geometry points or nodes. Segments are mainly used for MM of trajectory points. A road segment without geometry points is also called a segment during MM.
A schematic of the trajectory data and the MM process is shown in Figure 3(b).
Definition 6. Trajectory point: A trajectory represents continuous GPS points, where each GPS point is defined as a trajectory point, which needs to be matched to the segments. The $i$th trajectory point is labelled $p_i$.
Definition 7. Candidate segment: Candidate segments are the segments within a certain range of a trajectory point, labelled $s_i^j$, which denotes the $j$th candidate segment of trajectory point $p_i$. As shown in Figure 3(b), the candidate segments of $p_i$ are $s_i^1$, $s_i^2$ and $s_i^3$.
Definition 8. Candidate point: Candidate point $c_i^j$ is the projection point of trajectory point $p_i$ on candidate segment $s_i^j$. If the projection point does not lie on the candidate segment, the endpoint of the candidate segment closest to the trajectory point is taken as the candidate point. As shown in Figure 3(b), $c_i^1$, $c_i^2$ and $c_i^3$ are the candidate points of trajectory point $p_i$ on candidate segments $s_i^1$, $s_i^2$ and $s_i^3$, respectively.
Definition 9. Candidate path: Candidate paths are the possible paths that vehicles travel. In Figure 3(b), there are two candidate paths generated by the trajectory points.

VIS-MM algorithm
3.1. Multipath output algorithm
3.1.1. Core idea
The Viterbi algorithm is usually used to reduce the computational complexity of HMM-MM algorithms (Xie et al. 2020, Hu and Lu 2019, Qi et al. 2019, Che et al. 2018, Song and Yan 2018, Hsueh and Chen 2018, Hsueh et al. 2017, Luo et al. 2017, Yang and Gidófalvi 2018). In the Viterbi algorithm, only the candidate point pair with the highest probability is recorded among all the candidate point pairs formed by the current and the previous trajectory points, as shown in Figure 4(a). Therefore, if two candidate paths converge at the same candidate point, only one can be retained by an HMM-MM algorithm. The multipath output algorithm ensures that candidate paths through different road segments are preserved. Its core idea is to take the paths through different road segments as the state nodes of the Viterbi algorithm, instead of taking the different candidate points of the current trajectory point as the state nodes. That is, among all the candidate paths that pass through the same road segments, only the one with the highest probability is retained, as shown in Figure 4(b). Finally, the candidate paths are obtained from all the state nodes of the last trajectory point in the Viterbi algorithm. Consequently, compared with a conventional HMM-MM algorithm, which outputs only one optimal path, the multipath output algorithm is more complicated, and many cases need further consideration.

Key steps
The key steps are shown in Figure 5. The main differences between the multipath output algorithm and a conventional HMM-MM algorithm are the candidate point grouping and screening, the logic of the Viterbi algorithm, the candidate path screening and splicing, and the modification of the shortest path algorithm. The detailed process of the multipath output algorithm is as follows:
3.1.2.1. Structuring and caching of road elements. At the beginning of the multipath output algorithm, the road elements should be structured and cached to improve the search efficiency. First, an R+-tree (Sellis 1987) is used to construct a spatial index for segments to meet the spatial query requirements of candidate segments within a certain range of trajectory points. Then, a red-black tree is used to construct an attribute index for road segments, nodes and segments. The specific information of these road elements can be obtained from the attribute index by their IDs or the IDs of other road elements associated with them. Finally, the road segments and nodes are used to construct a weighted directed graph for the A* shortest path algorithm. All cached data only need to be initialized once before the algorithm is executed.
3.1.2.2. Candidate point acquisition and preliminary screening. To obtain the candidate points, the candidate segments within a certain range $r_d$ of the current trajectory point are first selected through the spatial index. Then, the candidate segments for which the angle difference between the driving direction and the direction of the candidate segment is greater than a certain value $a_t$ are discarded. Finally, a single candidate point on each segment is identified.
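The angle-based preliminary screening can be illustrated with a minimal Python sketch. The function names and the `(segment_id, heading)` layout are hypothetical and not taken from the paper; the default of 60 degrees mirrors the value of the angle threshold reported in the experiments, and headings are assumed to be compass-style angles in degrees.

```python
def heading_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def filter_by_heading(vehicle_heading, segments, max_diff_deg=60.0):
    """Keep the IDs of segments whose direction is close to the vehicle heading.

    `segments` is a list of (segment_id, segment_heading_deg) pairs; this
    layout is an assumption made for illustration only.
    """
    return [
        seg_id
        for seg_id, seg_heading in segments
        if heading_difference(vehicle_heading, seg_heading) <= max_diff_deg
    ]
```

Taking the difference modulo 360 matters because headings of 350 and 10 degrees are only 20 degrees apart, although their raw difference is 340.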
A point-line relation function (denoted by $r$, see Formula (1)) can be used to determine the position of the candidate point on the corresponding candidate segment (Li et al. 2021). A schematic of the point-line relation function is shown in Figure 6 (the dotted line is the dividing line for the value of $r$).
If the projection point of trajectory point $p$ is on the candidate segment ($p$ is in region II), then the value of $r$ is between 0 and 1 ($0 \le r \le 1$), and the projection point is taken as the candidate point, whose coordinates are obtained from $r$ and the segment endpoints $A$ and $B$. When $p$ is in region I ($r \le 0$) or region III ($r \ge 1$), the nearest endpoint $A$ or $B$ is selected as the candidate point, respectively. These candidate points are nodes or geometry points that play an important role in the MM algorithm and therefore cannot be ignored.
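As a hedged illustration, the sketch below computes a point-line relation value and the resulting candidate point in planar coordinates. It assumes the common dot-product form of the projection parameter, r = (AP · AB) / |AB|², which may differ in detail from the paper's Formula (1); the function names are hypothetical.

```python
def point_line_relation(p, a, b):
    """Normalised position r of the projection of p along segment AB.

    Assumes planar (projected) coordinates given as (x, y) tuples.
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return 0.0  # degenerate segment: A and B coincide
    return ((px - ax) * dx + (py - ay) * dy) / length_sq


def candidate_point(p, a, b):
    """Projection of p on AB, or the nearest endpoint if it falls outside AB."""
    r = point_line_relation(p, a, b)
    if r <= 0:
        return a  # region I: endpoint A
    if r >= 1:
        return b  # region III: endpoint B
    ax, ay = a
    bx, by = b
    return (ax + r * (bx - ax), ay + r * (by - ay))  # region II
```

For a segment from (0, 0) to (10, 0), a trajectory point at (5, 3) gives r = 0.5 and the candidate point (5, 0), whereas a point at (12, 1) falls in region III and is assigned endpoint B.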
3.1.2.3. Candidate point grouping and rescreening. To screen out the key candidate points from each segment and reduce duplicate candidate paths, the multipath output algorithm groups the candidate points according to the value of $r$ and the MM distance, and then screens the candidate points according to the associated road segments. First, the candidate points with $r \in [0, 1]$ are added to the candidate set; the candidate points with $r \in [-e, 0) \cup (1, 1 + e]$, or whose distance from their corresponding trajectory points is less than a certain threshold $d_t$, are temporarily added to the alternative set; and other candidate points are ignored. Then, key candidate points in the alternative set whose associated road segments do not appear in the candidate set are moved to the candidate set. When a projection point or geometry point is taken as a candidate point, the road segment on which it is located is treated as the associated road segment. However, when the candidate point is a node, all road segments connected to the node are considered associated road segments. Figure 7 shows that the associated road segment of $p_1$ is $r_1$, and the associated road segments of $p_2$ are $r_1$, $r_2$, $r_3$ and $r_4$. If two candidate points share any associated road segment, only the candidate point closest to the trajectory point is retained in the candidate set.
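The grouping and promotion logic can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the `Candidate` record is hypothetical, the tolerance `eps` stands for the parameter written as e in the text (its value is not given in the source), and an alternative point is promoted only when none of its associated road segments is already covered, which is one possible reading of the rule. The final step that removes candidate points sharing a road segment is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    point_id: str
    r: float                  # point-line relation value
    distance: float           # MM distance to the trajectory point (metres)
    road_segments: frozenset  # IDs of the associated road segments


def group_candidates(candidates, eps, d_t):
    """Split candidates into (candidate_set, alternative_set)."""
    candidate_set, alternative_set = [], []
    for c in candidates:
        if 0.0 <= c.r <= 1.0:
            candidate_set.append(c)
        elif -eps <= c.r <= 1.0 + eps or c.distance < d_t:
            alternative_set.append(c)
        # all other candidate points are ignored

    covered = set()
    for c in candidate_set:
        covered |= c.road_segments

    remaining = []
    # Visit closer alternatives first so that they are promoted first.
    for c in sorted(alternative_set, key=lambda c: c.distance):
        if covered.isdisjoint(c.road_segments):
            candidate_set.append(c)
            covered |= c.road_segments
        else:
            remaining.append(c)
    return candidate_set, remaining
```

With this reading, an alternative point on a road segment that has no projected candidate still contributes a state to the Viterbi step, which is what keeps paths through that road segment alive.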
3.1.2.4. Viterbi algorithm. In the Viterbi algorithm, the transition probability and observation probability adopted by the multipath output algorithm are the same as those of other HMM-MM algorithms (Hsueh and Chen 2018, Hsueh et al. 2017, Lou et al. 2009). The main differences lie in the recognition method of state nodes and the probability calculation process of candidate point pairs in different groups. A schematic of the Viterbi algorithm is shown in Figure 8, and a detailed flow chart can be found in the Supplementary Appendix.
In the first execution of the Viterbi algorithm, only the observation probabilities of all the current candidate points in the candidate set and alternative set are calculated. Then, the candidate points and their probabilities are packaged as state nodes and added to the candidate result set and alternative result set, respectively, for the probability calculation of the subsequent trajectory point. If the recorded candidate result set and alternative result set of the previous trajectory point exist, the Viterbi algorithm process is subdivided into the following four steps in the multipath output algorithm:
a. The transition probability of the candidate point pairs between the candidate set of the current trajectory point and the candidate result set of the previous trajectory point is calculated. The candidate point pairs with no shortest path or with poor connectivity (whose shortest path distance is greater than $d_s$ m and more than $t_1$ times the straight-line distance between their corresponding trajectory points) are discarded. Then, the overall probability and the passing road segment ID chain between the candidate point pairs are calculated. The passing road segment ID chain is obtained using the A* shortest path algorithm and is taken as a unique identifier to distinguish similar candidate paths. Finally, each candidate point pair is packaged as a state node containing the current candidate point, the previous candidate point (state node), the overall probability and the identifier. The state nodes are added to the candidate result set if the ratio of the shortest path distance to the straight-line distance does not exceed a certain threshold $t_2$; other state nodes are added to the temporary result set. When state nodes with the same identifier are added to a result set, only the node with the highest probability is retained.
b. Similar to the previous step, the transition probability of candidate point pairs between the alternative set of the current trajectory point and the candidate result set of the previous trajectory point is calculated. Then, the candidate point pairs with no shortest path or with poor connectivity are discarded, the overall probability and the identifier of the remaining candidate point pairs are calculated, and the candidate point pairs together with the relevant information are packaged as state nodes. The state nodes that meet the distance condition are added to the alternative result set; others are added to the temporary result set. The difference from step a is that when state nodes with the same identifier are added to the result set, only the node with the shortest MM distance between the current candidate point and the trajectory point is retained.
c. Check whether the associated road segments of all candidate points in the candidate set of the current trajectory point also appear in the current candidate result set. For the unrecorded associated road segments, the candidate points associated with these road segments are chosen from the candidate set, the transition probability of the candidate point pairs between these candidate points and the alternative result set of the previous trajectory point is calculated, the candidate point pairs with no shortest path or with poor connectivity are discarded, and the remaining candidate point pairs together with the calculated relevant information are packaged as state nodes. Then, according to the principle of highest probability, the state nodes are added to the candidate result set.
d. Check whether the associated road segments of all candidate points in the candidate result set of the previous trajectory point also appear in the current candidate result set. For the unrecorded associated road segments, the state nodes in the alternative result set whose candidate points are associated with these road segments are moved to the candidate result set according to the principle of shortest MM distance.
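The connectivity test and the "one state node per identifier" rule of step a can be sketched as below. The function names and the dictionary-based result set are hypothetical; the default thresholds follow the values reported in the experiments (50 m, 10 and 5), and comparing products instead of ratios is a choice made here to avoid division by a zero straight-line distance.

```python
def classify_pair(path_len, straight_len, d_s=50.0, t1=10.0, t2=5.0):
    """Decide what happens to a candidate point pair.

    path_len     -- shortest path distance in metres, or None if no path exists
    straight_len -- straight-line distance between the two trajectory points
    Returns 'discard', 'candidate' or 'temporary'.
    """
    if path_len is None:
        return "discard"                      # no shortest path
    if path_len > d_s and path_len > t1 * straight_len:
        return "discard"                      # poor connectivity
    if path_len <= t2 * straight_len:
        return "candidate"                    # goes to the candidate result set
    return "temporary"                        # goes to the temporary result set


def add_state(result_set, identifier, probability, state):
    """Keep only the most probable state node for each road segment ID chain.

    `result_set` maps an identifier (tuple of road segment IDs) to a
    (probability, state) pair.
    """
    best = result_set.get(identifier)
    if best is None or probability > best[0]:
        result_set[identifier] = (probability, state)
```

Keying the result set by the road segment ID chain, rather than by the candidate point, is what allows two paths that end at the same candidate point to survive side by side.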
After the above steps, if the candidate result set and alternative result set are still empty, all the state nodes in the temporary result set are moved to the candidate result set. Finally, the candidate result set and the alternative result set are recorded as the results of the Viterbi algorithm at the current trajectory point, and the next trajectory point is traversed. However, if all the current result sets are empty, the current trajectory point is not connected to the solved candidate paths. In that case, the currently solved candidate paths are screened, recursively solved and spliced, and the multipath output algorithm is restarted with the current trajectory point as the first trajectory point.
3.1.2.5. Candidate path screening. To avoid candidate paths being similar, duplicate or non-optimal due to the existence of multiple non-projected candidate points in the first or last trajectory point, further screening is needed before saving candidate paths. The screening rule is as follows: Take the passing road segment ID chain from the second candidate point to the penultimate candidate point as the unique identifier; for candidate paths with the same unique identifier, only the one with the highest probability is retained.
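A minimal sketch of this screening rule is given below. The path representation is an assumption made for illustration: each candidate path is a `(probability, chains)` pair, where `chains[k]` is the tuple of road segment IDs passed between candidate points k+1 and k+2, so that dropping the first and last chains leaves the part between the second and the penultimate candidate point.

```python
def screen_paths(paths):
    """Keep the most probable path for each inner road segment ID chain.

    `paths` is a list of (probability, chains) pairs (hypothetical layout).
    """
    best = {}
    for probability, chains in paths:
        # Identifier: ID chain from the second to the penultimate candidate point.
        key = tuple(seg for chain in chains[1:-1] for seg in chain)
        if key not in best or probability > best[key][0]:
            best[key] = (probability, chains)
    return list(best.values())
```

Two paths that differ only in how the first or last trajectory point is matched share the same key, so only the more probable one is kept.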
3.1.2.6. Recursive solution and result splicing. A detailed flow chart of the candidate path recursive solution and splicing steps is shown in Figure 9. In the multipath output algorithm, each state node of the last trajectory point represents a candidate path, and a candidate path is obtained by repeatedly tracing back from the current state node to its parent state node. Then, the obtained candidate paths are added to the result set. However, when the Viterbi algorithm has been interrupted due to the lack of connectivity between adjacent trajectory points, the candidate paths belonging to the parts before and after the interruption need to be spliced.
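Recovering a candidate path from a state node of the last trajectory point can be sketched as follows. The `StateNode` fields are hypothetical, and the trace-back is written as a loop rather than a recursive call, which is equivalent here and avoids recursion limits on long trajectories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateNode:
    candidate_point: str
    probability: float
    parent: Optional["StateNode"] = None  # state node of the previous trajectory point


def trace_path(last_node):
    """Follow parent links back to the first trajectory point."""
    path = []
    node = last_node
    while node is not None:
        path.append(node.candidate_point)
        node = node.parent
    path.reverse()  # collected last-to-first, so restore travel order
    return path
```

Calling `trace_path` once for every state node of the last trajectory point yields the full set of candidate paths.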
If the head and tail candidate points of two candidate paths to be spliced are on the same road segment but their relative positions are opposite to the driving direction (as shown in Figure 10(a)), the candidate paths are spliced according to the road segment ID. Otherwise, candidate paths are spliced according to the straight-line distance between the head and tail candidate points (as shown in Figure 10(b)). Each candidate path after interruption is matched with a more likely candidate path before the interruption. After splicing, the overall probability of the new candidate path is calculated according to the overall probability of the candidate paths before and after the interruption, and only new candidate paths are kept in the candidate path result set.
3.1.2.7. Other considerations in the A* shortest path algorithm. The multipath output algorithm considers the travel time between adjacent trajectory points in the shortest path algorithm to control the traversal range of nodes, which greatly improves the efficiency of the shortest path algorithm. In addition, to prevent the solved candidate paths from U-turning on a two-way road segment (as shown in Figure 11), the multipath output algorithm assumes that each candidate path has only one specific driving direction on each two-way road segment, and the directions recorded in the previous state node are considered in the shortest path algorithm of the current candidate point pair connectivity calculation. After the shortest path is solved, the driving directions of all two-way road segments in the candidate path are recorded in the new state node.

Cases and performances of the algorithm
The multipath output algorithm parameters are set as follows. In the candidate point screening, $r_d$ is set to 150 m according to the lowest positioning accuracy of the trajectory points; the angle threshold $a_t$ is set to 60° and the distance threshold $d_t$ is set to 40 m according to the sampling statistics of the trajectory data, to ensure that all reasonable candidate points and candidate segments are retained. In the Viterbi algorithm, the thresholds $d_s$, $t_1$ and $t_2$ are empirically set to 50 m, 10 and 5, respectively, to ensure the connectivity of the solved candidate paths.
The multipath output algorithm is verified using annotated and segmented taxi trajectory data. Thirty trajectories were randomly selected for each parallel road scene, including main and auxiliary roads, viaducts, overpasses, tunnels and nontunnel roads, and other parallel roads of all grades within 300 metres in the same direction, giving a total of 150 trajectories. The trajectories cover various possible position jitters, with vehicles either driving on a specific road or switching between different parallel roads. The recall rate (the probability that the actual vehicle driving path is included in the multipath output results) is 100%. The solution time for each trajectory point varies from 50 to 200 milliseconds, and the efficiency gradually decreases as the numbers of intersection nodes and trajectory points increase.
Partial experimental results of the multipath output algorithm are shown in Figure 12. The following can be observed from the experimental results:
1. The candidate point grouping strategy and the unique candidate path identification method prevent repeated candidate paths in scenes with key trajectory points. As shown in Figure 12(a), point 5 is the key trajectory point, candidate paths 1-1 and 1-2 are generated by trajectory 1, and candidate path 1-3 is excluded since it is consistent with candidate path 1-1. The difference is that the key trajectory point 5 is matched to road segment s1 in candidate path 1-1 and to the intersection point of road segments s1, s2 and s3 in candidate path 1-3.
2. The connectivity consideration allows the possible candidate paths to be retained even when the trajectory has an overall offset. As shown in Figure 12(b), candidate paths 2-1 and 2-2 are generated by trajectory 2, and candidate path 2-1 can be retained since points 10 and 11 can both be matched to the intersection point of road segments s1, s2 and s3.
3. The direction constraint on two-way roads excludes candidate paths that U-turn on a two-way road. As shown in Figure 12(c), candidate paths 3-1, 3-2, 3-3 and 3-4 are generated by trajectory 3, and candidate paths 3-5 and 3-6 are the excluded impossible cases.
4. The candidate path splicing strategy allows all possible candidate paths to be retained. As shown in Figure 12(d), candidate paths 4-1, 4-2 and 4-3 are generated by trajectory 4. Since the relative positions of points 6 and 7 are opposite to the driving direction, the candidate paths are interrupted here, and the number of candidate paths before and after the interruption is inconsistent due to the offset of the trajectory points and the limitation of the search radius. According to the splicing strategy, candidate paths 4-1 and 4-2 are spliced based on the same road segment ID of the candidate point pairs, while candidate path 4-3 is spliced based on the shortest straight-line distance between the candidate point pairs.
Figure 11. Candidate path turning on a two-way road segment.

Extraction of semantic information from vehicle-borne images
To effectively recognize parallel roads in both the horizontal and vertical dimensions, three vehicle-borne image semantics, namely the road scene, the lane number and the diversion line, are taken into consideration. Vehicle-borne images are classified for semantic extraction, and the sample data are manually annotated for the three semantics. Finally, convolutional neural networks are established and trained for each semantic type. The accuracy of the convolutional neural network is labelled $a \in [0, 1]$, which can be used for the quantification of vehicle-borne image semantics. The definitions and quantification methods of the three vehicle-borne image semantics are detailed as follows:
3.2.1. Road scene
A road scene is used to distinguish vertical parallel roads. According to the vehicle-borne image semantics, road scenes are divided into tunnel scenes, under-viaduct scenes (the viaduct is parallel or perpendicular to the driving direction) and no-shelter scenes. If a viaduct appears in a vehicle-borne image, it is inferred that the vehicle is driving under the viaduct. Scenes without a viaduct and scenes on a viaduct are both called no-shelter scenes and are not distinguished in this study. The road scene semantic requires that tunnels and viaducts are labelled in the urban road network data, and the quantification method for the road scene semantic is as follows:
$$S_{scene} = \begin{cases} a, & \text{road scene is consistent with the road label} \\ 0, & \text{road scene is inconsistent with the road label} \end{cases}$$
Road scenes can be considered consistent with the road data in the following cases:
a. When the vehicle-borne image is classified as a tunnel and the road is labelled as a tunnel.
b. When the vehicle-borne image is classified as an under-viaduct scene (the viaduct is parallel to the driving direction) and the road is not labelled as a tunnel or viaduct.
c. When the vehicle-borne image is classified as an under-viaduct scene (the viaduct is perpendicular to the driving direction) or a no-shelter scene: if there is no viaduct among all candidate segments of the current trajectory point, the nontunnel roads (all roads not labelled as tunnels in the urban road network data) are considered to be consistent with the current road scene; otherwise, the under-viaduct scene (the viaduct is perpendicular to the driving direction) is consistent with the nontunnel roads, and the no-shelter scene is consistent with the viaduct roads.
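The three consistency cases can be expressed as a small scoring function. The string labels for image scenes and road types are hypothetical names chosen for this sketch, and `a` is the classifier accuracy described above; the function is one reading of cases a to c rather than the authors' code.

```python
def scene_score(image_scene, road_label, viaduct_among_candidates, a):
    """Road scene score: a if the image scene agrees with the road label, else 0.

    image_scene -- 'tunnel', 'under_viaduct_parallel',
                   'under_viaduct_perpendicular' or 'no_shelter'
    road_label  -- 'tunnel', 'viaduct' or 'ordinary' (hypothetical labels)
    viaduct_among_candidates -- True if any candidate segment of the current
                   trajectory point is labelled as a viaduct
    """
    if image_scene == "tunnel":
        consistent = road_label == "tunnel"                    # case a
    elif image_scene == "under_viaduct_parallel":
        consistent = road_label not in ("tunnel", "viaduct")   # case b
    elif image_scene == "under_viaduct_perpendicular":
        consistent = road_label != "tunnel"                    # case c
    elif image_scene == "no_shelter":
        if viaduct_among_candidates:
            consistent = road_label == "viaduct"               # case c, viaduct present
        else:
            consistent = road_label != "tunnel"                # case c, no viaduct
    else:
        raise ValueError(f"unknown image scene: {image_scene}")
    return a if consistent else 0.0
```

For example, a no-shelter image favours the viaduct road only when a viaduct is actually among the candidate segments; otherwise every nontunnel road receives the full score.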

Lane number
The lane number is used to identify the main and auxiliary roads. In a vehicle-borne image, only the lanes consistent with the driving direction of the vehicle are identified; that is, the lanes to the left of the yellow solid or dotted lines as well as the nonmotorized lanes on the right side of the road are ignored. The default minimum lane number is 1 for an intersection scene or a rural road with no lane lines. The default maximum lane number is 4 since vehicle-borne images have limited ability to recognize the lane number, mainly due to the uncertain position of the vehicles on the road, the limited observation range and the unfixed orientation of vehicle-borne cameras. Accordingly, the lane number of the road network data and the annotation data of the vehicle-borne image samples used for training are manually corrected when the lane number is greater than 4 or less than 1. The quantification method for the lane number semantic is as follows: where $y_p$ and $y_r$ are the corrected lane numbers of the recognition results of the vehicle-borne image and the road network data, respectively, abs is the absolute value function, and max is the maximum value function.

Diversion line
A diversion line represents bifurcated or merged roads, which can be used to identify the relative position of vehicles in a set of intersecting candidate paths. The diversion line semantic refers to the positions where the diversion lines are located, including when there are no diversion lines, the diversion lines are on the left, the diversion lines are on the right and the diversion lines are on both sides. If there are no diversion lines, all candidate paths at the current trajectory point have a score of $a$. Otherwise, a certain number of candidate points are selected from all candidate paths with the current trajectory point as the centre to form path chains. Then, the path chains are grouped according to the road segment ID set of each path chain using a union-find set data structure. For a group with only one path chain, the corresponding candidate path is scored as $0.8a$ (as shown in Figure 13(b)); for a group with multiple path chains, if the diversion line appears on the left, the vehicle is driving on a relative right-side road, and the candidate path corresponding to the right-most path chain in the group is scored as $a$. The scores of other candidate paths are successively decreased by $0.2a$ to 0 from right to left, and the opposite process is used when the diversion line appears on the right (as shown in Figure 13(b)); however, if a diversion line appears on both sides, the vehicle is driving on the middle road, the candidate path corresponding to the middle path chain in the group is scored as $a$, and the scores of other candidate paths on both sides are decreased by $0.2a$ to 0 successively from the middle (as shown in Figure 13(a)).
Figure 13. The quantification method for the diversion line semantic.
The quantification method for the diversion line semantic at a trajectory point is as follows: where $c_j$ represents the number of path chains in the $j$th group and $i$ represents the order of the path chain in its corresponding group. For the relative position discrimination of two path chains, this study uses the driving direction of each trajectory point as the slope and determines the relative position of the two path chains at each trajectory point by evaluating the intercept, on the coordinate axes, of the line passing through the trajectory point. If the driving direction is between 0° and 180°, the path chain whose candidate point has the highest intercept value on the Y-axis is relatively to the left, while the intercept value on the X-axis is considered when the driving direction is 180°. The opposite is true when the driving direction is between 180° and 360°. In a set of path chains, if the position of one path chain relative to another appears to be both to the left and to the right, the most frequent relative position over all trajectory points is taken as the final relative position of the path chain.
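The scoring rule for one group of path chains can be sketched as below, assuming the chains have already been ordered from left to right. The function name and position labels are hypothetical. For the case with diversion lines on both sides and an even number of chains, the text does not say which chain is "the middle"; this sketch treats both central chains as the middle, which is an assumption.

```python
def diversion_scores(n_chains, line_position, a):
    """Scores for the path chains of one group, ordered left to right.

    line_position -- 'none', 'left', 'right' or 'both'
    a             -- classifier accuracy used as the full score
    """
    if line_position == "none":
        return [a] * n_chains
    if n_chains == 1:
        return [a * 4 / 5]                       # single chain: 0.8a
    if line_position == "left":                  # vehicle is on the right-most road
        steps = [n_chains - 1 - i for i in range(n_chains)]
    elif line_position == "right":               # vehicle is on the left-most road
        steps = list(range(n_chains))
    elif line_position == "both":                # vehicle is on the middle road
        centre = (n_chains - 1) / 2
        steps = [int(abs(i - centre)) for i in range(n_chains)]
    else:
        raise ValueError(f"unknown line position: {line_position}")
    # Each step away from the preferred chain costs 0.2a, floored at 0.
    return [a * max(0, 5 - k) / 5 for k in steps]
```

The score is written as `a * (5 - k) / 5` instead of `a * (1 - 0.2 * k)` only to keep the floating-point results tidy.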

The entropy weight method
The entropy weight method is an objective weighting method that calculates weights from the dispersion of the indicators. It is better suited to scenes where all types of indicators have similar and unstable contributions to the evaluation content, and the weight of each indicator in a specific dataset can be determined by the internal rules of each indicator (Nie and Yu 2011).
Figure 14. Flow chart of the entropy weight method and candidate path evaluation.
This study takes the overall spatial characteristics of candidate paths and vehicle-borne image semantics as evaluation indicators for the candidate paths; see Section 4.2 for details. Considering the accuracy of the image recognition algorithm and the random GPS jitter of the trajectory data, it is impossible to determine which type of indicator is more stable. It can therefore be assumed that each type of indicator has a similar and unstable contribution to the candidate path evaluation. For this reason, the entropy weight method is used to calculate the weight of each indicator and evaluate the candidate paths. The final map-matched result is then the candidate path with the maximum weighted sum of all normalized indicators. A flow chart of the entropy weight method and candidate path evaluation is shown in Figure 14, and the detailed formula of the entropy weight method can be found in other related works (Zou et al. 2006).
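As an illustration, the sketch below follows the common textbook form of the entropy weight method: min-max normalization, entropy of each indicator's share distribution, and weights proportional to one minus the entropy. The exact variant used in this study (Zou et al. 2006) may differ in its normalization details, and the function names and the handling of constant indicators are assumptions.

```python
import math

def normalize(column, positive=True):
    """Min-max normalize one indicator; negative indicators are flipped."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0] * len(column)   # assumption: constant indicator, no information
    if positive:
        return [(v - lo) / (hi - lo) for v in column]
    return [(hi - v) / (hi - lo) for v in column]

def entropy_weights(columns):
    """Entropy weights for normalized indicator columns (one list per indicator)."""
    n = len(columns[0])
    if n < 2:
        raise ValueError("at least two candidate paths are needed")
    divergences = []
    for col in columns:
        total = sum(col)
        entropy = 0.0
        for v in col:
            p = v / total if total > 0 else 0.0
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)                       # scale entropy to [0, 1]
        divergences.append(max(0.0, 1.0 - entropy))  # more dispersion, more weight
    s = sum(divergences)
    if s == 0:
        return [1.0 / len(columns)] * len(columns)   # all indicators uninformative
    return [d / s for d in divergences]

def score_paths(raw_columns, positive_flags):
    """Return (scores per candidate path, weights per indicator)."""
    norm = [normalize(c, f) for c, f in zip(raw_columns, positive_flags)]
    weights = entropy_weights(norm)
    n = len(raw_columns[0])
    scores = [sum(w * col[i] for w, col in zip(weights, norm)) for i in range(n)]
    return scores, weights
```

The candidate path with the largest score would be selected as the map-matched result. An indicator that takes the same value on every candidate path receives a weight of approximately zero under this formulation.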

Data
This study takes Wuhan, the capital city of Hubei Province in central China, as the research area. The road network data are provided by a local navigation data company, which covers the whole city of Wuhan and includes 272,150 nodes, 351,733 road segments and 1,287,552 segments. To explore the effectiveness of vehicle-borne image semantics in improving the accuracy of an HMM-MM algorithm, 62 km of trajectory data with vehicle-borne images and attribute information, such as the timestamp, latitude, longitude, driving direction and instantaneous speed, are collected through independently developed data acquisition software installed on a smartphone (Huawei Honor 8, used like a dashcam). Finally, 20 groups of trajectory data located in different parallel road scenes (average GPS accuracy is approximately 15 m, with a minimum accuracy of 50 m) are extracted; these scenes are error-prone for conventional HMM-MM algorithms. Detailed information is shown in Table 1. The collection route and the experimental data extracted from the collected trajectory data are shown in Figure 15(a).
Vehicle-borne image semantics extraction requires many samples for model training. Since Baidu Maps (https://map.baidu.com/) panoramas can provide road scene pictures in the driving direction through panoramic slice splicing, they can be used as training samples for vehicle-borne image semantic extraction. A total of 160,000 panoramas are collected and annotated in this study, and the scope of the data is shown in Figure 15(b). To balance the sample size for different types of vehicle-borne image semantics, the samples used for training and validation are randomly extracted from the panorama dataset (the ratio is approximately 4:1), and the sample size of the training set is kept as consistent as possible within the same semantic type. The sample sizes of each semantic type are shown in Table 2.
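The balanced random extraction with a training-to-validation ratio of approximately 4:1 could be implemented along the following lines. This is only a sketch: the function name, the per-class cap and the fixed random seed are assumptions, and the actual sampling procedure used in the study is not specified beyond the description above.

```python
import random
from collections import defaultdict

def balanced_split(samples, per_class, train_ratio=0.8, seed=0):
    """Draw up to per_class samples per label, then split about 4:1.

    samples is a list of (item, label) pairs; returns (train, validation).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    train, validation = [], []
    for label, items in by_label.items():
        # Cap every class at the same size so the training set stays balanced.
        chosen = rng.sample(items, min(per_class, len(items)))
        cut = round(len(chosen) * train_ratio)
        train += [(x, label) for x in chosen[:cut]]
        validation += [(x, label) for x in chosen[cut:]]
    return train, validation
```

Capping each class at the size of the rarest class is one simple way to keep the training sample sizes consistent across the classes of a semantic type.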

Experiment and result
The accuracies of a VGG deep convolutional neural network and a ResNet residual deep neural network are compared for vehicle-borne image semantic extraction. The experimental results show that VGG has higher accuracy than ResNet: its lane number recognition, road scene classification and diversion line classification accuracies are all above 80% (approximately 70% for ResNet). Therefore, this study uses a VGG deep convolutional neural network to extract vehicle-borne image semantics. According to the statistics, the total extraction time of all semantics in each vehicle-borne image is approximately 0.4 s.
Figure 15. Study area and data distribution (numbers in parentheses represent the road scene, 1: main and auxiliary roads, 2: enter viaduct, 3: exit viaduct, 4: on viaduct, 5: under viaduct, 6: enter tunnel, 7: exit tunnel, 8: in tunnel, 9: overpass, 10: parallel roads, 11: did not enter viaduct, and 12: did not enter tunnel).
In this study, four groups of comparative experiments are designed: a conventional HMM-MM algorithm, a VIS-MM algorithm using only vehicle-borne image semantics in the entropy weight method, a VIS-MM algorithm using only the overall spatial characteristics of the candidate path in the entropy weight method, and a VIS-MM algorithm using both types of indicators in the entropy weight method. These experiments verify the effectiveness of the overall spatial characteristics and the vehicle-borne image semantics in distinguishing parallel candidate paths, and compare the accuracy and efficiency of the proposed VIS-MM algorithm with those of a conventional HMM-MM algorithm. The overall spatial characteristics of the candidate paths used in this study include the mean and variance of the distance between each trajectory point and its candidate point in each candidate path, and the mean and variance of the angle difference between the driving direction of each trajectory point and the direction of its candidate segment in each candidate path. In the evaluation of candidate paths, the quantified vehicle-borne image semantics are taken as positive indicators and the overall spatial characteristics are taken as negative indicators. The overall score of each candidate path is then obtained as the weighted sum of its normalized indicators. The experimental results are shown in Table 3. Because the entropy weight method takes comparatively little time, the efficiency of the multipath output algorithm is taken as the efficiency of all VIS-MM algorithms. The accuracy is calculated according to the following formula:
$$\text{Accuracy} = \frac{N_c}{N_a} \times 100\%$$
where $N_c$ represents the number of candidate points matched to the correct candidate segment and $N_a$ represents the number of all candidate points of the current candidate path. Figures 16-20 show the comparative experimental results in different scenes.
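The four overall spatial characteristics and the accuracy measure can be computed as in the following sketch. The function names are hypothetical, the population variance is used because the paper does not state which variance estimator was applied, and candidate segments are assumed to have a single direction expressed in degrees.

```python
from statistics import mean, pvariance

def angle_difference(a, b):
    """Smallest absolute difference between two bearings, in [0, 180] degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def spatial_characteristics(distances, headings, segment_bearings):
    """Mean and variance of point-to-candidate distance and of angle difference."""
    diffs = [angle_difference(h, s) for h, s in zip(headings, segment_bearings)]
    return {
        "distance_mean": mean(distances),
        "distance_variance": pvariance(distances),   # population variance (assumption)
        "angle_mean": mean(diffs),
        "angle_variance": pvariance(diffs),
    }

def matching_accuracy(matched_segments, true_segments):
    """Percentage of candidate points matched to the correct segment (N_c / N_a)."""
    n_a = len(matched_segments)
    n_c = sum(m == t for m, t in zip(matched_segments, true_segments))
    return 100.0 * n_c / n_a
```

Wrapping the angle difference keeps a heading of 350° and a segment bearing of 10° only 20° apart, which matters when these values are used as negative indicators.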
The comparison experiments show that, in most cases, the actual driving paths can be obtained by the VIS-MM algorithm using only the vehicle-borne image semantics in the entropy weight method. Its accuracy is 99.88%, compared with 66.18% for the HMM-MM algorithm. Therefore, vehicle-borne image semantics can effectively improve the accuracy and stability of MM algorithms. However, there are still some MM errors, as shown in Figure 20(c). The main reason is abnormal lane number recognition caused by vehicle occlusion. Because there are few candidate points on the candidate segment, no subsequent trajectory points with correct lane number recognition results are available to correct the trajectory points with abnormal results.
The MM results obtained by the VIS-MM algorithm using only the overall spatial characteristics of candidate paths in the entropy weight method are generally consistent with those obtained by the HMM-MM algorithm, and they are easily affected by GPS jitter in parallel road scenes. The actual driving paths can be obtained when vehicles travel the shortest or optimal path, or when the trajectory data have high positioning accuracy and the trajectory points are closer to the road actually travelled than to the other parallel roads.
The MM results obtained by the VIS-MM algorithm using both the vehicle-borne image semantics and the overall spatial characteristics of candidate paths in the entropy weight method may also be affected by GPS jitter in parallel road scenes. These results usually fall between those of the VIS-MM algorithm considering only the vehicle-borne image semantics and those of the VIS-MM algorithm considering only the overall spatial characteristics, as shown in Figure 18(d). However, some wrongly matched trajectory points caused by abnormal recognition of the vehicle-borne image semantics can be corrected by the VIS-MM algorithm when it considers the vehicle-borne image semantics together with part of the overall spatial characteristics of candidate paths. For example, for Data 20, shown in Figure 20, the correct MM results are obtained when the VIS-MM algorithm uses vehicle-borne image semantics and the mean and variance of the angle difference in the entropy weight method. However, the scenarios in which each overall spatial characteristic of the candidate path is useful are still unstable and need further study.
In terms of efficiency, the VIS-MM algorithm is slower than the HMM-MM algorithm, mainly because the multipath output algorithm has higher complexity than the HMM-MM algorithm. As the number of switching paths between the parallel roads increases, the number of candidate paths grows and the efficiency falls. The data sampling frequency determines the granularity and efficiency of MM algorithms but does not affect the matching accuracy of each trajectory point in either the conventional MM algorithms or the proposed VIS-MM algorithm. For example, when the sampling frequency is low, the VIS-MM algorithm can accurately infer the location of each trajectory point, but the actual driving path between two candidate points still has to rely on the shortest or optimal path. Road network data are the reference for the location correction of trajectory points in an MM algorithm and therefore determine the accuracy of the MM algorithm to a certain extent. If part of the road network data is missing, the Viterbi algorithm in the VIS-MM algorithm or HMM-MM algorithm may be interrupted, and some trajectory points may be incorrectly map-matched.

Discussion and conclusion
This paper proposes a novel MM algorithm with semantic fusion from vehicle-borne images suited to parallel road scenes. First, a multipath output algorithm is developed based on an HMM-MM algorithm to obtain candidate paths. Second, image recognition techniques are employed to extract vehicle-borne image semantics. Finally, the entropy weight method is applied to determine the most promising driving path among the candidate paths. The experimental results show that semantic fusion from vehicle-borne images improves accuracy in parallel road scenes from 66.18% to 99.88%. The proposed VIS-MM algorithm not only provides a new solution to the accuracy challenges faced by MM algorithms, but also provides a practical avenue for widening the applicability of vehicle-borne images. It can be applied to the fields of unmanned autonomous navigation and crowdsourced updating of high-definition maps.
Despite these methodological innovations, some limitations in practical applications should be acknowledged:
1. Low efficiency. The VIS-MM algorithm is less efficient than HMM-MM algorithms for two main reasons. First, to ensure that all possible candidate paths are preserved in the multipath output algorithm, the state nodes in the Viterbi algorithm cannot be pruned according to the overall probability; hence, the number of state nodes increases with the number of switching paths between the parallel roads, resulting in the high complexity and low efficiency of the multipath output algorithm. Second, the image recognition model used in this study is a general-purpose model that is not designed for a particular kind of vehicle-borne image semantics, which reduces the accuracy and efficiency of vehicle-borne image semantic recognition.
2. Limited recognition capability of vehicle-borne image semantics. The VIS-MM algorithm considering vehicle-borne image semantics still produces incorrectly matched trajectory points, for three main reasons. First, the recognition of vehicle-borne image semantics is easily affected by occlusion from other vehicles. Second, a vehicle-borne image presents the scene in front of the current vehicle, which may deviate from the actual driving path and introduce errors into the map matching of trajectory points. Third, due to the abstraction differences between road network data and actual roads, vehicle-borne image semantics may not work in particular circumstances. For example, vehicle-borne images identify highway service areas as road segments with multiple lanes, but in road network data each lane is usually extracted as a separate road segment, as shown in Figure 21.
3. Unused overall spatial characteristics of candidate paths. The overall spatial characteristics of candidate paths can be used to correct the MM errors of some trajectory points caused by abnormal recognition of vehicle-borne image semantics. However, in most cases, considering the overall spatial characteristics of the path pulls the MM results towards the trajectory data, which is not suitable when the trajectory points have large GPS jitter. Therefore, the overall spatial characteristics of the candidate paths have not yet been integrated into the MM algorithm for the scenarios in which they are appropriate.
To further improve the accuracy and efficiency of the VIS-MM algorithm, future work can be carried out in two areas. Regarding the vehicle-borne images, more appropriate image recognition models can be designed for different types of vehicle-borne image semantics, and other vehicle-borne image semantics that can be used to identify vertically and horizontally parallel roads can be added. As for the MM algorithms, different algorithm strategies can be adopted for different road scenes, and performance can be further examined across those scenes. In particular, how the proposed VIS-MM algorithm performs in cities with highly three-dimensional development should be investigated.

Notes on contributors
Bozhao Li is currently a postdoctoral researcher with the School of Resources and Environment Science, Wuhan University. His research interests mainly concentrate on transportation big geodata and artificial intelligence. His contribution to this paper: conceptualization, methodology, algorithm design and implementation, experimental verification, manuscript writing and funding acquisition.
Mengqi Wang is currently working towards a Ph.D. degree at the School of Resource and Environmental Sciences, Wuhan University. Her research interests include spatial analysis and geospatial knowledge graphs. Her contribution to this paper: experimental data acquisition, data preprocessing and experimental verification.
Zhongliang Cai is a Full Professor of Cartography and GIScience at the School of Resources and Environment Science, Wuhan University. His research interests include GIS engineering design and development, automatic mapping theory and methods, mobile GIS and location services. His contribution to this paper: conceptualization, methodology and funding acquisition.
Shiliang Su is a Full Professor of Cartography and GIScience at the School of Resources and Environment Science, Wuhan University. His research interests include spatial data analysis and cartographical visualization. His contribution to this paper: conceptualization, methodology, manuscript review and editing and funding acquisition.
Mengjun Kang is an Associate Professor of Cartography and GIScience at the School of Resources and Environment Science, Wuhan University. His research interests include natural language spatial relation processing, spatial information visualization, urban computing and geographic information service integration. His contribution to this paper: methodology and algorithm effectiveness analysis.

Data and code availability statement
The data and codes that support the findings of this study are available in "figshare.com" with the identifier at https://doi.org/10.6084/m9.figshare.20445138.