Road edge detection based on combined deep learning and spatial statistics of LiDAR data

ABSTRACT Mobile laser scanning data can be used for effective extraction of road edge information, which is important in the domain of road maintenance and intelligent transportation. This paper proposes a road edge detection method that combines deep learning and spatial statistics of point cloud data. Semantic segmentation using a deep neural network enables the effective extraction of point cloud fragments recognized as road. The process continues with the spatial statistical analysis of voxel features of data organized into a 3D voxel grid. Filtered voxels are clustered into spatially proximate clusters of similar shape, i.e. straight or curved edges.


Introduction
Knowing the road edge location plays an important role in smart city-oriented information systems. Reliable, accurate and fast road edge detection techniques enable rapid improvement of many applications in the domains of road infrastructure planning and maintenance, intelligent transportation and autonomous driving (Soilán et al. 2018, Che et al. 2019, Sun et al. 2019). Extraction of road edges is a challenging problem because road curbs may have various shapes and structures. Using 3D LiDAR sensors to collect input data in point cloud form has provided better scene understanding and has found many applications (Guo et al. 2020). LiDAR sensors are becoming an attractive data source due to their increasing affordability, independence of lighting conditions and accurate distance measurements.
Depending on the type of sensor that is used, road boundary detection methods can be divided into camera-based, LiDAR-based and combined methods. Most camera-based methods address the problem using stereo cameras and 3D geometry to identify road edges (Wang et al. 2016). However, unlike camera-based methods, LiDAR-based approaches are not restricted by lighting or weather conditions. The combined approach provides the most complete description of the scene (Jeong et al. 2019).
Regarding the method of point cloud data processing, two basic approaches are noticeable: conventional and deep learning-based. Conventional approaches predominantly use local geometry and statistics, as well as voxelization and curve estimation methods.
In order to recognize road markings used in a mobile mapping system, Guan et al. (2015) proposed curb extraction based on evaluation of both slope and elevation difference between consecutive points, while Wang et al. (2019) introduced the use of reflectance characteristics besides geometric features of road boundaries. Sun et al. (2019) proposed combined geometrical features and road boundary fitting using a cubic spline model.
Road curb detection can be viewed as the extraction of edges among ground points organized in voxels. Xu et al. (2017) provide a robust solution that extracts curb candidate points using a novel energy function and refines them using a proposed least-cost path model. In order to recognize road edges of various shapes and heights, a RANSAC filter has been proposed for filtering road curb candidate points (Wang et al. 2018, Hu et al. 2018), while the road curb's curvature is estimated using a Kalman filter (Wang et al. 2018).
Deep learning-based approaches have shown impressive effectiveness in semantic image segmentation (Hu et al. 2020) and potential for classification of points in cloud data (Sakr et al. 2019). A common characteristic of deep learning-based approaches for road curb detection is transformation of the input 3D point cloud data, either through a transformation into 2D-view images or a voxelization. Suleymanov et al. (2019) have processed 2D bird's-eye view images projected from 3D point cloud data using two trained deep neural networks, where the first network detects visible road edges, while the second one infers both visible and occluded road boundaries. Sakr et al. (2019) described a promising approach that transforms 3D voxelated point clouds into hyperimages and employs a convolutional neural network for detection and classification of road edges and lane markings.
Well-generalized deep learning models for road edge detection require a large training dataset and ground truth labels. Annotation of curbs and similar fine-grained road details can be extremely time-consuming due to the need for human interaction. In order to save human resources for manual annotation of road edges, this paper proposes a procedure that uses a deep neural network for semantic segmentation of cloud points trained on scenes of an urban environment, as a first step. Its outputs, in the form of recognized roads and road marking classes, serve as reliable inputs for the following phases of the process in which road curbs are extracted and segmented from the remaining points of the point cloud.
The main contributions of our work are as follows: (1) the proposed road edge detection procedure combines a deep learning-based model with point cloud data processing using spatial statistics algorithms; (2) point cloud segments recognized by the pre-trained RandLA-Net model as road or road markings are voxelized and further processed through the analysis of three feature subsets associated with changes in elevation (intra-voxel features), local similarity between voxels (inter-voxel features) and voxel surface slope; (3) the final curb segments are created using merged agglomerative clustering and polyline fitting.
The rest of the paper is organized as follows: Section 2 gives a detailed description of the proposed procedure for road curb detection. The results and discussion are presented in Section 3. The last section concludes the paper.

Road edge detection procedure description
The proposed road edge detection procedure, schematically presented in Figure 1, starts with semantic segmentation of the input 3D point cloud data performed by a deep neural network classifier. Point cloud segments that are recognized by the adopted RandLA-Net classifier as road or road marking classes are afterwards organized in volumetric pixels (voxels). Points that belong to the curb region are often characterized by a sudden change in value along the z-axis, and this change can vary depending on the height and slope of the curb (Zhao and Yuan 2012). In addition, it was considered that the surface normal that defines the curb must enclose a significant angle with respect to the z-axis. Another assumption is that the geometry of points in a voxel belonging to a curb must differ significantly from the characteristics of neighbouring voxels belonging to surfaces such as road or sidewalk. These presumed properties of voxels that may belong to the curb serve as the basis for the calculation of three groups of features: (1) intra-voxel characteristics; (2) the estimated normal of the plane formed by the points within the voxel; and (3) inter-voxel characteristics. These characteristics form the basis for creating three sets of curb-voxel candidates. The voxel set created using intra-voxel characteristics is filtered using k-means clustering, while filtering of the other two sets is performed by thresholding. When applying the k-means algorithm, two clusters of voxels are formed. Only those voxels that belong to the group whose centroid has more higher-valued coordinates than the other centroid are considered later on. This excludes voxels with predominantly lower values of intra-voxel features. Filtered sets of the curb-voxel candidates are combined by the intersection and union operations (Figure 1).
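The k-means filtering rule just described (form two clusters and keep the one whose centroid has more higher-valued coordinates) can be sketched as follows. The feature matrix and its values are hypothetical, and scikit-learn's `KMeans` stands in for whatever implementation was actually used:

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_curb_candidates(features):
    """Split voxels into two clusters on their intra-voxel feature vectors
    and keep the cluster whose centroid has more higher-valued coordinates,
    i.e. discard voxels with predominantly lower elevation-change features."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
    c0, c1 = km.cluster_centers_
    keep = 1 if np.sum(c1 > c0) > np.sum(c0 > c1) else 0
    return np.where(km.labels_ == keep)[0]

# Hypothetical feature matrix: one row per voxel, one column per feature.
feats = np.array([[0.01, 0.02, 0.10],
                  [0.30, 0.25, 0.90],
                  [0.02, 0.01, 0.20],
                  [0.28, 0.31, 0.80]])
print(filter_curb_candidates(feats))  # indices of the high-valued cluster
```

The comparison of centroids coordinate-by-coordinate mirrors the rule stated above, regardless of which label k-means happens to assign to each cluster.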
The procedure is continued with selecting the representative point for each voxel from a filtered set of voxel-curb candidates. Selected points are clustered by agglomerative clustering and generated clusters are then grouped into two types: straight-line segments and curved curb segments, depending on achieved errors of estimated polylines for each point cluster. Finally, the resulting curb segments are joined to form a final detected road edge.

Semantic segmentation
The aim of the point cloud semantic segmentation is to assign each point to a selected class label from a set of pre-defined semantic class labels. We select deep learning as an approach for point cloud semantic segmentation, although it is a challenging task due to the unstructured characteristics of point clouds. From publicly available supervised deep learning models, we adopted the RandLA-Net neural architecture (Hu et al. 2020).
RandLA-Net is an efficient deep neural architecture designed for semantic segmentation of large-scale point clouds. It uses computationally and memory efficient random point sampling to greatly decrease the density of the point cloud. In order to preserve geometric details and useful point features during random sampling, a local feature aggregation is introduced. The local feature aggregation preserves complex local structures considering neighbouring point geometry and significantly increasing receptive fields. The Open3D-ML (2021) library for machine learning tasks, as an extension of the Open3D core library, contains a RandLA-Net implemented using TensorFlow 2. We used a RandLA-Net model trained on the Toronto-3D (Tan et al. 2020) dataset in order to execute segmentation of our point cloud data. All points in this dataset are classified into nine object classes: road, road markings, natural, building, utility line, pole, car, fence and unclassified.

Voxelization
After removing all points that are not recognized as road or road markings, the remaining points are organized in voxels. During this process, known as voxelization (Wang et al. 2018), the remaining point cloud is divided into a collection of regular voxels and only non-empty voxels are retained. Voxel size is an important parameter during voxelization, because too fine a resolution increases the number of voxels, especially the number of empty voxels, while too coarse a resolution results in information loss and less differentiation between voxels. Varying the voxel size showed that for small values the ratio between the initial number of voxels and the number of non-empty voxels is much higher (Figure S1(a)), while edge detection performance is best for a voxel size of 0.1 m (Figure S1(b)). A regular voxel size of 0.1 m was therefore adopted.
The organization of the point cloud in voxels can also be described as follows. Let a given point cloud be P = {p_n = (x_n, y_n, z_n), n = 1, ..., N}, where (x_n, y_n, z_n) represents the coordinates of the n-th point along the x, y and z axes, respectively, and N is the number of points. Assuming equal resolution along all axes in the 3D point cloud space (Δx = Δy = Δz) during the voxelization, a regularly spaced grid of voxels organized into rows, columns and layers is created. The newly established 3D array represents a set of voxels and can be denoted as V = {v_m(i, j, k), m = 1, ..., M}, where v_m is the m-th voxel, (i, j, k) are the indices of the m-th voxel in the 3D grid, and M is the number of voxels. Indices (i, j, k) are bounded by (x_max - x_min)/Δx, (y_max - y_min)/Δy and (z_max - z_min)/Δz, respectively, while the m-th voxel contains the set of points that belongs to this 3D grid cell according to their coordinate values, i.e. v_m = [p_1, p_2, ..., p_L], where L is the number of points belonging to the voxel v_m.
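A minimal sketch of this mapping from points to grid indices, assuming a NumPy point array and the 0.1 m resolution adopted above; the sample points are synthetic:

```python
import numpy as np

def voxelize(points, delta=0.1):
    """Map each point to integer voxel indices (i, j, k) on a regular grid
    of resolution `delta` and return only the non-empty voxels as a dict
    {(i, j, k): [point indices]}."""
    mins = points.min(axis=0)
    ijk = np.floor((points - mins) / delta).astype(int)
    voxels = {}
    for m, key in enumerate(map(tuple, ijk)):
        voxels.setdefault(key, []).append(m)
    return voxels

# Three synthetic points: the first two fall into the same 0.1 m cell.
pts = np.array([[0.02, 0.03, 0.00],
                [0.04, 0.08, 0.05],
                [0.57, 0.11, 0.00]])
vox = voxelize(pts)
print(len(vox))  # 2 non-empty voxels
```

Storing only the non-empty cells in a dictionary keyed by (i, j, k) matches the statement above that empty voxels are never materialized.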

Definition of intra-voxel characteristics
As previously mentioned, the basic feature of a voxel belonging to a curb region is a change in elevation between points in that voxel (Soilán et al. 2018). However, due to large variations in the possible shapes of curbs, and also the properties of the created voxel grid, it is not sufficient to adopt only one characteristic. To increase the reliability of the decision on whether or not the current voxel belongs to the curb region, several characteristics associated with the elevation change property were adopted. The following five features form the intra-voxel feature vector of the m-th voxel v_m, among them the roughness ratio, which shows whether the deviations of point values along the z-axis in the voxel v_m are dominant in relation to the other two axes; here ∂z/∂x and ∂z/∂y are average approximations of the partial derivatives along the non-uniformly segmented x and y axes, respectively.
An illustration of the distribution of the intra-voxel feature vector components is given in Figure S2.

Defining inter-voxel characteristics
The third assumption mentioned above is that the characteristics of points in a voxel placed in the curb region should differ significantly from the characteristics of most voxels in its neighbourhood. For this purpose, the intra-voxel features µ_intra (the mean of the z-coordinates) and g_intra (the gradient of elevation changes of points in the voxel) were selected. Deviations of these feature values between the current voxel and all of its neighbouring voxels are monitored.
The calculation of inter-voxel characteristics is closely related to the organization of the point cloud 3D grid. Namely, only non-empty voxels, more precisely only voxels containing at least three points, are included in the voxel structure (Börcs et al. 2017). This minimum number of points was adopted so that every feature, for example the estimate of the voxel's plane normal, can be calculated for each selected voxel. As a consequence of such voxel selection, it was necessary to introduce procedures for quickly finding, among the remaining voxels, those that qualify as nearest neighbours. For that purpose, the KD-tree implementation in Python's SciPy library was used for space partitioning, organizing and searching voxels. The 3D grid of voxels is organized into rows, columns and layers. The distance between voxels is determined from the 3D-space indices obtained during the voxelization process. When calculating the inter-voxel characteristics of a voxel, voxels whose Euclidean distance in index units was less than 1.5 were considered adjacent.
Curb detection monitors the characteristics of voxels lying in the same layer, so the problem of searching a 3D voxel grid can be transformed into a search over the sequence of 2D voxel grids obtained by segmenting the original 3D grid by layers. Bearing in mind that in a huge number of cases the number of layers is significantly higher than the number of rows and columns in the 3D voxel grid, savings in computational resources can be achieved. The complete inter-voxel characteristic calculation procedure is described by the pseudo-code in Table 1.
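The neighbour search over voxel indices described above can be sketched with SciPy's cKDTree; the sample index array is illustrative, and the 1.5 radius is the threshold from the text:

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbours_per_voxel(indices, radius=1.5):
    """For each voxel, find adjacent voxels whose Euclidean distance in
    index space is below `radius`, using a KD-tree for the search."""
    tree = cKDTree(indices)
    balls = tree.query_ball_point(indices, r=radius)
    # query_ball_point includes the voxel itself, so drop it.
    return [[j for j in ball if j != i] for i, ball in enumerate(balls)]

# Illustrative (i, j, k) indices of four non-empty voxels.
idx = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [5, 5, 0]])
print(neighbours_per_voxel(idx))  # the last voxel has no neighbours
```

With a 1.5 radius on integer indices, face-adjacent (distance 1) and edge-adjacent (distance √2) voxels qualify as neighbours, which is consistent with the per-layer search described above.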

Definition of the voxel points surface slope
The slope of the surface formed by the points belonging to the observed voxel is calculated in two basic steps. It is first necessary to determine the plane normal of the voxel's points, and then to calculate the angle between the obtained plane normal and the z-axis of the 3D point cloud space. Fitting the plane, i.e. determining its normal, was realized with the help of singular value decomposition (SVD). If the given voxel v_m contains a point set P_m = [p_1, p_2, ..., p_n] of size n × 3, where n ≥ 3, the plane is fitted by subtracting the centroid of the points from P_m to form the matrix A and computing the SVD A = UΣV^T. The plane normal n is given as the third column of matrix V: n = V(:, 3), i.e. the last column of V, which corresponds to the smallest singular value in Σ and is the eigenvector corresponding to the smallest eigenvalue of A^T A.
Finally, the slope of the voxel's surface, represented by the normal vector n, is calculated as the angle θ between n and the 3D-space vertical vector e_3 = (0, 0, 1), i.e. cos θ = n · e_3/(ǁnǁ·ǁe_3ǁ). Since ǁe_3ǁ = 1 and n · e_3 = n_z, this simplifies to θ = cos⁻¹(n_z/ǁnǁ), where n_z is the z-component of n. Figure 1 shows the scheme of the curb detection procedure, where it can be seen that after voxelization both sets of features, the intra-voxel characteristics and the voxel surface slopes, are calculated independently for all voxels. These two voxel sets are combined by the intersection operation. Afterwards it is possible to calculate the inter-voxel characteristics, a calculation that builds on the already calculated intra-voxel characteristics. Different procedures for selecting voxels suitable as voxel-curb candidates are then applied to all three sets of features.
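The two-step slope computation (SVD plane fit, then the angle with the vertical) can be sketched as follows; the two sample voxels are synthetic:

```python
import numpy as np

def voxel_slope(points):
    """Fit a plane to the voxel's points via SVD and return the angle
    (in radians) between the plane normal and the vertical vector e3."""
    A = points - points.mean(axis=0)   # centre the point set
    _, _, Vt = np.linalg.svd(A)        # A = U @ S @ V^T
    n = Vt[-1]                         # right singular vector of the
                                       # smallest singular value
    # |n_z| / ||n||; abs() resolves the sign ambiguity of SVD normals,
    # and clipping guards arccos against rounding slightly above 1.
    cos_theta = np.clip(abs(n[2]) / np.linalg.norm(n), 0.0, 1.0)
    return np.arccos(cos_theta)

# A horizontal patch: the normal is vertical, so the slope angle is ~0.
flat = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
# A vertical curb face in the xz-plane: the slope angle is ~pi/2.
wall = np.array([[0., 0., 0.], [1., 0., 0.], [0., 0., 1.], [1., 0., 1.]])
print(voxel_slope(flat), voxel_slope(wall))
```

Note that `numpy.linalg.svd` returns V transposed, so the normal is the last row of `Vt`, equivalent to the last column of V described above.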

Voxel filtering algorithm
Voxel slope features are used to verify whether the angle between the voxel's surface normal and the 3D-space vertical vector is above the given threshold value (θ > th_cos). The inter-voxel features are checked using predefined thresholds. The current, m-th voxel is considered to belong to the curb region if these conditions apply to its inter-voxel features: v_m(µ_intra) > th_inter1 and v_m(g_intra) > th_inter2, where th_inter1 and th_inter2 are predefined thresholds. Moreover, this condition is checked only for voxels whose
A filter for removing isolated voxels is applied to the set resulting from the intersection of these two groups of voxel candidates. Euclidean distance is used as the measure of distance between voxels. Figure 2 shows the results of several of the described steps on an illustrative example, presented in more detail in Section 3. Figure 2(a) represents the point cloud scene after semantic segmentation performed by the RandLA-Net and removal of all points except those belonging to the road and road marking classes. Voxels extracted using intra-voxel characteristics and voxel surface slope thresholding are shown in Figure 2(b,c), respectively, while Figure 2(d) presents the remaining voxels after the intersection of the voxel sets presented in Figure 2(b,c). Results of voxel extraction using inter-voxel characteristics and their union with the voxels given by the intersection of the previous two sets are shown in Figure 3(a,b), respectively.
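A minimal sketch of the isolated-voxel filter, assuming voxel centres as a NumPy array; the radius and neighbour count are hypothetical choices, not values from the paper:

```python
import numpy as np

def remove_isolated(centres, radius=0.3, min_neighbours=1):
    """Keep only candidate voxels with at least `min_neighbours` other
    candidates within `radius` (Euclidean distance between voxel centres)."""
    diff = centres[:, None, :] - centres[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    counts = (dist < radius).sum(axis=1) - 1  # subtract the voxel itself
    return centres[counts >= min_neighbours]

# Two adjacent candidates and one isolated outlier (hypothetical centres).
c = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 0.0]])
print(remove_isolated(c))  # the voxel at (5, 5, 0) is dropped
```

The pairwise-distance matrix is adequate for illustration; for large candidate sets a KD-tree query, as used elsewhere in the procedure, would be the scalable choice.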

Connecting detected curb segments
The result of voxel-curb candidate selection can include, in addition to correctly recognized curbs, segments of sidewalk, lawns or ground. Besides these false positive cases, there are parts of the curb that are not recognized because they are occluded by an obstacle, such as a parked car or a pedestrian. Furthermore, curbs are sometimes hard to detect as a road boundary because they are at the same level as the road, e.g. access ramps at crosswalks.
Consequently, the procedure continues with the selection of a representative point for each voxel-curb candidate, namely the point of the voxel that is closest to the voxel's centre of gravity in 3D space. In the remainder of the procedure, only two coordinates of the points are observed, i.e. the points' projection onto the horizontal plane. The selected points are grouped by agglomerative clustering (Figure 4(a)). Each cluster of points is fitted simultaneously by a first-order and a second-order polyline using the method of least squares, and if the estimation error is smaller for the first-order polyline, the cluster of points is considered to belong to a straight-line curb. Next, among the clusters of points adopted as straight-line curbs, those cluster pairs whose fitted coefficient vectors are similar are considered as curb segments (Rodríguez-Cuenca et al. 2015). Namely, the space of fitted line coefficients is formed from all point clusters, and by performing agglomerative clustering, curb segments of very similar orientation are grouped and assigned to the same straight-line curb segments. Figure 4(b) shows both segment types: straight line and curved. Straight-line curb segments are clusters 0, 4 and 6, while clusters of points whose estimation error for the second-order polyline is much smaller than for the first-order one are considered curved curb segments (clusters 1, 17 and 21). Grouped collinear and spatially proximate curb segments are connected by points between them. However, if the distance between these curb segments is greater than a predefined value, it is additionally checked whether that space is occupied by an object classified in the 'car' class during semantic segmentation. The resulting curb segments are joined to form the final detected road edge.
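The straight-versus-curved decision can be sketched with least-squares polynomial fits. Since a quadratic can never fit worse than a line on the same points, the sketch requires the second-order fit to be substantially better before labelling a cluster curved; the improvement factor is a hypothetical choice, not taken from the paper:

```python
import numpy as np

def classify_segment(xy, improve=0.5, tol=1e-12):
    """Label a 2D point cluster as a straight or curved curb segment by
    comparing least-squares residuals of first- and second-order fits."""
    x, y = xy[:, 0], xy[:, 1]
    _, res1, *_ = np.polyfit(x, y, 1, full=True)
    _, res2, *_ = np.polyfit(x, y, 2, full=True)
    e1 = float(res1[0]) if res1.size else 0.0
    e2 = float(res2[0]) if res2.size else 0.0
    if e1 < tol:               # already an (almost) perfect line
        return 'straight'
    # Require the quadratic to be substantially better than the line
    # before labelling the segment curved.
    return 'curved' if e2 < improve * e1 else 'straight'

x = np.linspace(0.0, 1.0, 10)
line = np.column_stack([x, 2.0 * x + 0.5])   # collinear cluster
arc = np.column_stack([x, x ** 2])           # parabolic cluster
print(classify_segment(line), classify_segment(arc))
```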
Finally, for each curb segment, all voxels belonging to a corresponding segment are collected and points within these voxels are organized into a KD-tree. It is used to find the lowest points in all areas of each curb segment and these points represent the precise location of the edge between the road surface and the detected curb.

Experimental results
The point cloud data predominantly used in this study were acquired throughout the city of Belgrade using a vehicle-mounted mobile mapping system, the Teledyne Optech Maverick, which is equipped with a 32-line LiDAR sensor, a panoramic camera and a GNSS. The LiDAR sensor can capture point clouds at up to 700,000 points/second, covering a 360° horizontal field of view and a vertical field of view from −10° to +30°. Evaluation of the proposed road curb detection method was performed on point cloud data collected in various urban scenarios. All selected scenes were semantically classified by the trained RandLA-Net, and the output point cloud with only two extracted classes, road and road markings, represents the input for the proposed method. In the first part of the experiment, representative urban scenes with straight streets, crossroads and streets with parked cars are selected to show the effectiveness of the method. After that, a quantitative evaluation is performed. The main geometric characteristics of the point cloud data examples acquired in Belgrade are given in Table 2. In addition, the third section of the Toronto-3D dataset (Tan et al. 2020), collected on Avenue Road, Toronto, was chosen as the last example. This publicly available dataset was chosen because exactly the same equipment was used for data acquisition in all examples listed in Table 2.
The first example, selected for a detailed illustration of the proposed procedure, represents a section with a road junction in an urban city area (Figure 5). The scene is characterized by a road-sidewalk curb which is partially occluded by a parked car. Parallel to the road curb is the edge between the sidewalk and the grass, which is initially partially recognized as road markings. Figure S3 shows all points grouped into the voxels assigned to cluster 6, shown in Figure 4(b) as a straight-line curb segment. Both projections of this curb segment's 3D points show the positions of the precisely located points representing the lower edges of the road-sidewalk curb (points coloured red).
The second example represents a section with a straight street where one side of the road curb is almost completely occluded by parked cars (the road curb is almost completely invisible), and the opposite side is lined with pillars and planted trees. Figure S4 displays the selected curb segments representing different point clusters, coloured according to the mutual similarity of their fitted coefficients. Most of the curb segments identified as segments with the same direction (blue segments in Figure S4(a)) have corresponding points in the space of fitted coefficients located mostly in the lower right corner of Figure S4(b), while the yellow and light blue points in the same figure correspond to differently shaped curbs, positioned in the upper right corner of Figure S4(a). Besides the straight-line segments, there are curved segments displayed in Figure S4(a) as purple arcs that correspond to the curb placed around the trees' trunks.
As can be seen in Figure S4(a), the curb detection algorithm extracts the small visible curb fragments on the street's side occupied by parked cars. However, thanks to semantic segmentation and the fact that the locations of the car-class objects are known, it is possible to supplement the missing large curb fragments. A 'bird's-eye' view of the second example, with detected curb segments coloured green and joined with red lines, is shown in Figure S5. The third example represents a larger street section with two intersections. In Figure S6, the green line shows that the curb on the 'lower' side of the 'bird's-eye' street view is completely detected, while on the opposite side of the street there are missing segments due to parked cars, garbage containers and terrain irregularities. In the last step of the algorithm, most of the curb segments are successfully connected (marked as red lines).
We used three evaluation metrics to quantitatively evaluate the proposed method, namely precision, recall and F1-score, which measure the correctness, completeness and quality of the curb extraction, respectively (Xu et al. 2017, Yang et al. 2019). Completeness is the proportion of true curbs that are extracted. It is evaluated as the true positive rate (TPR), calculated as TPR = TP/(TP+FN), where TP (true positive) is the length of the extracted curb matching the reference road curb, while FN (false negative) is the length of non-extracted road curb. Correctness means that the extracted curbs belong to the true road curbs. We evaluate correctness as the positive predictive value (PPV), calculated as PPV = TP/(TP+FP), where FP is the length of extracted curb that does not match a true road curb. The F1-score represents a weighted harmonic mean of the precision and the recall, and thus gives a well-balanced quality measure of the road curb detection method. It is calculated as F1-score = 2·TPR·PPV/(TPR+PPV). Table 3 displays a summary of the quantitative measures evaluated on all examples. The ground truth was manually annotated, and the comparison with the detected curbs was made at the level of the voxel grid.
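The three metrics above can be computed directly from matched curb lengths; the lengths in the example are hypothetical:

```python
def curb_metrics(tp, fp, fn):
    """Completeness (TPR/recall), correctness (PPV/precision) and F1-score
    from the matched, spurious and missed curb lengths defined above."""
    tpr = tp / (tp + fn)               # completeness
    ppv = tp / (tp + fp)               # correctness
    f1 = 2 * tpr * ppv / (tpr + ppv)   # harmonic mean of the two
    return tpr, ppv, f1

# Hypothetical curb lengths in metres.
tpr, ppv, f1 = curb_metrics(tp=95.0, fp=3.0, fn=5.0)
print(round(tpr, 3), round(ppv, 3), round(f1, 3))  # 0.95 0.969 0.96
```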
It is important to emphasize that the developed procedure was performed on data generated by semantic segmentation with RandLA-Net. As a consequence, the algorithm does not consider road segments that are not recognized in the semantic segmentation. The data in Table 3 show uniformly high performance across all examples, especially for the recall metric. Moreover, the proposed method not only successfully detects road curbs, but also restores hidden curbs occluded by objects classified by the RandLA-Net as cars.
Besides the performance evaluation of the proposed method using point cloud data from the Maverick system, an evaluation was performed on point cloud data recorded with a Velodyne LiDAR and downloaded from a publicly available dataset (VeCaN Laboratory 2017). Two frames with scenes of straight and curved roads were used, covering ±30 m in front of and behind the vehicle. The scenes are characterized by a decline in point cloud density with distance from the centre of the scene. We applied our algorithm without adaptations, except for increasing the voxel size to 0.2 m in order to increase the number of non-empty voxels. Due to the sparse point layout, the algorithm formed a grid structure with non-empty voxels only in the central section of the scene, which reaches up to 10 m around the vehicle. Therefore, Table 4 shows the performance of the algorithm on the complete and central parts of the scenes, together with the performance of the three methods given in Zhang et al. (2018). Since the method does not consider the space beyond the voxel grid, when testing on the complete scene FN is high and TPR is low. However, FP remains low and PPV is better than for the other methods. If we observe the central part of the scene, where the voxel grid exists, the proposed method shows superior performance compared to the other methods.

Conclusion
The described road curb detection procedure represents a combination of a deep learning-based model followed by algorithms for spatial statistics, which include voxelization, feature extraction, k-means and agglomerative clustering, the use of KD-trees, and polyline fitting. Semantic segmentation using a deep neural network has enabled the effective extraction of the point cloud parts recognized as road, which significantly simplified and accelerated the subsequent curb detection. An additional advantage of applying semantic segmentation is knowledge of the positions of objects assigned to the car class, which enables the connection of detected curb segments with missing fragments of curbs occluded by parked cars. However, the result of semantic segmentation, in the form of point cloud fragments belonging to the road class, is still far from the desired output: detected curbs. To process the data efficiently, it was necessary to organize the point cloud into a 3D voxel grid. Based on the spatial statistical analysis, three sets of voxel characteristics called intra-voxel, inter-voxel and voxel surface slope characteristics were singled out. By separately analysing and filtering these sets of characteristics on a given 3D voxel grid, it was possible to generate sets of voxel-curb candidates. Representative points of these voxels are grouped into spatially proximate clusters of similar shape that correspond to the curb segments. Grouping of clusters similar in shape was performed by checking whether the cluster points fit better to straight or curved edges. Finally, for the detected road curbs, the precise meeting points of the curbs and the road surface were determined. Qualitative and quantitative verification of the proposed procedure was conducted on several point cloud samples acquired in the city of Belgrade, as well as on data from the Toronto-3D dataset.
Analysis of the results showed that the procedure successfully detects straight and curved road curbs, including precise determination of the edge between the road surface and the curb. Quantitative measures of the proposed procedure on the considered point cloud examples consistently achieved values significantly above 90%.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia and the Innovation Fund of the Republic of Serbia.