Ortho-image analysis for producing lane-level highway maps

This paper presents new aerial image analysis algorithms that, from highway ortho-images, produce lane-level detailed maps. We analyze screenshots of road vectors to obtain the relevant spatial and photometric cues of road image-regions. We then refine the obtained patterns to generate hypotheses about the true road-lanes. A road-lane hypothesis, since it explains only a part of the true road-lane, is then linked to other hypotheses to completely delineate boundaries of the true road-lanes. Finally, some of the refined image cues about the underlying road network are used to guide a linking process of road-lane hypotheses. We tested the accuracy and robustness of our algorithms with high-resolution, inter-city highway ortho-images. Experimental results show promise in producing lane-level detailed highway maps from ortho-image analysis -- 89% of the true road-lane boundary pixels were successfully detected and 337 out of 417 true road-lanes were correctly recovered.


INTRODUCTION
This paper proposes new aerial image analysis algorithms that produce a map of the road-lanes appearing on a given highway ortho-image. The output of this procedure is cartographic information about road-lanes, expressed as a set of pixel coordinates of the road-lanes' centerlines together with their lateral road-widths.
Because our target images are high-resolution (i.e., 15-centimeter ground resolution), image objects such as lane-markings and road image-regions vary significantly in appearance: the same object looks different depending on the conditions of the image acquisition process and the road surface materials.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SIGSPATIAL GIS'12, November 6-9, 2012, Redondo Beach, CA, USA. Copyright (c) 2012 ACM ISBN 978-1-4503-1691-0/12/11...$15.00.
To effectively tackle these challenges, we develop a hierarchical approach to three tasks: gathering road boundary image cues, generating road-lane hypotheses, and linking the hypotheses. To this end, we first scrutinize input images to harvest two types of image cues about the underlying roads: road image-regions and the geometry of the underlying road-lanes. These cues about road surface and geometry provide strong evidence of the true road-lanes. In particular, they facilitate the generation of road-lane hypotheses and guide the linking of these hypotheses into an accurate map of road-lanes. We formulate the linking of road-lane hypotheses as a min-cover problem [3,15], in which we look for a set of hypotheses about the unknown true road-lanes that maximally covers the estimated road image-regions with a minimum sum of costs.

RELATED WORK
In the GIS community, aerial image analysis has played a crucial role in maintaining existing cartographic databases [1,2,4,6,9]. This is primarily because the topological relations among spatial objects (e.g., intersections between road segments) appearing on aerial images are invariant over a long period of time, even after natural disasters [14]. Aerial image analysis also provides a way of maintaining existing cartographic databases without physically visiting the regions of interest. The clear difference between these works and ours is the ground resolution. Most of these works analyzed low-resolution aerial images in which the ground resolution was greater than one meter [1,2,4,5,6,9]. Variations of object appearance in low-resolution aerial imagery are not as significant as those in high-resolution imagery.
We detect overpasses to identify potentially complex road geometry. To recover such 3-dimensional road structures, researchers have directly accessed a road vector or utilized 3D data such as air-borne point clouds [11,12]. By contrast, our method, without using any of these specialized data, detects overpasses by using screenshots of road-vectors.
Figure 1: A screenshot of the road-vector of the input image.

HARVESTING ROAD-BOUNDARY IMAGE CUES VIA BOOTSTRAPPING
[...] image features such as lines [7] and superpixels [8]. To extract the useful geometric information of the underlying roads, we analyze a road-vector screenshot to extract the image regions of road-vector sketches (i.e., the yellow drawings in Figure 1), and then further analyze each of the road-vector sketch fragments to obtain their geometric properties, such as extremity and bifurcation points. Next we refine these low-level features to produce more relevant and useful features: a segmentation of road image-regions, an estimation of the legitimate driving directions of the roads appearing on the input image, a lane-marking detection, and the locations of interesting road-structures, such as intersections and overpasses.
For identifying road image-regions, we first convert an input ortho-image into a superpixel image and then represent each of the superpixels by a combination of color histogram and textons [10]. Next we perform, by using a combination of Gaussian Mixture Model (GMM) and Markov Random Fields (MRF), a binary classification that takes superpixels as input and assigns each superpixel one of two class labels: road or non-road. Figure 2 shows a result of road image-region segmentation. Although some non-road image-regions are labeled as road, for the most part the segmentation results correctly depict road image-regions.
Figure 2: Results of road image-region segmentation. The blue regions represent identified road image-regions and the red regions represent non-road image-regions.
To detect the driving direction from a given image, we use the line extraction results, in which each extracted line partially explains the contour of a road in the image. We first partition the input image into a number of grid cells. For each grid cell, we identify the extracted lines that pass through it and use them to approximate the driving direction of the grid cell by the vector sum method. Figure 3 shows a result of driving-direction estimation.
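The per-cell vector sum can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function name `estimate_cell_directions`, the length weighting, and the sampling of points along each line are assumptions. Because an extracted line has no inherent direction, the sketch also doubles each line's angle before summing so that two anti-parallel lines reinforce rather than cancel each other; the paper states only that a vector sum is used.

```python
import math
from collections import defaultdict


def estimate_cell_directions(lines, cell_size):
    """Approximate one driving direction per grid cell.

    lines: list of ((x1, y1), (x2, y2)) segments in pixel coordinates.
    Returns {(col, row): angle}, with the angle in radians modulo pi.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for (x1, y1), (x2, y2) in lines:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        # Assumption: treat lines as undirected by doubling the angle.
        doubled = 2.0 * math.atan2(dy, dx)
        # Sample points along the segment to find the cells it passes
        # through (coarse: a cell clipped only at a corner may be missed).
        steps = max(1, int(2 * length / cell_size))
        cells = set()
        for i in range(steps + 1):
            t = i / steps
            cells.add((int((x1 + t * dx) // cell_size),
                       int((y1 + t * dy) // cell_size)))
        # Assumption: longer lines contribute more to the sum.
        for cell in cells:
            sums[cell][0] += length * math.cos(doubled)
            sums[cell][1] += length * math.sin(doubled)
    # Halve the summed angle to undo the doubling.
    return {cell: (0.5 * math.atan2(sy, sx)) % math.pi
            for cell, (sx, sy) in sums.items()}
```

Since the result is an orientation modulo pi, a value near pi and a value near 0 describe the same driving axis; any comparison between cells should account for that wrap-around.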
For detecting lane-markings, we formulate a binary classification problem of discriminating true lane-marking pixels from non-lane-marking pixels. To this end, we tried six different classification setups on ortho-images held out from the images used for generating the lane-level highway maps and found that AdaBoost outperformed all others. Figure 4 shows a result of lane-marking detection.
For detecting overpasses, we analyze the road-vector screenshot image to convert it into a set of road-vector fragments. For each of the road-vector fragments, we extend each of the extremity points in the direction of the fragment. To localize a potential overpass, we then identify any intersection with other fragments if their intersection angle is greater than a given threshold (e.g., π/3). Finally, we search for the closest extracted lines to identify the boundaries of the detected overpass. Figure 5 shows the final result of overpass detection.
Figure 5: Results of overpass detection. A red parallelogram represents the boundary of the detected overpass, and two (blue and cyan) lines inside the polygon depict two principal axes.
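The localization step (extend each fragment, then test for crossings steeper than the angle threshold) can be sketched as below. The sketch assumes that each road-vector fragment has already been reduced to a single straight segment, which is a simplification, and the names `extend_segment`, `segment_intersection`, `crossing_angle`, and `find_overpass_candidates`, as well as the `extension` length in pixels, are hypothetical. The search for boundary lines around each candidate is not covered.

```python
import math


def extend_segment(p, q, extension):
    """Extend segment p->q by `extension` pixels beyond both extremities."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    n = math.hypot(dx, dy)
    ux, uy = dx / n, dy / n
    return ((p[0] - ux * extension, p[1] - uy * extension),
            (q[0] + ux * extension, q[1] + uy * extension))


def segment_intersection(a, b, c, d):
    """Return the intersection point of segments ab and cd, or None."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if abs(denom) < 1e-12:  # parallel or degenerate
        return None
    t = ((c[0] - a[0]) * s[1] - (c[1] - a[1]) * s[0]) / denom
    u = ((c[0] - a[0]) * r[1] - (c[1] - a[1]) * r[0]) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (a[0] + t * r[0], a[1] + t * r[1])
    return None


def crossing_angle(r, s):
    """Acute angle in [0, pi/2] between two direction vectors."""
    cos_angle = abs(r[0] * s[0] + r[1] * s[1]) / (math.hypot(*r) * math.hypot(*s))
    return math.acos(min(1.0, cos_angle))


def find_overpass_candidates(fragments, extension, min_angle=math.pi / 3):
    """Return (i, j, point) where extended fragment i crosses fragment j."""
    candidates = []
    for i, (p, q) in enumerate(fragments):
        if p == q:
            continue
        a, b = extend_segment(p, q, extension)
        for j, (c, d) in enumerate(fragments):
            if i == j or c == d:
                continue
            point = segment_intersection(a, b, c, d)
            if point is None:
                continue
            angle = crossing_angle((b[0] - a[0], b[1] - a[1]),
                                   (d[0] - c[0], d[1] - c[1]))
            if angle > min_angle:
                candidates.append((i, j, point))
    return candidates
```

For example, two collinear fragments separated by a gap and a third fragment running perpendicular through the gap yield two candidates at the same crossing point, one from each extended fragment.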
In the next section we detail how these four refined features are used to generate road-segment hypotheses and how to link them to build lane-level detailed highway maps.

ROAD-LANE HYPOTHESIS GENERATION AND EVALUATION FOR DELINEATING TRUE ROAD-LANE BOUNDARIES
The previous steps of extracting image cues about the true road-lanes give us a better understanding of the road-lanes appearing in the input image. In particular, we know which image sub-regions are most probably road-regions, which pixels within the road-regions are likely parts of lane-markings, how the roads are laid out, and where overpass structures occur. Based on this understanding, we generate road-lane hypotheses and link them in order to delineate road-lane boundaries.
A road-lane is modeled by a piecewise linear curve that consists of multiple control points along the centerlines of the road-lane and their properties, such as lateral width and orientation. We use lane-marking pixels to approximate locations of control points along the true road-lane.
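As an illustration of this model, the sketch below stores a road-lane as an ordered list of control points, each carrying a lateral width and an orientation, and derives the two boundary polylines by offsetting each control point by half its width along the normal to its orientation. The class and method names (`ControlPoint`, `RoadLane`, `boundaries`) are hypothetical, and deriving the boundaries this way is an assumption about how the model could be used, not a description of the paper's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ControlPoint:
    x: float            # centerline position in pixel coordinates
    y: float
    width: float        # lateral lane width in pixels
    orientation: float  # local lane direction in radians


@dataclass
class RoadLane:
    points: list = field(default_factory=list)  # ordered ControlPoint objects

    def length(self):
        """Length of the piecewise linear centerline."""
        return sum(math.hypot(b.x - a.x, b.y - a.y)
                   for a, b in zip(self.points, self.points[1:]))

    def boundaries(self):
        """Offset each control point by half its width on both sides.

        "left" and "right" are relative to a y-up axis; in image
        coordinates (y pointing down) the two sides appear swapped.
        """
        left, right = [], []
        for cp in self.points:
            nx, ny = -math.sin(cp.orientation), math.cos(cp.orientation)
            half = cp.width / 2.0
            left.append((cp.x + nx * half, cp.y + ny * half))
            right.append((cp.x - nx * half, cp.y - ny * half))
        return left, right
```

A straight lane with two control points 10 pixels apart and a width of 4 pixels therefore has a centerline length of 10 and boundary polylines 2 pixels to either side of the centerline.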
A true lane-marking pixel has many neighboring lane-marking pixels that are regularly spaced longitudinally and laterally (i.e., orthogonal to the longitudinal direction). We therefore look for lane-marking pixels that have strong supportive (or neighboring) patterns in the longitudinal and lateral directions of the roads. For each superpixel, we investigate whether each of the lane-marking pixels has a sufficient number of neighboring lane-marking pixels in the longitudinal and lateral directions of the roads. Any lane-marking whose number of supporting neighbors exceeds a predefined threshold remains in the candidate list for generating road-lane hypotheses. For each road-width cue (or road-width hypothesis), we draw two lines, longitudinally, from the center of the two lane-marking locations and group together any road-width cues within the extended line segments. The longitudinal direction corresponds to the driving direction estimated earlier from extracted lines. This search groups the neighboring road-width cues around the input road-width hypothesis, and the resulting group forms a road-lane hypothesis.
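The neighbor-support test can be sketched as follows. This is a simplified illustration under several assumptions: a single driving direction `theta` is used for all pixels, support is measured by counting neighbors inside a longitudinal and a lateral window rather than by checking for regular spacing, and every parameter name (`max_long`, `max_lat`, `tol`, `min_long`, `min_lat`) is hypothetical.

```python
import math


def support_counts(pixel, pixels, theta, max_long, max_lat, tol):
    """Count neighbours lying along and across the driving direction."""
    ux, uy = math.cos(theta), math.sin(theta)
    n_long = n_lat = 0
    for q in pixels:
        if q == pixel:
            continue
        dx, dy = q[0] - pixel[0], q[1] - pixel[1]
        along = dx * ux + dy * uy     # offset in the driving direction
        across = -dx * uy + dy * ux   # offset orthogonal to it
        if abs(across) <= tol and abs(along) <= max_long:
            n_long += 1
        elif abs(along) <= tol and abs(across) <= max_lat:
            n_lat += 1
    return n_long, n_lat


def filter_lane_markings(pixels, theta, max_long, max_lat, tol,
                         min_long, min_lat):
    """Keep pixels with enough longitudinal and lateral support."""
    kept = []
    for p in pixels:
        n_long, n_lat = support_counts(p, pixels, theta,
                                       max_long, max_lat, tol)
        if n_long >= min_long and n_lat >= min_lat:
            kept.append(p)
    return kept
```

With two parallel rows of detections and one isolated false positive, the isolated pixel has no support in either direction and is dropped, while the pixels in the two rows support each other and are kept.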
A road-lane hypothesis, since it explains only a part of the true road-lane, is then linked to other hypotheses to completely delineate the boundaries of the true road-lanes. We formulate the problem of linking hypotheses as a min-cover problem: using the previously obtained local evidence of the unknown true road-lanes, we search for a new set of road-lane hypotheses, formed by linking the generated ones, that has a minimum sum of linking costs. Our formulation is motivated by two previous studies, which generated hypotheses to delineate object contours [3] and to cover road regions in a LIDAR intensity image [15]. In our case, we generate a set of hypotheses about the unknown true road-lanes to cover the approximated true road image-regions.
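To make the min-cover idea concrete, the sketch below applies the standard greedy approximation for weighted set cover: at each step it picks the hypothesis with the lowest cost per newly covered road-region element. It is offered only as an illustration of the problem class; the paper does not state that it uses this greedy rule, and the data layout (a dictionary mapping a hypothesis name to a cost and a set of covered region ids) is hypothetical.

```python
def greedy_min_cover(universe, hypotheses):
    """Greedy weighted set cover.

    universe: iterable of region ids (e.g., road superpixels) to cover.
    hypotheses: dict name -> (cost, set of region ids the hypothesis covers).
    Returns (chosen names in order, total cost, ids left uncovered).
    """
    uncovered = set(universe)
    chosen, total = [], 0.0
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (cost, covers) in hypotheses.items():
            if name in chosen:
                continue
            gain = len(covers & uncovered)
            if gain == 0:
                continue
            ratio = cost / gain  # cost per newly covered element
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:  # nothing left can cover the remainder
            break
        cost, covers = hypotheses[best]
        chosen.append(best)
        total += cost
        uncovered -= covers
    return chosen, total, uncovered
```

In the test scenario used for this sketch, three cheap partial hypotheses are preferred over one expensive hypothesis that covers everything, because each of them offers a lower cost per newly covered element at the moment it is chosen.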
To find approximate solutions to these cost functions, we devise two linking functions. The first linking function considers a potential connection between any two hypotheses purely on the basis of geometric constraints. The second function investigates the photometric constraints of a potential link. The optimal link between two road-lane hypotheses is the one that locally minimizes these two constraint functions. Unlike previous applications of the min-cover algorithm [3,15], whose solutions explicitly searched for a sequence of hypotheses, we look for a set of hypothesis pairs such that their potential, geometrically plausible links are sequentially traced by photometric image cues to cover road image-regions.
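One way to picture the two linking functions is sketched below. The paper does not give their exact form, so everything here is an assumption made for illustration: the geometric cost grows with the gap between two hypotheses and with the turning needed to bridge it, the photometric cost is the fraction of points sampled along the straight link that fall outside the road region, and `photo_weight` combines the two. The `is_road` callback stands in for a lookup into the road image-region segmentation.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two headings in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def geometric_cost(end_a, heading_a, start_b, heading_b):
    """Gap length, inflated by the turning needed to bridge the gap."""
    gx, gy = start_b[0] - end_a[0], start_b[1] - end_a[1]
    gap = math.hypot(gx, gy)
    link_heading = math.atan2(gy, gx) if gap > 0 else heading_a
    turn = (angle_diff(heading_a, link_heading)
            + angle_diff(link_heading, heading_b))
    return gap * (1.0 + turn)


def photometric_cost(end_a, start_b, is_road, samples=20):
    """Fraction of sampled link points that fall off the road region."""
    off = 0
    for i in range(samples + 1):
        t = i / samples
        x = end_a[0] + t * (start_b[0] - end_a[0])
        y = end_a[1] + t * (start_b[1] - end_a[1])
        if not is_road(x, y):
            off += 1
    return off / (samples + 1)


def best_link(end_a, heading_a, candidates, is_road, photo_weight=100.0):
    """candidates: dict name -> (start point, heading).

    Returns (name, cost) of the candidate with the lowest combined cost.
    """
    scored = []
    for name, (start_b, heading_b) in candidates.items():
        cost = (geometric_cost(end_a, heading_a, start_b, heading_b)
                + photo_weight * photometric_cost(end_a, start_b, is_road))
        scored.append((cost, name))
    cost, name = min(scored)
    return name, cost
```

Under these assumptions, a candidate that continues straight ahead along the road is preferred over a nearer candidate that requires a sharp turn across non-road pixels.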

EXPERIMENTS
This section details experiments conducted to investigate the robustness of our approach to extracting a lane-level highway map and the accuracy of the resulting maps.
From Google's map service, we collected 50 ortho-images sampled along the route from the Squirrel Hill Tunnel to the Pittsburgh International Airport. We also saved road-vector screenshots of the ortho-images and manually drew the boundary lines of individual road-lanes in each of the collected images as ground truth. To the best of our knowledge, no prior work or image data on extracting road-lane boundaries was available that we could have used for comparison. Hence, we had to devise reasonable ways of evaluating our results. We evaluate the resulting road-lane boundary delineation in two ways: by the accuracy of matching between output and ground truth pixels, and by counting the number of correctly recovered road-lanes in the final outputs. Pixel-to-pixel matching investigates the performance of our approach at the micro-level; counting the number of road-lanes reveals the accuracy of the resulting geometries.
To evaluate our results at the pixel-to-pixel level, we adopted the method used for evaluating the performance of object boundary detection [10]. Similar to [10], we regard the extraction of road-lane boundaries as a classification problem of identifying boundary pixels and apply the precision-recall metrics, using manually labeled road-lane boundaries as ground truth. To resolve the correspondence between output pixels and ground truth pixels, we utilized the performance evaluation scripts of the Berkeley Segmentation Engine (BSE). We also used BSE's probabilistic boundary detection outputs as a baseline. BSE was developed to detect generic object boundaries, not road-lane boundaries. In addition, since training BSE with our image data is not possible, the comparison may not be entirely fair. But since such probabilistic boundary outputs are a natural starting point for delineating road-lane boundary lines, we compared them with our output. Table 1 presents the averaged performance difference between the two outputs over the fifty test images.
Table 1: An averaged precision-recall measure of micro-level performance between the two outputs.
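A simplified version of the pixel-level measure is sketched below. It counts an output boundary pixel as correct when some ground truth pixel lies within a distance threshold, and counts a ground truth pixel as recovered when some output pixel lies within the same threshold. This brute-force sketch is an assumption made for illustration: it allows several output pixels to match the same ground truth pixel, so it is not a substitute for the benchmark scripts that were actually used to resolve the correspondence, and the name `boundary_precision_recall` is hypothetical.

```python
import math


def boundary_precision_recall(output_pixels, truth_pixels, max_dist):
    """Precision and recall of boundary pixels with a distance tolerance."""

    def has_neighbour(p, pool):
        return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= max_dist
                   for q in pool)

    matched_out = sum(1 for p in output_pixels
                      if has_neighbour(p, truth_pixels))
    matched_truth = sum(1 for q in truth_pixels
                        if has_neighbour(q, output_pixels))
    precision = matched_out / len(output_pixels) if output_pixels else 0.0
    recall = matched_truth / len(truth_pixels) if truth_pixels else 0.0
    return precision, recall
```

The brute-force search is quadratic in the number of pixels, which is acceptable for a small example but would call for a spatial index or a distance transform on full-size ortho-images.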
For our goal, evaluating road-lane boundary extraction by pixel-to-pixel matching alone may be insufficient. The pixel-to-pixel measure counts a match when an output boundary pixel lies within a predefined distance threshold (e.g., 10 pixels) of a true boundary pixel. A collection of matched boundary pixels therefore does not necessarily correspond to a road-lane boundary. To be useful, the detected boundary pixels must be interpreted as parts of a road-lane. In other words, the desirable output for our purpose is one that treats a road-lane as a polygon, bounded by a closed path and the image boundaries, for which we can estimate lateral road widths, curvature, and other interesting geometric properties along the centerline of the road-lane polygon. To measure such macro-level performance, we first visually inspected our outputs and the input image to resolve the correspondence between the resulting road-lanes and the true road-lanes appearing on the input image. We then counted the number of correct and incorrect output road-lanes and missed true road-lanes. If the area of overlap between a road-lane output and a true road-lane was roughly greater than 80%, we counted it as a correct match. This counting resulted in a contingency table for the performance on each test image. Table 2 shows the macro-level performance obtained by merging the individual contingency tables over the fifty test images. An averaged performance was then computed from this table: precision = 337/(337+88) = 0.792 and recall = 337/(337+100) = 0.771, meaning that 79% of the resulting road-lanes were correct and 77% of the true road-lanes appearing on the test images were correctly recovered.
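The macro-level figures follow directly from the merged counts, as the short sketch below shows. The counts 337, 88, and 100 are the ones reported in the text; the function names and the representation of a per-image table as a `(correct, incorrect, missed)` tuple are assumptions made for illustration.

```python
def merge_tables(tables):
    """Sum per-image (correct, incorrect, missed) counts into one table."""
    correct = sum(t[0] for t in tables)
    incorrect = sum(t[1] for t in tables)
    missed = sum(t[2] for t in tables)
    return correct, incorrect, missed


def precision_recall(correct, incorrect, missed):
    """Precision over output road-lanes, recall over true road-lanes."""
    precision = correct / (correct + incorrect)
    recall = correct / (correct + missed)
    return precision, recall


# Merged counts reported in the text: approximately 0.79 and 0.77.
precision, recall = precision_recall(337, 88, 100)
```

Merging the per-image tables before computing the ratios weights every road-lane equally, whereas averaging per-image precision and recall would weight every image equally regardless of how many lanes it contains.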
Examples of resulting maps are shown in Figure 6. Figure 6 shows some of the most accurate results, in which all of the true road-lanes appearing on the test images were recovered correctly. While processing these images, our approach successfully tracked high-curvature ramps, correctly connected road-lane boundaries around overpasses, and effectively handled variations in road-surface materials and partial image distortions.
Table 2: A contingency table is used to measure the macro-level performance of our highway map generation methods.

CONCLUSION
This paper presented a new approach to extracting lane-level detailed highway maps from a given ortho-image. We chose high-resolution, inter-city highway ortho-images as target images because pixels along road-lane boundaries must be visually and computationally accessible. To effectively address the photometric and geometric challenges appearing on our target images, we developed a hierarchical approach to three tasks: collecting road boundary image cues via bootstrapping, generating hypotheses about the unknown true road-lanes, and linking the hypotheses with respect to the photometric and geometric constraints imposed by the collected image cues and prior information.
We tested our algorithms with 50 challenging arterial highway images. The results were evaluated according to two aspects: pixel-to-pixel matching and counting correct and incorrect outputs. Our approach demonstrated promising results in that, overall, 79% of the resulting road-lanes were correct and 77% of true road-lanes appearing on the test images were correctly recovered.
Although we believe our test images pose sufficient challenges for the task of producing lane-level detailed highway maps, for future work, we would like to test our algorithms with more challenging aerial images.