3D map reconstruction from range data

We present techniques for building models of complex environments from range data gathered at multiple viewpoints. The challenges in this problem are the matching of unregistered views without prior knowledge of pose, the use of very large data sets, and the manipulation of data sets of different resolutions and from different sensors. Our approach is unique in that no prior knowledge of the relative viewpoints is needed in order to register the data. We show results in building maps of interior environments from range finder data, building large terrain maps from ground-based and aerial data, and mapping from stereo data with an operational system for hazardous environment characterization. The paper summarizes the major results obtained so far in this area.


Introduction
The problem of building models from multiple views is critical in various applications, including remote operation, virtual environment building, and construction of object libraries for recognition. In this paper, we consider the problem of registering range data sets from multiple locations in order to build a complete model of an environment. A typical scenario involves many viewpoints over a large area with poor or nonexistent initial estimates of the relative positions of the views. Our eventual goal is to build maps that cover hundreds of square meters at very high (e.g., sub-centimeter) resolution. This problem has many facets, including computing the transformations between views, correcting the transformations to compensate for drift and error accumulation, merging the views into a single model once the transformations are computed, and incorporating color and texture information into the final model. Here, we focus on the first problem: reliable registration of large point sets from range data. Examples of view merging are shown only to illustrate the results of registration.
The method described here is based on a general approach to object recognition [14] and three-dimensional object modeling. This method has two critical features that address the shortcomings of previous techniques. First, it does not require surface segmentation or feature extraction. Second, it does not require knowledge of the transformation between surfaces prior to registration. Owing to these two features, this surface matching algorithm can be applied to problems of large-scale model building.
The goal of this paper is to show how our matching algorithm can be used in the context of mapping large, cluttered environments. The paper summarizes the major results obtained in this field. More technical details can be found in earlier publications [2][11].

Point Matching Using Local Signatures
In [9], Johnson introduces the spin-image, a two-dimensional signature describing the local shape of a free-form three-dimensional surface at a point p on that surface. Spin-images encode the positions of points near p in terms of distance along and distance from the approximated normal to the surface at p. By comparing this coordinate information from the spin-images of two different points, we arrive at a measure of local shape similarity between the surfaces surrounding those two points.
To construct the spin-image for an arbitrary point p, we first find the best-fit plane to the nearest neighbors of p and approximate the surface normal at p as the normal n to this plane. We then define a 2-D basis using the normal n and the plane T perpendicular to n and passing through p. For each point x in the vicinity of p, we compute its coordinates (α, β) with respect to this basis: α is the distance from p to x measured in the plane T, while β is the signed perpendicular distance from x to T (Figure 1). The spin-image at p is the 2-D histogram of the (α, β) coordinates of these points. Spin-image similarity is determined by simple linear correlation of bin values, supplemented with a confidence metric that takes into account the number of bins used. Bins containing no points are excluded from the correlation, which reduces the effect of clutter and occlusion on shape similarity. Because the final similarity measure between two points accounts for the number of bins used to compute the correlation, signatures with the largest overlap are considered the most similar.
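A minimal sketch of both steps, assuming the normal n has already been estimated and is unit length. The bin layout (β centred on the middle row), the rule of excluding a bin that is empty in either image, and λ = 3 are our assumptions; the confidence term is modeled on the form used in Johnson and Hebert's spin-image work, C = atanh(R)² − λ/(N − 3), where N is the number of bins used. `spin_image` and `similarity` are hypothetical names.

```python
import math

def spin_image(p, n, points, bin_size, n_alpha, n_beta):
    """Histogram the (alpha, beta) coordinates of `points` around basis point p.

    p, n: basis point and unit normal (3-tuples). Returns n_beta rows x n_alpha
    columns of counts; row n_beta // 2 holds points just above the tangent plane T.
    """
    img = [[0] * n_alpha for _ in range(n_beta)]
    for x in points:
        d = [x[k] - p[k] for k in range(3)]
        beta = sum(n[k] * d[k] for k in range(3))            # signed distance to T
        alpha_sq = sum(c * c for c in d) - beta * beta
        alpha = math.sqrt(max(alpha_sq, 0.0))                # distance in T from p
        col = int(alpha / bin_size)
        row = int(math.floor(beta / bin_size)) + n_beta // 2
        if 0 <= col < n_alpha and 0 <= row < n_beta:
            img[row][col] += 1
    return img

def similarity(img_a, img_b, lam=3.0):
    """Correlation over bins filled in both images, penalized when few bins overlap."""
    pairs = [(a, b) for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b) if a > 0 and b > 0]
    n = len(pairs)
    if n < 4:                          # too little overlap to trust the correlation
        return float("-inf")
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    if va == 0 or vb == 0:
        return float("-inf")
    r = cov / math.sqrt(va * vb)
    if r <= 0:                         # our choice: non-positive correlation = dissimilar
        return float("-inf")
    r = min(r, 1 - 1e-9)               # keep atanh finite for identical images
    return math.atanh(r) ** 2 - lam / (n - 3)
```

The λ/(N − 3) term is what makes signatures with more overlapping bins score higher at equal correlation.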
Johnson shows in [9] that by determining the similarity between spin-images of points from two different surfaces, it is possible to recover the transformation that registers the surfaces even when no initial estimate of this transformation is available at the time of registration.

View Registration
The matching approach described above provides a powerful tool for building large-scale models from sensor scans taken at unspecified locations. In this section we describe the process of using spin-images to register range data taken from a variety of locations. We assume that regardless of the sensor used, range data from it consists of 3-D points in some fixed coordinate frame. Given these points, a triangular mesh is formed by connecting nearest neighbors, and noisy points and edges in this mesh are removed through cleaning and smoothing operations. At this point, we have a high-resolution 3-D representation of the model as seen from a particular viewpoint; for reasons to be explained below, we then simplify this surface using the algorithm found in [12] to obtain a low-resolution version of the surface.
Given two low-resolution meshes, we register them using signature matching as follows. First, a fixed fraction of points is selected at random from both surfaces. Signatures are produced for these points, and all the signatures from one surface are compared to all the signatures from the other using the similarity measure described above. When a pair of signatures is found to have high similarity, the points that produced them are considered to correspond to each other (Figure 2); after all signature comparisons have taken place, we are left with a set of point matches between the two surfaces. Finally, an estimate of the transformation that aligns the two surfaces is computed from the point matches.
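The all-pairs matching step can be sketched as below, assuming signatures are flattened to equal-length vectors and using plain Pearson correlation as a stand-in for the paper's similarity measure. The random selection, the 0.9 threshold, and keeping only the single best partner per point are our assumptions; `match_signatures` is a hypothetical name.

```python
import math
import random

def pearson(a, b):
    """Linear correlation of two equal-length signature vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def match_signatures(sigs_a, sigs_b, fraction=0.5, threshold=0.9, seed=0, similarity=pearson):
    """Select a fraction of points on each surface, compare every selected signature
    on A with every selected signature on B, and keep (i, j, score) for each point
    of A whose best partner on B scores at least `threshold`."""
    rng = random.Random(seed)
    pick_a = rng.sample(range(len(sigs_a)), max(1, int(fraction * len(sigs_a))))
    pick_b = rng.sample(range(len(sigs_b)), max(1, int(fraction * len(sigs_b))))
    matches = []
    for i in pick_a:
        score, j = max((similarity(sigs_a[i], sigs_b[j]), j) for j in pick_b)
        if score >= threshold:
            matches.append((i, j, score))
    return matches
```

The cost grows with the product of the two selection sizes, which is one reason the matching runs on the simplified low-resolution meshes.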
Based on this estimate, an iterative closest point (ICP) algorithm is applied to the approximately aligned high-resolution surfaces. The resulting transformation aligns the full range data sets with acceptable accuracy.
One feature of this procedure is that the only step that depends on the user's choice of range sensor is the acquisition of range data; mesh manipulation, surface registration, ICP, and merging are completely independent of the range sensor used. In fact, this technique has been used with eleven different range sensors to date; examples with three sensors are included at the end of the paper.

Surface Matching with Large Data Sets
The basic procedure outlined above is sufficient for data sets of moderate size, for example, individual objects. When dealing with data sets that cover a large area, two issues must be addressed carefully. First, the large variation of data resolution across the scanned area complicates surface matching. Second, the size of the data may make the matching procedure computationally impractical.

Variable Data Resolution
In the representation described thus far, the signature images are computed by histogramming the vertices of the model or scene meshes. As a result, the distribution of the vertices on the mesh directly affects the signatures. In fact, two meshes with different vertex distributions may generate very different signatures at the same basis point. Therefore, in order for the surface matching algorithm to work properly, some constraint has to be enforced on the distribution of vertices on the meshes. Specifically, it can be shown that the signatures remain stable as long as the vertices are uniformly distributed on the surface.
Although this approach works well in practice, it has several problems. First, there are cases in which the data simply cannot be made uniform without losing a great deal of information, because the variation of resolution in the input sensor data is too large. A typical example is terrain data taken from a forward-looking sensor: the data are high-resolution at close range, and the resolution decreases quadratically as the range from the sensor increases. Variations in data point spacing of as much as 1:10 are routinely observed on terrain data. Second, uniform decimation requires computation time of the same order as the matching itself, even though much faster decimation and filtering algorithms do exist.
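The quadratic fall-off can be made concrete with a simplified model, which is our assumption rather than one stated in the paper: a sensor at height h above flat ground, a beam at depression angle φ, and a fixed angular step Δφ between samples. The ground distance is d = h cot φ and the slant range is r = h / sin φ, so the ground spacing between consecutive samples is

```latex
\left|\frac{\mathrm{d}d}{\mathrm{d}\varphi}\right| = \frac{h}{\sin^{2}\varphi} = \frac{r^{2}}{h}
\qquad\Longrightarrow\qquad
\Delta s \approx \frac{r^{2}\,\Delta\varphi}{h}.
```

For illustration only, with hypothetical values h = 2 m and Δφ = 0.1° ≈ 1.75 × 10⁻³ rad, the spacing is about 2.2 cm at r = 5 m and about 19.6 cm at r = 15 m, a ratio of 9, which is the same order as the 1:10 variation quoted above.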
The solution to those problems is to compute the signatures by integrating over the entire surface rather than by computing α and β values at the vertices only. Essentially, this requires interpolating the spin-image values "in between" the mesh vertices. The simplest way of achieving this is to raster scan each triangle of the mesh (Figure 3) and to compute the (α, β) coordinates of each point inside the face. The corresponding location in the signature is incremented by a constant amount for each new point. This algorithm can be made efficient by using a fast geometric test to determine whether a face is inside the region of influence of the basis point and within the boundary of the spin-image space.
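A sketch of the sampling approach, assuming the raster scan is approximated by a regular barycentric grid of (k + 1)(k + 2)/2 samples per triangle; the paper does not specify the sampling pattern, and `splat_triangle` is a hypothetical name. The fast region-of-influence rejection test is omitted for brevity.

```python
import math

def alpha_beta(p, n, x):
    """Spin-image coordinates of x relative to basis point p with unit normal n."""
    d = [x[m] - p[m] for m in range(3)]
    beta = sum(n[m] * d[m] for m in range(3))
    alpha = math.sqrt(max(sum(c * c for c in d) - beta * beta, 0.0))
    return alpha, beta

def splat_triangle(img, p, n, tri, bin_size, k):
    """Sample triangle `tri` on a k-subdivision barycentric grid (k >= 1) and add 1
    to the spin-image bin of every sample; returns how many samples landed in img."""
    a, b, c = tri
    n_beta, n_alpha = len(img), len(img[0])
    landed = 0
    for i in range(k + 1):
        for j in range(k + 1 - i):
            u, v = i / k, j / k
            x = [a[m] + u * (b[m] - a[m]) + v * (c[m] - a[m]) for m in range(3)]
            alpha, beta = alpha_beta(p, n, x)
            col = int(alpha / bin_size)
            row = int(math.floor(beta / bin_size)) + n_beta // 2
            if 0 <= col < n_alpha and 0 <= row < n_beta:
                img[row][col] += 1
                landed += 1
    return landed
```

Two limitations are visible in the sketch: samples on edges shared by neighboring triangles are counted twice, and the histogram changes with k, which is the sampling-rate sensitivity discussed in the text.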
This approach is still an approximation because it uses a discrete sampling of the surface. In particular, although the signatures are less sensitive to the distribution of vertices, they are still sensitive to the choice of the sampling rate used for interpolation. The second approach is exact in that it computes the spin-images by integration over the whole surface without additional sampling. In this approach, the boundary of each triangle is mapped into αβ-space, as shown in Figure 4 (a).
Each edge of the face maps to a segment of a hyperbola in αβ-space. The hyperbolic segments computed in the projection are then used to determine which cells of the spin-image may contain some portion of the triangle. Figure 4 (b) shows the cells of the spin-image that contain part of the triangle, based on the segments of Figure 4 (a). Finally, each cell in the spin-image is incremented by the area of the part of the triangular face that is mapped to that cell in αβ-space.
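The hyperbola claim follows from a short calculation (our derivation, using the definitions of α and β given earlier). Parameterize an edge as x(t) = x₀ + t d with t in [0, 1], and let w = x₀ − p. Then β is linear in t and α² is quadratic in t, so eliminating t gives a quadratic relation between α and β:

```latex
\begin{aligned}
\beta(t) &= \beta_0 + t\,(\mathbf{n}\cdot\mathbf{d}), \qquad \beta_0 = \mathbf{n}\cdot\mathbf{w},\\
\alpha^2(t) &= \lVert\mathbf{w}\rVert^2 + 2t\,(\mathbf{w}\cdot\mathbf{d}) + t^2\lVert\mathbf{d}\rVert^2 - \beta(t)^2,\\
t &= \frac{\beta - \beta_0}{\mathbf{n}\cdot\mathbf{d}} \qquad (\mathbf{n}\cdot\mathbf{d} \neq 0),\\
\alpha^2 &= A\beta^2 + B\beta + C, \qquad A = \frac{\lVert\mathbf{d}\rVert^2}{(\mathbf{n}\cdot\mathbf{d})^2} - 1 \;\ge\; 0,
\end{aligned}
```

where B and C depend only on w and d. Completing the square gives α² − A(β + B/2A)² = C − B²/4A, a hyperbola whenever A > 0 (only its α ≥ 0 branch is relevant). The two degenerate cases are an edge parallel to T (n·d = 0, so β is constant) and an edge parallel to n (A = 0, so α is constant).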

This last step is illustrated geometrically in Figure 4 (b).
The region of 3-D space corresponding to the cell (α, β) is an annulus of height Δβ and thickness Δα. The cell is incremented by the surface area of the intersection between the triangle and this annulus.
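Written as a formula, with β_min denoting the lower β limit of the spin-image (a symbol introduced here for illustration), the increment that triangle T contributes to cell (i, j) is

```latex
\begin{aligned}
A_{ij} &= \bigl\{\mathbf{x} : i\,\Delta\alpha \le \alpha(\mathbf{x}) < (i+1)\,\Delta\alpha,\;
          \beta_{\min} + j\,\Delta\beta \le \beta(\mathbf{x}) < \beta_{\min} + (j+1)\,\Delta\beta\bigr\},\\
S_{ij} &= \operatorname{area}(T \cap A_{ij}) = \int_{T} \mathbf{1}[\mathbf{x} \in A_{ij}]\,\mathrm{d}A .
\end{aligned}
```

Because the annuli A_ij partition the region covered by the spin-image, the increments from a triangle lying entirely inside that region sum to its area, Σ S_ij = area(T), which is a simple consistency check for an implementation.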
Because it uses the actual surface area to increment the signature cells, this algorithm computes an "exact" mapping of the surface to the signatures, given a mesh discretization of the surface. Figure 5 shows the difference between a signature computed with and without interpolation.

Fast Access Data Structures
This part of the work addresses the practical use of the matching techniques, in particular with the more advanced surface integration, on the very large data sets that one expects to encounter in applications such as building terrain models or virtual models of large interior environments.
The main potential obstacle to the practical use of signature techniques is that computing the signatures may become prohibitively expensive if the data set is very large. In particular, the amount of computation needed for each face to compute the signature at a given basis point is much greater than in the standard method, which histograms the vertices only. Therefore, it becomes especially important that only the points that are inside the region of influence of a basis point be used for computing the corresponding signature.
The standard approach to this type of problem is to use variants of the K-D tree structure designed for fast access in multidimensional spaces. After evaluating several implementations of similar geometric data structures, we found that the best design was a regular hierarchical data structure similar to an octree; because we are working with 2-D surfaces, however, the tree is sparse, and access can be implemented efficiently with a fast hashing method. The graphs in Figure 6 illustrate the improvement in signature computation speed obtained with this technique.
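The paper's structure is hierarchical; the sketch below keeps only the idea that makes it fast on surface data: a sparse grid whose occupied cells are stored in a hash map keyed by integer cell coordinates, so empty space costs nothing. It is a single-level simplification, `SparseGrid` is a hypothetical name, and picking a cell size comparable to the spin-image support radius is our assumption.

```python
import math
from collections import defaultdict

class SparseGrid:
    """Single-level sparse voxel grid: only occupied cells are stored in a dict."""

    def __init__(self, points, cell_size):
        self.cell = cell_size
        self.points = points
        self.cells = defaultdict(list)          # (i, j, k) -> indices of points in that cell
        for idx, q in enumerate(points):
            self.cells[self._key(q)].append(idx)

    def _key(self, q):
        return tuple(int(math.floor(c / self.cell)) for c in q)

    def within(self, p, radius):
        """Indices of all points at distance <= radius from p (the region of influence)."""
        lo = self._key([c - radius for c in p])
        hi = self._key([c + radius for c in p])
        r2 = radius * radius
        out = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # .get avoids inserting empty cells into the defaultdict
                    for idx in self.cells.get((i, j, k), ()):
                        q = self.points[idx]
                        if sum((q[m] - p[m]) ** 2 for m in range(3)) <= r2:
                            out.append(idx)
        return out
```

Building the table and hashing every query cell is pure overhead on small inputs, which is consistent with the crossover behavior reported for the real structure.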
Because of the overhead involved in building the data structure, computing the hashing function, and retrieving points from the data structure, this technique is beneficial only for large data sets. In fact, the computation is slower for data sets of moderate size. The graphs show that the crossover point is at approximately 8000 points (Figure 6). This technique should not be used for smaller data sets.

Indoor Environment Reconstruction
As a first example, the surface registration technique has been used to build a model of a large warehouse space composed of two adjacent rooms. The building measured roughly 60 meters long by 20 meters wide by 10 meters high, and was filled with an assortment of clutter and debris, as shown in Figure 7. We used a K2T/Z+F laser range finder. Each scan of the range finder returned 1.8 million 3-D points; naive subsampling reduced this set to 65000 points. The resulting point cloud was converted into a mesh and resampled to roughly 5000 points for registration.
Surface meshes produced from each of the 32 data sets were successfully registered to the meshes of adjacent scans. Figure 8 shows the registration results for two of the meshes. With the transformations between neighboring meshes known, it was then possible to align all meshes in a common coordinate system so that they could be unified into a single mesh that covered the entire warehouse space.
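One way to express all views in a common frame is to chain the pairwise results, assuming each registration yields a 4x4 homogeneous matrix that maps view i+1 into view i (our convention; the paper does not state its representation). View k is then mapped into view 0 by T₀₁ T₁₂ ⋯ T₍ₖ₋₁₎ₖ. `chain_to_global` is a hypothetical name.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def chain_to_global(pairwise):
    """pairwise[i] maps coordinates of view i+1 into view i.
    Returns, for every view k, the transform mapping view k into view 0."""
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    global_transforms = [identity]
    for t in pairwise:
        # T_0k = T_0(k-1) * T_(k-1)k
        global_transforms.append(matmul4(global_transforms[-1], t))
    return global_transforms
```

Errors in each pairwise estimate compound along the chain, which is the drift and error accumulation the introduction lists as a separate correction problem.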
Because of limitations in our implementation of the merging algorithm, only 20 of the views were merged into the final model, although all 32 data sets were registered. The complete mesh of the warehouse contained 138000 points at full resolution, while a low-resolution version of the model contained 25000 points. The resolution of the final mesh was low compared to that of the constituent high-resolution views due to limitations of the mesh merging implementation; otherwise, it would have been possible to produce a final mesh with no resolution loss. Statistics for each type of surface mesh (high-resolution single-viewpoint, low-resolution single-viewpoint, high-resolution complete, and low-resolution complete) are summarized in Table 1. Resolution is measured as the average length of edges in the mesh, in millimeters. As mentioned above, we did not attempt to refine the final merged model; more sophisticated merging algorithms can certainly be used.
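The resolution figure used in Table 1 can be computed as below, assuming each edge shared by two triangles is counted once (the paper does not say how shared edges are treated). `mean_edge_length` is a hypothetical name; the result is in the units of the vertex coordinates.

```python
import math

def mean_edge_length(vertices, triangles):
    """Average length of the unique edges of a triangle mesh."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))   # undirected edge, counted once
    total = sum(math.dist(vertices[u], vertices[v]) for u, v in edges)
    return total / len(edges)
```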

Large-scale Terrain Mapping
A second example of map reconstruction is map building from a ground-based range sensor and from an autonomous helicopter. The ground-based sensor is a K2T/Z+F sensor [5] configured to generate range and reflectance images of size 6000 x 300, corresponding to angular resolutions of 0.06° and 0.1° in the horizontal and vertical directions, respectively. The initial data is subsampled by a factor of 5 horizontally and 3 vertically and then converted to a mesh.
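The image size and angular steps are consistent with a full horizontal sweep: 6000 × 0.06° = 360° and 300 × 0.1° = 30°, and subsampling by 5 horizontally and 3 vertically leaves a 1200 × 100 grid. The paper does not say how the grid becomes a mesh; a common approach, assumed here, is to split every 2x2 block of valid pixels into two triangles. Both function names are hypothetical, and practical versions usually also refuse to connect pixels across large range discontinuities.

```python
def subsample(image, step_rows, step_cols):
    """Keep every step_rows-th row and every step_cols-th column of a range image."""
    return [row[::step_cols] for row in image[::step_rows]]

def grid_mesh(image, valid=lambda r: r is not None):
    """Triangulate a range image: each 2x2 block of valid pixels gives two triangles.
    Triangles are returned as triples of (row, col) pixel indices."""
    tris = []
    rows, cols = len(image), len(image[0])
    for i in range(rows - 1):
        for j in range(cols - 1):
            a, b, c, d = (i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)
            if all(valid(image[r][s]) for r, s in (a, b, c, d)):
                tris.append((a, c, b))
                tris.append((b, c, d))
    return tris
```

For the ground-based sensor, `subsample(image, 3, 5)` reproduces the stated factors, since rows run vertically and columns horizontally.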
The helicopter range data was preprocessed by the CMU Autonomous Helicopter group, but we briefly describe the steps used. The combination of the line-scanning sensor, helicopter motion, and position uncertainty causes sequential scan lines to overlap unpredictably. Therefore, it is not possible to simply connect the data points as with the image-based scanner. Instead, the points from a series of scans are projected orthographically into a grid, and the range is calculated using the closest point in each bin. The bins are then treated as a range image and converted to a mesh using the same method as with the ground-based sensor.
The registration algorithms were tested on several large data sets, two of which are shown here. The first data set was acquired with the sensor mounted on the roof of a van at a slag heap located near CMU. The data is a sequence of 13 range images obtained at 3-5 meter intervals along a road leading between two hills. After the 7th data set, the road curves to the left. The registered data sets were merged into a single, global map shown in Figure 8.
The second data set was collected by the Autonomous Helicopter group during field tests at Haughton crater in the Canadian arctic as part of NASA's Haughton-Mars project. The integrated terrain map in Figure 11 was formed from three passes of the helicopter along the boundary of a 20-meter cliff and covers a 260 x 166-meter area.

Conclusion
Building environment models from unregistered views is challenging because of the difficulty in extracting and matching surfaces, the size of the data sets, and the variation in resolution and accuracy among views and sensors. The approach presented here addresses those challenges and was successfully demonstrated on large 3-D data sets.
Beyond the two examples presented here, the system has been exercised with eleven different range sensors, from active laser range finders to passive stereo systems.
Now that we have a reliable algorithm for building maps, we are beginning to analyze the limitations of the approach. The next step is to investigate how terrain shape affects the algorithm's performance. Two areas of investigation are of particular interest: intelligent selection of points and dynamic estimation of overlap between views.