Fully automatic registration of multiple 3D data sets

This paper presents a method for automatically registering multiple rigid three dimensional (3D) data sets, a process we call multi-view surface matching. Previous approaches required manual registration or relied on specialized hardware to record the sensor position. In contrast, our method does not require any pose measuring hardware or manual intervention. We do not assume any knowledge of initial poses or which data sets overlap. Our multi-view surface matching algorithm begins by converting the input data into surface meshes, which are pair-wise registered using a surface matching engine. The resulting matches are tested for surface consistency, but some incorrect matches may be indistinguishable from correct ones at this local level. A global optimization process searches a graph constructed from the pair-wise matches for a connected sub-graph containing only correct matches, employing a global consistency measure to eliminate incorrect, but locally consistent, matches. From this sub-graph, the rigid-body transforms that register all the views can be computed directly. We apply our algorithm to the problem of 3D digital reconstruction of real-world objects and show results for a collection of automatically digitized objects.


Introduction
The advent of relatively low-cost, commercially available laser range sensors has greatly simplified the process of accurately measuring the 3D structure of real-world environments, driving the need to automate the processing of 3D data. One problem frequently encountered in 3D data processing is registration, the process of aligning multiple 3D data sets in a common coordinate system. In existing applications, registration is accomplished either by hand or through the use of an external position measurement device such as a calibrated turntable. This paper introduces a third alternative: multi-view surface matching, which does not require any external measurements or manual intervention. Formally, we want to solve the following problem: Given an unordered set of overlapping 3D views of a static scene and no additional information, automatically recover the viewpoints from which the views were originally obtained, thereby registering the views in a common coordinate system. We do not assume any prior knowledge of the original sensor viewpoints or which views contain overlapping scene regions (overlaps). Furthermore, the views are unordered, meaning that consecutive views are not necessarily close together spatially. This problem is analogous to assembling a jigsaw puzzle in 3D. The views are the puzzle pieces, and the problem is to correctly put the pieces together without even knowing what the puzzle is supposed to look like.
The relationship between multi-view surface matching and existing 3D registration problems is illustrated by the simple taxonomy shown in Fig. 1. The taxonomy contains two axes: the number of input views and whether initial pose estimates are known. A pose estimate is a rigid body transform (e.g. three rotations and three translations) and can be specified for a single view in world coordinates (absolute pose) or between a pair of views (relative pose). When an initial estimate of the relative pose is known, aligning two views is called pair-wise registration. When more than two views are involved and initial pose estimates are given, the process is called multi-view registration. With unknown pose estimates, aligning a pair of views is called pair-wise surface matching. Finally, multi-view surface matching occupies the fourth corner of this taxonomy, extending pair-wise surface matching to more than two views. The problems in the first three categories are interesting research topics on their own, but they are particularly relevant here because we use algorithms from each category as components of our multi-view surface matching algorithm. Rather than invent these components from scratch, we build on existing algorithms in each category.
Multi-view surface matching is a difficult problem because it involves solving three interrelated sub-problems: (1) determining which views overlap; (2) determining the relative pose between each pair of overlapping views; and (3) determining the absolute poses of the views, which is the ultimate goal of multi-view surface matching. There is a mutual dependency between the overlaps and relative poses. If the relative poses are known, the overlapping views can be determined by applying a suitable definition of overlap to the registered pairs. If the overlaps are known, the relative poses can be found using a pair-wise surface matching algorithm. In this case, the results must be manually verified for correctness. Once the overlaps and relative poses are known, the absolute poses can be determined by multi-view registration.
For multi-view surface matching, both the overlaps and the relative poses are unknown, which makes the problem considerably harder. We approach the problem by dividing it into two phases: a local registration phase, which operates only on pairs of views, and a global registration phase, which involves all of the views (Fig. 2). In the local registration phase, the N input views (V_i, i ∈ 1…N) are converted to surface meshes (S_i, i ∈ 1…N), and a surface matching algorithm [1] is applied to all view pairs. The resulting matches are verified for surface consistency, but some incorrect matches may be indistinguishable from correct matches at this local level. As a result, the multi-view surface matching problem cannot be solved just by looking at the consistency of pairs of views. Instead, we must consider the global consistency of an entire network of views. This is accomplished in the global registration phase. First, the filtered matches from pair-wise registration are collected in an undirected graph called the model graph, which encodes the connectivity between overlapping views. We then search this model graph for a connected sub-graph containing only correct matches. We pose the search as a mixed continuous and discrete optimization problem. The discrete optimization performs a combinatorial search over the space of connected sub-graphs of the model graph, using a global surface consistency criterion to detect and avoid incorrect, but locally consistent, matches. The continuous optimization adjusts the absolute pose parameters to minimize the distance between all overlapping surfaces, distributing small pair-wise registration errors in a principled way. The final output, the absolute poses of the input views, can be computed directly from the resulting graph.
We demonstrate and test our algorithm in the context of 3D object digitization, the purpose of which is to create a 3D digital reproduction of a real-world object (Fig. 3). In our application, which we call hand-held modeling, the object to be digitized is held before a laser scanner while range images are obtained from various viewpoints. This is an easy data collection method, requiring no specialized hardware, minimal training, and only a few minutes to scan an average object. Alternatively, the model can be placed on a table during each scan, or a portable scanner can be moved around while the scene remains stationary. Once data collection is complete, our application produces a digital model of the original object by automatically registering the input views and then merging the registered views into a single entity. Although we illustrate our algorithm with 3D object digitization, the method is general and can be applied in any situation where multiple 3D data sets must be registered.
This paper is organized as follows: we begin by summarizing the related work in Section 2. Section 3 provides the necessary background on the model graph concept. In Section 4, we define and compare three measures of surface consistency. Sections 5 and 6 give the details of the multi-view surface matching algorithm, with Section 5 focusing on the local registration phase and Section 6 dealing with the global registration phase. Section 7 presents our results, including a comparison of three versions of our algorithm on a set of real test objects, an analysis of the accuracy and computational complexity of the algorithms, and examples of our algorithm applied to data from alternate sensors and data collection methods. Finally, in Section 8 we summarize our algorithm and discuss areas of future work.

Related work
We begin with a review of the relevant registration algorithms according to the taxonomy in Fig. 1. A survey by Campbell and Flynn provides additional details on these algorithms [2].
Pair-wise surface matching algorithms register two surfaces without requiring any initial estimate of the relative pose, searching globally over the space of relative pose parameters. Such algorithms are often used in 3D object recognition systems, where the goal is to determine the pose (if any) of one or more 3D model objects within a cluttered 3D scene. Typically, these algorithms work by finding correspondences between the model and scene based on an invariant surface property such as curvature. Given a sufficient number of correspondences, the relative pose between model and scene can be estimated. If the registration error is small enough, the object is declared recognized. One approach to establishing model-scene correspondences is to use point signatures, which encode local surface properties in a data structure that facilitates efficient correspondence search and comparison. Proposed encodings include spin-images [1], splashes [3], point signatures [4], harmonic shape images [5], spherical attribute images [6], and the tripod-operator [7]. An alternative approach is to explicitly detect extended features, such as curves of maximal curvature [8], intersections of planar regions [9], or bitangent-curves [10], and match them between model and scene.
In our system, we use a modified version of Johnson's spin-image surface matching algorithm to perform initial pair-wise matching in the local registration phase [1,11,12]. The algorithm is fast, matching two views in 1.5 s, does not require any explicit feature detection, and is robust to holes in the surfaces, which frequently occur when using range image data.
Pair-wise registration algorithms improve upon an initial relative pose estimate by minimizing an objective measure of registration error. The initial relative pose can be provided by a surface matching algorithm, by manual registration, or by the data acquisition system. The dominant method in this category is the iterative closest point (ICP) algorithm, which repeatedly updates the relative pose by minimizing the sum of squared distances between closest points on the two surfaces (point-to-point matching) [13]. Chen and Medioni proposed a similar method in which the distance between points and tangent planes is minimized instead (point-to-plane matching) [14]. Rusinkiewicz' survey of ICP variants provides an elegant taxonomy and unifying framework for comparing the numerous extensions to the basic algorithm [15].
In our system, we use pair-wise registration in the local registration phase to improve pair-wise matches. We have experimented with point-to-point and point-to-plane matching and found that point-to-point matching tends to prevent surfaces from sliding relative to one another, leading to slower convergence in many cases. Therefore, we use the point-to-plane matching method. In practice, we use Neugebauer's multi-view registration algorithm for pair-wise registration (see below) [16]. In the two-view case, his algorithm essentially reduces to Chen and Medioni's [14].
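The sliding behavior mentioned above can be illustrated with a small sketch. It compares the two error measures for fixed correspondences on a flat patch; the function names are hypothetical, and a real ICP implementation would re-estimate closest points at every iteration.

```python
def point_to_point_error(src, dst):
    """Sum of squared distances between corresponding points."""
    return sum(sum((s - d) ** 2 for s, d in zip(p, q)) for p, q in zip(src, dst))


def point_to_plane_error(src, dst, normals):
    """Sum of squared distances from each source point to the tangent
    plane at its corresponding destination point."""
    total = 0.0
    for p, q, n in zip(src, dst, normals):
        signed = sum((pi - qi) * ni for pi, qi, ni in zip(p, q, n))
        total += signed ** 2
    return total


# A flat patch in the plane z = 0, and a copy that has slid 0.5 units along x.
dst = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
src = [(x + 0.5, y, z) for x, y, z in dst]

p2p = point_to_point_error(src, dst)
p2plane = point_to_plane_error(src, dst, normals)
```

In this toy case the in-plane slide is penalized by the point-to-point error but not by the point-to-plane error, which is one way to see why the latter lets surfaces slide into place.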
Multi-view registration algorithms minimize registration error over an entire network of overlapping views, optimizing the absolute pose parameters of all views. These algorithms require an initial estimate of either the absolute or relative poses. In 3D digitization, multi-view registration is normally used as a final step to improve the overall model quality or as a method for converting from relative poses to absolute poses. A wide variety of multi-view registration algorithms have been proposed. One approach is to update all the absolute poses simultaneously. In this vein, Benjemaa and Schmitt derived a multi-view extension of the ICP algorithm, and Neugebauer developed a multi-view version of Chen and Medioni's pair-wise algorithm [16,17]. Alternatively, the absolute poses can be updated sequentially. Bergevin et al. repeatedly perform pair-wise registration on pairs of overlapping views using a modified version of Chen and Medioni's algorithm [18]. Pulli uses a similar approach based on the ICP algorithm [19]. Another idea is to view the relative poses as constraints on the absolute poses. Lu and Milios derive uncertainty measures (covariance matrices) for the relative poses and solve for the absolute poses that minimize the Mahalanobis distance between the computed and measured relative poses [20]. Stoddart and Hilton use the analogy of a mechanical system, in which corresponding points are attached by springs, to derive a set of force-based incremental motion equations that are equivalent to gradient descent [21]. Eggert et al. also use a mechanical system analogy, but they include an inertia term and also update the correspondences over time [22]. Goldberger uses a model-based approach based on the EM algorithm [23]. At each iteration, every view is registered to the current model and then a new maximum likelihood model is created from the updated views.
Fig. 2. A block diagram showing the two phases of our multi-view surface matching algorithm. The local phase takes an unordered set of input views and performs pair-wise surface matching, outputting a set of matches. The global phase searches this set of matches for a globally consistent solution, outputting the transforms that place the views in a common coordinate system (shown here with respect to view 1).
Fig. 3. Hand-held modeling, a 3D digitization application. Holding the object before a laser scanner (left), we obtain 3D data from various viewpoints (center), and automatically construct a digital version of the original object (right). The challenge lies in the uncontrolled poses and arbitrary order of views.
Our system uses multi-view registration in the global registration phase. We have implemented Benjemaa and Schmitt's point-to-point matching algorithm as well as Neugebauer's point-to-plane matching algorithm. We found that the slower convergence of the pair-wise point-to-point method is exacerbated in the multi-view case, so we opt for Neugebauer's point-to-plane matching approach [16].
Multi-view surface matching algorithms fall into the final category of our taxonomy. To the best of our knowledge, our algorithm is the first in this category [24]. However, since we use multi-view surface matching to automate the process of 3D digitization, it is reasonable to ask how 3D digitization is normally accomplished and to what degree the process is automated. Existing systems use one or more of the following techniques to register the data: a calibrated data acquisition system, manual registration/verification, or environmental modification. With calibrated data acquisition systems, the type of hardware used depends on the size of the scene. For smaller objects, absolute poses can be obtained by mounting the sensor on a robot arm [26] or by keeping the sensor fixed and moving the object on a calibrated platform, such as a turntable [27]. For larger scenes, such as the statues scanned in the Digital Michelangelo project, accurate measurement of camera pose becomes considerably more complex, requiring a custom-made gantry system [28]. With manual registration, the user may specify corresponding feature points in pairs of range images, from which relative poses can be estimated [16]. In some systems, corresponding feature points are automatically detected and then manually verified for correctness [29]. Alternatively, the 3D data can be aligned directly through an interactive method [19]. In more advanced approaches, a person indicates only which views to register (i.e. specifies the overlaps), and performs surface matching, manually verifying the results [9,30]. If the motion between successive scans is small, a pair-wise registration algorithm (e.g. ICP) can be used instead, as was done with the Great Buddha digitization project [31] and with Rusinkiewicz' real-time modeling system [32]. In the Pieta project, the environment was augmented with markers, which served as feature points to aid in manual registration [33].
Our algorithm offers several advantages over existing approaches. First, it does not require any specialized data acquisition system, which simplifies the hardware and reduces cost. Second, the algorithm can be applied to 3D data at any scale, so it is not necessary to design one system to model desktop-sized objects and a separate one for modeling buildings. Third, it removes manual registration, which is time-consuming and tedious, qualities that oppose the widespread acceptance of 3D digitization at the consumer level. Finally, our algorithm enables new applications, such as hand-held modeling, and greatly simplifies data collection.

The model graph
Before getting into the details of our algorithm, we introduce the model graph, a concept which simplifies the description of our algorithm and provides a visual interpretation of the registration process. A model graph is an undirected graph G that encodes the topological relationship between views (Fig. 4). It contains a node n_i for each input view V_i and an edge e_{i,j} for each pair of overlapping views V_i and V_j. Associated with each node is an absolute pose T_i and with each edge is a relative pose T_{i,j} as well as additional information, such as registration quality. The relative pose between two connected views V_i and V_j can be computed by composing the relative poses along any path from V_i to V_j in G. A connected model graph specifies a complete model and a potential solution to the multi-view surface matching problem, since every view can be transformed into a common coordinate system by composing relative poses. If G contains several connected components, each component is called a partial model. A spanning tree of G is the minimum specification of a complete model. Additional edges will create cycles in G, which can lead to conflicts because composing transforms along different paths between two views may give different results. A model is pose consistent if the relative pose of two views is independent of the path in G used for the calculation. In practice, pose inconsistencies arise from the accumulation of small errors in relative pose estimates along a path.
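As an illustration of composing relative poses along paths, the following sketch computes absolute poses for every view reachable from a chosen root by breadth-first traversal. It is only a sketch under stated assumptions: poses are 4x4 homogeneous matrices stored as nested lists, the relative pose T_{i,j} is taken to map coordinates of view j into the frame of view i, and all function names are hypothetical.

```python
from collections import deque


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def invert_rigid(t):
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Pure translation, used here only to build toy poses."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def absolute_poses(n_views, edges, root=0):
    """edges maps (i, j) to T_ij (view j coordinates -> view i coordinates).
    Returns {view: T_view}, where T_view maps that view into the root frame.
    Views in other connected components (partial models) are omitted."""
    adjacency = {i: [] for i in range(n_views)}
    for (i, j), t_ij in edges.items():
        adjacency[i].append((j, t_ij))                 # T_j = T_i * T_ij
        adjacency[j].append((i, invert_rigid(t_ij)))   # T_i = T_j * T_ij^-1
    poses = {root: translation(0.0, 0.0, 0.0)}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j, t in adjacency[i]:
            if j not in poses:
                poses[j] = matmul(poses[i], t)
                queue.append(j)
    return poses


# Toy spanning tree over three views: 0 - 1 - 2.
edges = {(0, 1): translation(1.0, 0.0, 0.0), (2, 1): translation(0.0, 2.0, 0.0)}
poses = absolute_poses(3, edges)
```

On a spanning tree the traversal visits each view through its unique path, so the result is pose consistent by construction; on a graph with cycles it simply uses whichever path the traversal finds first.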
We can now view the registration algorithms used in our system in terms of operations on a model graph G. Surface matching adds one or more edges between two nodes in G. Pair-wise registration updates a single edge (relative pose), while multi-view registration updates all the edges (relative poses) and nodes (absolute poses). Our multi-view surface matching algorithm uses pair-wise surface matching and registration to construct a model graph and then searches this graph for a complete model that is pose consistent and globally surface consistent. In Section 4, we define the term 'surface consistency' and compare several surface consistency measures.

Surface consistency
The multi-view surface matching problem would be greatly simplified if we could know with absolute certainty which matches from pair-wise surface matching were correct. This goal cannot be achieved just by looking at pairs of views because two data sets could have zero registration error but still be an incorrect match. The region that would indicate an incorrect match may be far from the overlapping area, and the error may be detectable only indirectly through other matches. To solve this problem, we must look at the consistency of the data at the global level as well.
Surface consistency is a measure of the degree to which the overlapping data from two (or more) surfaces could represent the same physical surface. For any consistency measure, we can define a classifier, which is a thresholded version of the measure. We use surface consistency measures in three ways: (1) to rank the results of pair-wise surface matching; (2) as a classifier that filters out the worst pair-wise matches; and (3) as a basis for a global consistency classifier for verifying entire models.

Local surface consistency
We have implemented three local surface consistency measures: overlap distance, a general measure that applies to any pair of surfaces; and two measures based on visibility consistency that are tailored to surfaces derived from range images. The measures are defined as error measures: smaller values represent more consistent surfaces. The input surfaces are assumed to be represented in a common coordinate system (i.e. one view is already transformed by the relative pose).

Overlap distance
One way to judge the consistency of two surfaces is to directly measure the distance between the surfaces in overlapping regions. We begin with the following definition of overlap: A point, p, on surface S_i overlaps surface S_j if (1) the point, q, on S_j closest to p is an interior (non-boundary) point of S_j; (2) the angle between the surface normals at p and q is less than a threshold, t_θ; and (3) the Euclidean distance, D, between p and q is less than a threshold, t_D. Given two surfaces represented as meshes, we can estimate the average overlap distance of surface S_i with respect to S_j, O_D(S_i, S_j), as an area-weighted average of per-face distances, where w_f is the average of the distances between the corners of face f on S_i and the surface S_j, A(f) is the surface area of f, and F_O is the set of faces on S_i for which all three corners overlap S_j according to the overlap definition above. Partially overlapping faces are handled as special cases.
We also compute the proportion of S_i that overlaps S_j, O_P(S_i, S_j). Similarly, O_D and O_P can be computed for S_j with respect to S_i. We write O_{P,i,j} as shorthand for O_P(S_i, S_j), and similarly O_{D,i,j}, O_{P,j,i}, and O_{D,j,i}. Since larger overlapping proportions give a more stable estimate of overlap distance, we define our first local consistency measure to be the weighted average of the two non-symmetric distances O_{D,i,j} and O_{D,j,i}, weighted by the corresponding overlap proportions. The disadvantage of overlap distance is that it only takes into account the space close to the two surfaces. In some cases, obviously incorrect matches will have a small overlap distance simply because the overlapping regions have similar shapes. For example, in Fig. 6b, the overlapping region on the angel's head matches well because it is roughly spherical.
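As a sketch, one way to write the three overlap quantities consistently with the definitions in this subsection is given below. The exact forms are assumptions inferred from the surrounding text, as is the symbol F_i, introduced here for the set of all faces of S_i.

```latex
% Area-weighted average overlap distance of S_i with respect to S_j (assumed form)
O_D(S_i, S_j) = \frac{\sum_{f \in F_O} A(f)\, w_f}{\sum_{f \in F_O} A(f)}

% Proportion of S_i that overlaps S_j (assumed form; F_i = all faces of S_i)
O_P(S_i, S_j) = \frac{\sum_{f \in F_O} A(f)}{\sum_{f \in F_i} A(f)}

% Symmetric overlap distance: proportion-weighted average of the two directions
O_D = \frac{O_{P,i,j}\, O_{D,i,j} + O_{P,j,i}\, O_{D,j,i}}{O_{P,i,j} + O_{P,j,i}}
```

Under these forms, a direction in which only a small part of the surface overlaps contributes little to the combined measure, which matches the stated motivation that larger overlapping proportions give a more stable distance estimate.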

Visibility consistency
For range sensors with a single point of projection, we can develop more powerful measures that take advantage of the sensor's entire viewing volume by looking at the consistency of the two surfaces along the line of sight from each of the sensor viewpoints. We call this concept visibility consistency. For example, consider the surfaces in Fig. 5 viewed from the sensor position C_1. For a correct registration, the two surfaces have similar range values wherever they overlap (Fig. 5a). For an incorrect registration, two types of visibility inconsistencies can arise. A free space violation (FSV) occurs when a region of S_2 blocks the visibility of S_1 from C_1 (Fig. 5b), while an occupied space violation (OSV) occurs when a region of S_2 is not observed by C_1, even though it should be (Fig. 5c). Free space violations are so named because the blocking surface violates the assumption that the space is clear along the line of sight from the sensor to the sensed surface. Similarly, OSV surfaces violate the assumption that the range sensor detects occupied space. Here, we focus on FSVs, but the potential of OSVs is discussed briefly in Section 8. Visibility consistency has been used previously in other 3D vision contexts, including hypothesis verification [34], surface registration [22], range shadow detection [35], and multi-view integration [36,37].
We can detect FSVs with respect to sensor position C_i by projecting a ray from the center of projection of C_i through a point p on S_i. If the ray passes through S_j at a point q which is significantly closer to C_i than p, then q is an inconsistent point. We must test whether q is significantly closer because even for correctly registered surfaces, p and q will not have precisely the same range.
We can efficiently implement FSV detection using two z-buffers [38] to construct synthetic range images (Fig. 6). To compute FSVs for surfaces S_i and S_j with respect to C_i, the surfaces are projected into separate z-buffers and converted into range images (R_i and R_j) using the coordinate system and parameters of C_i (e.g. focal length, viewing frustum). The range difference D_{i,j}(k) is then computed (Eq. (4)) for each pixel x(k) where both range images are defined.
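The per-pixel range difference can be sketched as follows, assuming the two synthetic range images have already been rendered from the same viewpoint. The representation (nested lists with None for undefined pixels), the function name, and the sign convention R_i minus R_j are assumptions made for illustration; with this sign, a large positive value means S_j lies in front of S_i along the line of sight.

```python
def range_differences(r_i, r_j):
    """Return R_i - R_j for every pixel where both range images are defined.

    r_i, r_j: equally sized 2D lists of range values, None where undefined.
    """
    diffs = []
    for row_i, row_j in zip(r_i, r_j):
        for a, b in zip(row_i, row_j):
            if a is not None and b is not None:
                diffs.append(a - b)
    return diffs


# Toy 2x2 range images rendered from the viewpoint of C_i.
r_i = [[1.0, None], [2.0, 3.0]]
r_j = [[1.0, 2.0], [None, 2.0]]
diffs = range_differences(r_i, r_j)
```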
We have developed two local consistency measures based on the FSV concept. The first one, which we call the FSV odds, is a statistical measure based on Bayesian decision theory [39]. Given the two possible hypotheses, H+ (correct match) and H- (incorrect match), and the set of range difference measurements D = {D_{i,j}(1), …, D_{i,j}(K)}, we estimate the ratio of the posterior probabilities of the two hypotheses. Assuming samples of D are independent and taking the logarithm, we obtain a sum of per-pixel log-likelihood ratios, L(S_i, S_j) (Eq. (6)), plus a term that is independent of the data and can be dropped. An independent measure L(S_j, S_i) can be computed with respect to sensor viewpoint C_j. Frequently, an incorrect match will be detectable from only one viewpoint, so we conservatively combine L(S_i, S_j) and L(S_j, S_i) to form our second consistency measure, the FSV odds. The smaller the value of L(S_i, S_j), the more likely it is a correct match. The corresponding FSV odds classifier (Eq. (8)) is a thresholded version of this measure. The distributions Pr(D_{i,j}(k) | H+) and Pr(D_{i,j}(k) | H-) in Eq. (6) can be estimated from labeled training data. We use a set of hand-labeled matches obtained from exhaustive pair-wise surface matching of the views of a typical object. First, we compute the range differences for the set of correct matches (Fig. 7, left) and for the set of incorrect matches (Fig. 7, center). We then model Pr(D_{i,j}(k) | H+) as a mixture of two Gaussians, one for outliers and one for inliers. The parameters are determined by maximum likelihood estimation. The process is repeated for the incorrect matches to estimate Pr(D_{i,j}(k) | H-). Mixtures of two Gaussians are necessary because correct matches will contain some outliers, primarily due to small registration errors, and incorrect matches will contain inliers in the region that was matched during surface matching.
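A minimal sketch of an FSV-odds style computation is shown below. Several details are assumptions rather than the paper's definitions: the log ratio is oriented as incorrect over correct so that smaller values favor a correct match, the conservative combination is taken to be the maximum over the two viewpoints, and the mixture parameters are made-up numbers rather than fitted values.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given as [(weight, mean, std), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def log_odds(diffs, correct_model, incorrect_model):
    """Sum over pixels of log Pr(d | H-) - log Pr(d | H+).

    Assumes independent samples; the data-independent prior term is dropped.
    """
    return sum(math.log(mixture_pdf(d, incorrect_model))
               - math.log(mixture_pdf(d, correct_model)) for d in diffs)


def fsv_odds(diffs_from_ci, diffs_from_cj, correct_model, incorrect_model):
    """Conservative combination: the worse (larger) of the two viewpoints."""
    return max(log_odds(diffs_from_ci, correct_model, incorrect_model),
               log_odds(diffs_from_cj, correct_model, incorrect_model))


# Illustrative (not fitted) two-component mixtures: inliers plus outliers.
CORRECT = [(0.9, 0.0, 1.0), (0.1, 0.0, 20.0)]
INCORRECT = [(0.3, 0.0, 1.0), (0.7, 10.0, 20.0)]

small = [0.1, -0.2, 0.0]      # range differences typical of a good alignment
large = [15.0, 20.0, 12.0]    # range differences typical of a blocked view

good = fsv_odds(small, small, CORRECT, INCORRECT)
bad = fsv_odds(small, large, CORRECT, INCORRECT)
```

The second call illustrates the motivation for the conservative combination: the match looks fine from one viewpoint, yet the violation seen from the other viewpoint dominates the result.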
For pair-wise matches, the FSV odds is a good measure of surface consistency, but for non-adjacent views in a model graph, the accumulation of error when computing the relative transforms reduces the quality of this measure. For this situation, we use an alternative method for estimating surface consistency, the FSV fraction. To compute the FSV fraction, we apply a threshold, t_SS, to the range difference measurements (Eq. (4)) to classify the overlapping pixels into one of three categories (Fig. 6c, Eq. (9)). We then compute the fraction of points that are FSVs, ignoring 'don't care' points (class X_DC), which gives F(S_i, S_j). As with the FSV odds measure, we can perform the computation with respect to sensor viewpoint C_j to get F(S_j, S_i). Conservatively combining the results gives our third consistency measure, the FSV fraction. The corresponding FSV fraction classifier (Eq. (12)) is a thresholded version of this measure.
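The FSV fraction can be sketched as below. The three-way split is an assumption: differences within the threshold are treated as the same surface, differences above it as free space violations, and differences below its negative as 'don't care'. The sign convention (positive means the other surface is closer to the sensor), the use of the maximum as the conservative combination, and the function names are likewise hypothetical.

```python
def fsv_fraction_one_view(diffs, t_ss):
    """Fraction of non-'don't care' overlapping pixels that are FSVs."""
    n_same = sum(1 for d in diffs if abs(d) <= t_ss)
    n_fsv = sum(1 for d in diffs if d > t_ss)
    # Pixels with d < -t_ss are 'don't care' and are ignored.
    counted = n_same + n_fsv
    return n_fsv / counted if counted else 0.0


def fsv_fraction(diffs_from_ci, diffs_from_cj, t_ss):
    """Conservative combination over the two sensor viewpoints."""
    return max(fsv_fraction_one_view(diffs_from_ci, t_ss),
               fsv_fraction_one_view(diffs_from_cj, t_ss))


def fsv_fraction_classifier(diffs_from_ci, diffs_from_cj, t_ss, t_f):
    """True when the pair is accepted as surface consistent."""
    return fsv_fraction(diffs_from_ci, diffs_from_cj, t_ss) < t_f


one_view = fsv_fraction_one_view([0.0, 0.5, 2.0, -3.0, 4.0], t_ss=1.0)
combined = fsv_fraction([0.0, 0.1], [0.0, 0.5, 2.0, -3.0, 4.0], t_ss=1.0)
```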

Comparison of local consistency measures
We compare the three consistency measures by evaluating their performance on the task of classifying matches from a test object. By varying the threshold for each classifier and computing the false positive and false negative rates, we can observe how each measure trades off between the two types of errors. The resulting ROC curves (Fig. 7, right) indicate that the two visibility consistency measures are an improvement over the overlap distance measure. This is because they can detect inconsistencies throughout the sensor's entire viewing volume.
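The threshold sweep behind such a comparison can be sketched as follows, assuming each match has an error-style score (smaller means more consistent) and a hand-assigned label. The function name and the toy data are hypothetical.

```python
def error_rates(scores, labels, thresholds):
    """Return (false_positive_rate, false_negative_rate) for each threshold.

    scores: consistency measure per match (smaller = more consistent).
    labels: True for a correct match, False for an incorrect one.
    A match is accepted when its score is below the threshold.
    """
    n_correct = sum(1 for label in labels if label)
    n_incorrect = len(labels) - n_correct
    points = []
    for t in thresholds:
        accepted = [s < t for s in scores]
        false_pos = sum(1 for a, l in zip(accepted, labels) if a and not l)
        false_neg = sum(1 for a, l in zip(accepted, labels) if not a and l)
        points.append((false_pos / n_incorrect, false_neg / n_correct))
    return points


scores = [0.1, 0.2, 0.8, 0.9]
labels = [True, True, False, True]
points = error_rates(scores, labels, thresholds=[0.0, 0.5, 1.0])
```

Plotting one rate against the other over a dense set of thresholds gives a curve per measure, which is the kind of comparison the text describes.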

Global surface consistency
Global surface consistency is the extension of local surface consistency to an entire model. A model is globally surface consistent if every pair of views is locally surface consistent according to the FSV fraction classifier (Eq. (12)). The test is applied over V_C, the set of connected (not necessarily adjacent) view pairs in G, using for each pair the relative pose T_{i,j} computed by composing transforms along a path between nodes n_i and n_j in G. The fact that global consistency is computed between non-adjacent views in G is important. These non-adjacent comparisons produce new, non-local consistency constraints, which make the global consistency test much more powerful than local consistency tests.
Fig. 6. An example of visibility consistency for an incorrect match. The synthetic range images show the classification of pixels according to Eq. (9). The large number of FSV pixels indicates an incorrect match.
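The global test can be sketched as a check over all view pairs within one connected component, adjacent or not. Here `pair_measure` is a hypothetical callable standing in for the FSV fraction evaluated with the composed relative pose, and the threshold name is also an assumption.

```python
from itertools import combinations


def globally_consistent(component_views, pair_measure, threshold):
    """True if every pair of views in the component passes the local test.

    component_views: iterable of view indices in one connected component.
    pair_measure(i, j): consistency error for views i and j (smaller = better).
    """
    return all(pair_measure(i, j) < threshold
               for i, j in combinations(sorted(component_views), 2))


# Toy measure: views 0 and 2 are not adjacent but still must be consistent.
toy = {(0, 1): 0.01, (1, 2): 0.02, (0, 2): 0.40}
consistent = globally_consistent([0, 1, 2], lambda i, j: toy[(i, j)], 0.1)
partial = globally_consistent([0, 1], lambda i, j: toy[(i, j)], 0.1)
```

In the toy data both adjacent pairs look fine, and only the non-adjacent pair exposes the problem, which is the situation the non-local constraints are meant to catch.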

Local registration phase
Now that we have defined the model graph and the surface consistency measures, we can fully explain our multi-view surface matching algorithm. The process will be demonstrated on the angel2 test object (Fig. 11d). In the local registration phase, we attempt to register all pairs of views using a surface matching algorithm. For small numbers of views (roughly 50 or fewer), this exhaustive registration strategy is reasonable. For larger scenes, the combinatorics make this approach infeasible, and view pairs must be selectively registered (Section 7.2).
In preparation for surface matching, the views are preprocessed in the following manner. The input range images are converted to triangular surface meshes by projecting into 3D coordinates and connecting adjacent range image pixels. Mesh faces within range shadows (which occur at occluding boundaries in the range image) are removed by thresholding the angle between the viewing direction and the surface normal. For computational efficiency, the meshes are simplified using Garland's quadric algorithm [40]. The simplified meshes are used only for pair-wise surface matching; all other operations use the full resolution meshes.
The pair-wise surface matching algorithm registers two surfaces based on their shape. We treat this process as a black box that takes two meshes as input and outputs a list of relative pose estimates. Details can be found in Ref. [30]. If the two views overlap, the algorithm often finds the correct relative pose, but it may fail for a number of data-dependent reasons (e.g. not enough overlap or insufficient complexity of the surfaces). Even if the views do not contain overlapping scene regions, the algorithm may nevertheless find a plausible, but incorrect, match. Furthermore, symmetries in the data may result in multiple matches between a single pair. The model graph of pair-wise matches for the angel2 data set is shown in Fig. 8a. For illustration, the matches have been hand-classified, but these labels are not known by our algorithm. Next, the alignment of each match is improved using pair-wise registration. Finally, we perform a local surface consistency test by applying the FSV odds classifier (Eq. (8)) to the matches. We classify the matches using a conservative threshold chosen with the intention of eliminating obviously incorrect matches without removing any correct ones. The resulting model graph G_LR (LR for local registration) is shown in Fig. 8b.
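The meshing step can be sketched as follows. The sketch assumes a pinhole model with the principal point at the image center, depth stored as the z coordinate, and a sensor at the origin; the function names and the 80 degree default are hypothetical, and mesh simplification is not shown.

```python
import math


def facing_cosine(a, b, c):
    """|cos| of the angle between the triangle normal and the viewing ray
    from the sensor (at the origin) to the triangle centroid."""
    e1 = [b[k] - a[k] for k in range(3)]
    e2 = [c[k] - a[k] for k in range(3)]
    n = [e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0]]
    centroid = [(a[k] + b[k] + c[k]) / 3.0 for k in range(3)]
    norm = (math.sqrt(sum(x * x for x in n))
            * math.sqrt(sum(x * x for x in centroid)))
    if norm == 0.0:
        return 0.0
    return abs(sum(n[k] * centroid[k] for k in range(3))) / norm


def range_image_to_mesh(depth, focal, max_angle_deg=80.0):
    """Triangulate a depth grid (None = no return) into vertices and faces,
    dropping faces seen at a grazing angle (likely range shadows)."""
    rows, cols = len(depth), len(depth[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    index, vertices = {}, []
    for v in range(rows):
        for u in range(cols):
            z = depth[v][u]
            if z is not None:
                index[(v, u)] = len(vertices)
                vertices.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    min_cos = math.cos(math.radians(max_angle_deg))
    faces = []
    for v in range(rows - 1):
        for u in range(cols - 1):
            # Split each 2x2 pixel block into two triangles.
            for tri in (((v, u), (v, u + 1), (v + 1, u)),
                        ((v, u + 1), (v + 1, u + 1), (v + 1, u))):
                if all(p in index for p in tri):
                    ids = tuple(index[p] for p in tri)
                    if facing_cosine(*(vertices[i] for i in ids)) >= min_cos:
                        faces.append(ids)
    return vertices, faces


flat_vertices, flat_faces = range_image_to_mesh([[1.0, 1.0], [1.0, 1.0]], 1.0)
jump_vertices, jump_faces = range_image_to_mesh([[1.0, 1.0], [1.0, 50.0]], 1.0)
```

In the second toy image the far pixel creates a triangle that is almost parallel to the line of sight, so it is discarded while the front-facing triangle is kept.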

Global registration phase
The global registration phase uses the locally consistent pair-wise matches ðG LR Þ to construct a pose consistent and globally surface consistent model from which the absolute poses can be read directly.The connected sub-graphs of G LR represent the set of all possible model hypotheses for the given pair-wise matches.To succeed, the global registration must find a sub-graph containing only correct matches.Combinatorics prevent us from exhaustively searching the sub-graphs of G LR : If we restrict our search to acyclic subgraphs (i.e.spanning trees and their subgraphs), then the problem becomes simpler for two reasons.First, acyclic graphs are guaranteed to be pose consistent, since there is at most one path between any two views.Second, a spanning tree uses the minimal number of matches to specify a complete model, so the chances of including an incorrect match are reduced.In practice, even a single incorrect match results in a dramatically incorrect solution (Fig. 11h).This follows from the fact that our pair-wise registration algorithm converges to a local minimum in registration error over a fairly wide range of initial pose parameters.Therefore, the pair-wise matches are typically either approximately correct or quite wrong.Surprisingly, restricting ourselves to acyclic graphs does not limit the expressiveness of our representation.To see why this is so, consider a graph with cycles and a spanning tree of that graph, which will necessarily be missing some edges from the original graph.The relative pose for any omitted correct match can be inferred by traversing the path in the spanning tree that connects the two nodes.
The global registration can be posed as a mixed discrete and continuous optimization problem over the discrete space of acyclic sub-graphs of G_LR and the associated continuous valued absolute pose parameters. We decompose the problem into two nested sub-problems: an inner continuous optimization over absolute poses for a fixed model graph and an outer discrete optimization over model graphs for fixed poses. For the discrete optimization, we sequentially construct a spanning tree from the edges in G_LR using a modified version of Kruskal's minimum spanning tree algorithm [41]. Using a spanning tree allows us to directly compute absolute pose estimates for initializing the continuous optimization step. For a fixed graph structure, the continuous optimization is just the multi-view registration problem. Our implementation, based on Neugebauer's [16], minimizes the point-to-plane correspondence error. Correspondences are established between all overlapping view pairs, not just the edges from the current model graph. At the end of each step, the graph is checked for global surface consistency, which ensures the final solution will be surface consistent and reduces the chances that the algorithm will fall into a local minimum.
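For readers unfamiliar with the point-to-plane criterion, the following is a minimal sketch of the error for one pair of views with correspondences already established. It is a generic formulation, not the implementation based on Ref. [16]; the function name and array layout are assumptions.

```python
import numpy as np


def point_to_plane_error(src_points, dst_points, dst_normals, R, t):
    """Sum of squared distances from transformed source points to the
    tangent planes at their corresponding destination points.

    src_points, dst_points, dst_normals: (n, 3) arrays, row k of each
    describing one correspondence; normals are assumed to be unit length.
    R: 3x3 rotation, t: length-3 translation applied to the source points.
    """
    moved = src_points @ R.T + t
    # Signed distance of each moved point to the plane through dst_points[k]
    # with normal dst_normals[k].
    residuals = np.einsum("ij,ij->i", moved - dst_points, dst_normals)
    return float(np.sum(residuals ** 2))


# Four points on the plane z = 0, matched to themselves.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
normals = np.tile([0.0, 0.0, 1.0], (4, 1))

# Sliding within the plane costs nothing; only the 0.1 offset along the normal counts.
err = point_to_plane_error(pts, pts, normals, np.eye(3), np.array([5.0, 3.0, 0.1]))
```

In a multi-view setting the same residual would be summed over all overlapping view pairs, with one rigid transform per view as the unknowns.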
The pseudo-code for this algorithm is shown in Fig. 9. Initially, G represents N partial models, each containing one view (line 1). The edges of G_LR are sorted by their FSV odds value and tested one at a time. In each iteration through the loop, the best untested edge from G_LR is selected, and if it connects two components, a temporary model graph G' is formed, thereby joining two partial models (line 4). The alignment of the views in G' is then improved by multi-view registration (line 5). If the resulting partial model is globally surface consistent (line 6), the new edge is accepted, and G' becomes the starting point for the next iteration (line 7).
Eventually, the algorithm either finds a spanning tree of G_LR, resulting in a complete model, or the list of edges is exhausted, resulting in a set of partial models. Fig. 8 shows the model graph G at several stages. Once the views are registered, they are merged into a single surface using Curless' VRIP algorithm [42]. The final model, corresponding to the graph in Fig. 8f, is shown in Fig. 10.
In addition to the full algorithm described above (full hereafter), we tested two simpler versions of our algorithm to analyze the effects of the continuous optimization and the global consistency check. The discrete_only algorithm omits the continuous optimization step (line 5), and the min_span algorithm skips the global consistency check (line 6) as well. The min_span algorithm, which finds the minimum spanning tree of G_LR, has the advantage that it is simple, fast, and always finds a solution, but the result may not be globally surface consistent. The global consistency test can be performed at the end as a verification, but it is not possible to correct the inconsistency. The discrete_only algorithm integrates the global surface consistency check into min_span, effectively allowing a single step of backtracking. However, the buildup of small pair-wise errors can lead to large discontinuities between some overlapping surfaces, which may cause a model to be globally inconsistent even though it contains only correct matches. By incorporating multi-view registration at each step, the errors are distributed over the entire model, allowing the full algorithm to find the correct solution in some cases where discrete_only fails (e.g. angel1).
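The loop described above and its two simplified variants can be sketched as follows. This is an illustration only, not the pseudo-code of Fig. 9: the multi-view registration and the global consistency test are passed in as hypothetical callables (`register`, `is_consistent`), and a larger score is assumed to mean a more trustworthy match.

```python
def build_model_graph(num_views, edges, register, is_consistent):
    """Greedy, Kruskal-style construction of an acyclic model graph.

    edges: iterable of (score, i, j); larger score is assumed to be better.
    register(candidate_edges) -> poses   (stand-in for multi-view registration)
    is_consistent(candidate_edges, poses) -> bool   (stand-in for the global test)
    Returns the accepted edges: a spanning tree if one was found, otherwise
    the edges of a set of partial models.
    """
    parent = list(range(num_views))  # union-find forest over the views

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    accepted = []
    for _score, i, j in sorted(edges, key=lambda e: e[0], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # edge would close a cycle; skip it
        candidate = accepted + [(i, j)]  # temporary graph joining two partial models
        poses = register(candidate)
        if is_consistent(candidate, poses):
            accepted = candidate
            parent[root_i] = root_j
            if len(accepted) == num_views - 1:
                break  # spanning tree complete
    return accepted


# Toy run: edge (1, 2) is treated as an incorrect match that the test detects.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.6, 2, 3), (0.5, 1, 3)]
bad = {(1, 2)}
no_registration = lambda candidate: None
check = lambda candidate, poses: not any(e in bad for e in candidate)
always_ok = lambda candidate, poses: True

with_check = build_model_graph(4, edges, no_registration, check)
without_check = build_model_graph(4, edges, no_registration, always_ok)
```

Under these assumptions, passing a no-op `register` mimics discrete_only, and additionally passing a test that always succeeds mimics min_span, which here keeps the incorrect edge.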

Results
We tested our multi-view surface matching algorithms by digitizing a collection of ten test objects, ranging in size from 190 to 375 mm (Fig. 11). Using a Minolta Vivid 700 laser scanner, we obtained 15 to 20 views of each object, scanning with the hand-held data collection method described in the introduction. A black background and glove allow simple, automatic segmentation of the background by thresholding the intensity image. We compared the performance of the three global registration algorithms described in Section 6 (Table 1). The parameters used in these experiments are listed in Table 2.
The full algorithm found the correct model (i.e. a model containing only correct matches) in nine of the ten test cases. For two test sets (angel2 and letter y), min_span failed where discrete_only succeeded. This is because one of the most consistent matches was actually an incorrect match, and the discrete_only algorithm correctly detected the incorrect match using the global consistency test. For the angel1 test set, discrete_only failed but full succeeded. In this case, the accumulation of pair-wise error prevented discrete_only from merging the last two components into a consistent model. The components output by discrete_only represented the left and right sides of the object.
In one case (angel3), none of the algorithms succeeded. This is because pair-wise surface matching did not find any correct matches between two clusters of views: one set representing the front and sides of the object and the other containing views of the back. By searching for a spanning tree of G_LR containing only correct matches, we are assuming that such a tree exists. When this assumption is violated, the global registration process cannot succeed. Ideally, the algorithm should recognize that there is no solution and output a set of correct partial models. We could then pass the partial models back into the local registration phase, treating each partial model as a single view. The greater surface area of each partial model can result in matches that would not have been found in the initial matching phase. Another solution is to acquire more data sets that span the boundary region between the two clusters.

Registration accuracy
While our algorithm outputs models that are qualitatively correct, it is important to establish quantitative methods for evaluating an algorithm's success. We use a method proposed by Wheeler to measure the error in absolute poses [26]. The maximum correspondence error (MCE) for surface S_i is the maximum displacement of any point on S_i from its ground truth position. Note that once the poses reach a certain level of accuracy, this measure actually evaluates the accuracy of the multi-view registration algorithm used within our algorithm.
Since we do not have ground truth poses for the 10 test objects in Fig. 11, we generated synthetic range images of digital objects, corrupting the synthetic range values with Gaussian noise (Fig. 12). The noise level (σ = 1 mm) and sensor parameters (e.g. focal length, image size, and field of view) were chosen to simulate the worst-case operation of the Vivid 700: scanning a dark-colored object from a distance of 1 m. The synthetic objects were scaled to a size of 200 mm. The MCE for the absolute poses of the object in Fig. 12 was less than 0.135 mm (0.068%) for all 32 input views. For the entire set of 16 automatically modeled synthetic objects, the MCE did not exceed 1.11 mm (0.56%) for any view.
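As a concrete reading of the MCE definition, the sketch below computes it for one view from its mesh vertices, an estimated pose, and a ground truth pose. The function name and the 4x4 pose representation are assumptions. Evaluating only the vertices is sufficient for a triangle mesh: the displacement between two rigid placements is an affine function of the point, so its norm is convex and reaches its maximum over each triangle at a vertex. The reported percentages match the MCE divided by the 200 mm object size, which is the interpretation assumed here.

```python
import numpy as np


def max_correspondence_error(vertices, T_est, T_true):
    """Maximum displacement of any mesh vertex between the estimated and
    ground truth placements of a view.

    vertices: (n, 3) array in the view's local frame.
    T_est, T_true: 4x4 homogeneous rigid transforms (absolute poses).
    """
    homog = np.hstack([vertices, np.ones((len(vertices), 1))])
    placed_est = (homog @ T_est.T)[:, :3]
    placed_true = (homog @ T_true.T)[:, :3]
    return float(np.max(np.linalg.norm(placed_est - placed_true, axis=1)))


def relative_error_percent(mce_mm, object_size_mm):
    """Express an MCE as a percentage of the object size."""
    return 100.0 * mce_mm / object_size_mm


# A pure 0.135 mm translation error displaces every vertex by 0.135 mm.
verts = np.array([[0.0, 0.0, 0.0], [200.0, 0.0, 0.0], [0.0, 200.0, 0.0]])
T_true = np.eye(4)
T_est = np.eye(4)
T_est[0, 3] = 0.135
mce = max_correspondence_error(verts, T_est, T_true)
pct = relative_error_percent(mce, 200.0)  # 0.0675, reported as 0.068% above
```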

Complexity
For models with approximately 20 views, the entire modeling process takes 6 to 8 min once data collection is complete, depending on the algorithm used in the global registration phase (Table 3). Three aspects of the process are potential bottlenecks when scaling to large numbers of views. First, the number of view pairs matched during pair-wise surface matching is O(N^2), where N is the number of input views. Even for 19 views, the majority of processing time (4:13 min) is spent performing this operation. This issue can be addressed by using heuristics to selectively register views, exploiting information inherent in each view to sort the views based on the likelihood of a successful match or to partition them into groups that are likely to match with each other. For example, views containing similarly shaped regions should be attempted first. The global consistency test is another O(N^2) operation because every pair of views in a given hypothesis is tested for consistency. However, only view pairs that are physically close to one another actually need to be tested. Geometric data structures can be used to quickly determine which views are close enough to interact. For example, interval trees can be used to compute the intersection of the viewing volume bounding boxes for all view pairs in time O(N log N + s), where s is the number of intersecting pairs [43], and only those that intersect need to be tested for consistency. Finally, multi-view registration involves solving a linear system with 6(N - 1) unknowns (the absolute pose parameters), an O(N^3) operation. Fortunately, the system of equations is sparse (the sparseness is related to the number of non-overlapping view pairs), and a substantial speedup could be achieved through sparse matrix techniques.
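The counts above, and the idea of testing only view pairs whose bounding boxes intersect, can be made concrete with a small sketch. The helper names are hypothetical, and the pruning shown is a brute-force stand-in: it still performs O(N^2) box comparisons, although each is far cheaper than a surface consistency test. The interval tree method of Ref. [43] would be needed to reach O(N log N + s).

```python
from itertools import combinations


def num_view_pairs(n):
    """Number of unordered view pairs among n views."""
    return n * (n - 1) // 2


def num_pose_unknowns(n):
    """Six pose parameters per view, with one view fixed as the reference frame."""
    return 6 * (n - 1)


def boxes_intersect(a, b):
    """Axis-aligned boxes given as ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[k] <= b_max[k] and b_min[k] <= a_max[k] for k in range(3))


def candidate_pairs(boxes):
    """View pairs whose bounding boxes intersect; only these need a full test."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if boxes_intersect(boxes[i], boxes[j])
    ]


boxes = [
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    ((0.5, 0.5, 0.5), (2.0, 2.0, 2.0)),
    ((5.0, 5.0, 5.0), (6.0, 6.0, 6.0)),
]
pairs = candidate_pairs(boxes)
```

For the 19-view case mentioned above, `num_view_pairs(19)` gives 171 pair-wise matching attempts and `num_pose_unknowns(19)` gives 108 unknowns.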

Generality
To test the generality of our method, we constructed models of some larger scale environments. These experiments also demonstrate alternative data collection techniques and modeling with a different sensor. Fig. 13a shows an interior model constructed from data collected by moving the Vivid scanner around the lab. Fig. 13b shows a terrain model constructed using data from a different sensor, the Zoller + Fröhlich (Z+F) LARA 25200 laser scanner. For these models, different settings were used for some of the parameters in Table 2 (e.g. t_ss and t_D). In our more recent work, we have eliminated the need to manually set such parameters.

Summary and future work
We have presented a method for automatically registering a set of 3D views of a scene. The procedure uses a combination of discrete and continuous optimization methods to construct a globally consistent solution from a set of pair-wise registration results. We defined three measures of surface consistency, one based on overlap distance and two based on visibility consistency. We showed that visibility consistency gives a more accurate prediction of match correctness because it takes into account the entire space between the sensor and the sensed surface. Using these visibility consistency measures, we tested three versions of our multi-view surface matching algorithm, demonstrating their utility by automatically constructing 3D models of a collection of test objects. We also showed that the algorithm accurately registers the input views and that the method can be applied to different data collection scenarios and sensors.
We have identified several aspects of our multi-view surface matching method to be further developed. One problem that can arise in the global registration phase occurs when an incorrect match is added to the graph and the resulting partial model is still globally consistent. Once the algorithm proceeds to the next iteration, there is no hope of finding the correct solution. This problem can be addressed in several ways. One solution is to use a more sophisticated search algorithm. For example, we could incorporate backtracking, turning it into a depth-first search of the space of all spanning trees and their sub-graphs, or we could use beam search to track multiple hypotheses at each iteration. A second solution is to consider different search heuristics. For example, we could bias the search towards matches that have the most overlap or that would overlap with the most views. A third approach is to use randomized graph search. We have investigated using a RANSAC algorithm, in which spanning trees are randomly sampled from G_LR and then evaluated using the global consistency test. Unfortunately, depending on the number and arrangement of incorrect matches in G_LR, an unacceptably large number of trials may be required. Currently, we are experimenting with other stochastic methods, such as simulated annealing. The min_span algorithm could be used to generate a starting solution for such methods.
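The randomized search idea can be sketched as follows. This is not the RANSAC variant we investigated, only an illustration under stated assumptions: spanning trees are drawn by running a union-find pass over a shuffled edge list (which does not sample spanning trees uniformly), and `is_consistent` is a hypothetical stand-in for the global consistency test.

```python
import random


def random_spanning_tree(num_views, edges, rng):
    """Draw a spanning tree by processing the edges in random order.

    edges: list of (i, j) pairs. Returns None if the graph is disconnected.
    """
    order = list(edges)
    rng.shuffle(order)
    parent = list(range(num_views))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for i, j in order:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree if len(tree) == num_views - 1 else None


def ransac_search(num_views, edges, is_consistent, max_trials, seed=0):
    """Sample spanning trees until one passes the consistency test.

    Returns (tree, trials_used); tree is None if no trial succeeded.
    """
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        tree = random_spanning_tree(num_views, edges, rng)
        if tree is None:
            return None, trial  # no spanning tree exists at all
        if is_consistent(tree):
            return tree, trial
    return None, max_trials


# Triangle of views in which the match (1, 2) is incorrect.
edges = [(0, 1), (1, 2), (0, 2)]
consistent = lambda tree: (1, 2) not in tree
tree, trials = ransac_search(3, edges, consistent, max_trials=200)
```

In this toy graph only one of the three spanning trees avoids the incorrect match; as the proportion of incorrect matches grows, the fraction of acceptable trees shrinks quickly, which is the source of the large trial counts noted above.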
Fig. 11h shows an example of a model that contains a single incorrect match. Although obviously wrong, the model is actually consistent according to our global consistency test. This situation could be avoided with an enhanced consistency test that considers OSVs as well as FSVs. Detecting OSVs requires a more sophisticated sensor model than FSVs because surfaces may go undetected for a number of reasons (e.g. the surface is out of sensor range or the normal is too oblique to the viewing direction).
Finally, we must address the issue of scaling to large numbers of views. Although 20 to 50 views are sufficient for modeling small objects and environments, our algorithm's success at this task ensures that eventually we will want to model larger and more complex scenes. To accomplish this, we need to implement and evaluate selective registration and the other optimizations we proposed in Section 7.2.

Fig. 1. A simple taxonomy showing the naming convention for the registration algorithms used in this paper.

Fig. 5. Visibility consistency from the perspective of sensor position C_1: (a) Consistent surfaces are close together wherever they overlap; (b) a free space violation (FSV) occurs when surface S_2 blocks the view of the observed surface S_1 (highlighted region); (c) an OSV occurs when a portion of S_2 is not observed from C_1 even though it is expected to be visible (highlighted region).

Fig. 7. The distribution of range difference measurements D over a large set of correct matches (left) and incorrect matches (center) from a test object. The predicted distributions, mixtures of two Gaussians learned from a separate training set, are shown overlaid (thin black line). The ROC curves (right) compare the classification accuracy of the three consistency measures.

Fig. 8. Model graphs from the local registration phase (a and b) and global registration phase (c-f) for the angel2 test object. Matches were hand-labeled for illustration: thick (blue) edges are correct matches, and thin (red) edges are incorrect matches. (a) The model graph from exhaustive pair-wise registration; (b) after filtering the worst matches (G_LR); (c) empty model graph (G_0); (d) after 5 steps of the full algorithm; (e) after 15 steps; (f) the final model graph.

Fig. 9. Pseudo-code for the full algorithm for the global registration phase.

Fig. 11. Photographs of the 10 test objects (first and third columns) and the resulting 3D models (second and fourth columns). All of the objects were correctly reconstructed except angel3, which contains one error.

Table 1
Performance of the three algorithms on the test models. + indicates a correct result (i.e. a complete model with no incorrect matches); × indicates an incorrect result (i.e. partial models or a graph with incorrect matches).
Object    min_span    discrete_only    full

Fig. 12. (a) An example input object used in the registration accuracy experiment. (b) Synthetic range images were corrupted with noise and then converted to surface meshes. (c) A visualization of the automatically modeled object.

Table 2
Parameter settings used in the automatic modeling experiments