A hierarchical object based representation for simultaneous localization and mapping

Accomplishing simultaneous localization and mapping (SLAM) in very large city environments is a great challenge because of theoretical and practical issues of computational complexity, dynamic environments, representation and data association. In this paper, we describe practical algorithms for dealing with the representation issues. Feature-based, grid-based and direct methods are integrated into the framework of the hierarchical object based representation. The sampling and correlation based range image matching algorithm is developed to tackle the problems arising from uncertain, sparse and featureless data in outdoor environments. Experimental results from an 800 meter × 600 meter neighborhood demonstrate the feasibility of city-sized SLAM.


I. INTRODUCTION
Simultaneous localization and mapping (SLAM) simultaneously estimates the locations of newly perceived landmarks and the location of the robot itself while incrementally building a map. Since Smith, Self and Cheeseman first introduced the SLAM problem [1], it has attracted immense attention in the mobile robotics literature, and the web site of the 2002 SLAM summer school [2] provides comprehensive coverage of the key topics and the state of the art in SLAM. This paper is concerned with the problem of how a robot such as the Navlab11 vehicle (see Fig. 1) can accomplish SLAM in very large urban environments.
For accomplishing this task, there are four key issues: dynamic environment, computational complexity, representation and data association. Regarding the dynamic environment issue, in [3] and [4] we presented a solution, SLAM with detection and tracking of moving objects (DATMO), and demonstrated that it is feasible to solve SLAM with DATMO from a ground vehicle at high speeds. Hence, in this paper, we will not discuss this issue further and will assume that measurements associated with moving objects are filtered out of the SLAM process. Computational complexity is a key bottleneck of the SLAM problem because the Kalman filter solution explicitly represents the correlations of all pairs among the robot and stationary objects. Both the computation time and the memory requirement scale quadratically with the number of stationary objects in the map. This computational burden restricts applications to those in which the map has no more than a few hundred stationary objects. Recently, this issue has been subject to intense research in the SLAM literature. Approaches using approximate inference, using exact inference on tractable approximations of the true model, and using approximate inference on an approximate model have been proposed. See [5] for an excellent comparison of these techniques. In this paper, we will take advantage of these promising approaches and focus on the representation problem.
Even with advanced algorithms to deal with computational complexity, most SLAM applications are still limited to indoor environments or to specific environments and conditions because of significant issues in defining the environment representation and identifying an appropriate methodology for fusing data in this representation. For instance, feature-based approaches have an elegant solution using the Kalman filter or the information filter, but it is difficult to extract features robustly and correctly in outdoor environments. Grid-based approaches do not need to extract features, but they do not provide any direct means to estimate and propagate uncertainty and they do not scale well in very large environments.
In this paper, we provide a comparison of the main paradigms for representation in terms of uncertainty management, sensor characteristics, environment representability, data compression and loop-closing mechanism. To overcome the limitations of these representation methods, we present the hierarchical object based approach to integrate the direct methods, the grid-based methods and the feature-based methods. When data is uncertain, sparse, and featureless, the pose estimate from direct methods such as the iterative closest point (ICP) algorithm [6] may not be correct and the distribution of the pose estimate may not be described properly. We describe the sampling and correlation based range image matching (SCRIM) algorithm to tackle these issues.
With respect to the data association issues, based on the hierarchical object based representation, we develop practical algorithms for robustly detecting loops in very large scale urban environments without access to independent position information. Because this topic is beyond the scope of this paper, see [8] for the details.
The described algorithms for solving the representation issues are verified using data collected from the Navlab11 vehicle. The experimental results from an 800 meter × 600 meter neighborhood demonstrate the feasibility of city-sized SLAM. Figure 2 shows an aerial photo of this neighborhood in which the dark (blue) line indicates the Navlab11 trajectory.

A. Feature-based methods
For most indoor applications, lines, circles, corners and other simple geometrical features are rich and easy to detect. But for outdoor applications, extracting features robustly and correctly is extremely difficult because outdoor environments contain many different kinds of objects such as bushes, trees, or curvy objects whose shapes are hard to define. In these kinds of environments, whenever a feature is extracted, an error from feature extraction will be produced because the predefined feature types do not fit the actual objects.

B. Grid-based methods
Grid-based methods use a cellular representation called Occupancy Grids or Evidence Grids. Mapping is accomplished by using a Bayesian scheme, and localization can be accomplished using correlation of a sensor scan with the grid map [14].
In terms of sensor characteristics and environment representability, grid-based approaches are more advanced than feature-based approaches. Grid maps can represent any kind of environment and the quality of the map can be adjusted by adapting the resolution of the grids. Grid-based approaches are especially suitable for noisy sensors such as stereo cameras, sonar and radar, for which features are hard to define and extract from highly uncertain and uninformative measurements.
Nevertheless, grid-based approaches do not provide a mechanism for loop closing. Recall that correlation between the robot and landmarks is explicitly managed by the covariance matrix or the information matrix in the feature-based approaches. In Occupancy Grids, this correlation is only implicitly embedded, and how to retrieve it is an open question. Even when a loop is correctly detected, loop closing cannot be done with the existing grids. Additional computation power is needed to run consistent pose estimation algorithms such as [15], and the previous raw scans have to be used to generate a new globally consistent map.

C. Direct methods
Direct methods represent the physical environment using raw data points without extracting predefined features.
Localization can be done by using range image registration algorithms from the computer vision literature. For instance, the ICP algorithm is a widely used direct method; many variants have been proposed based on the basic ICP concept [16]. However, a good initial prediction of the transformation between scans is required because of its heuristic assumption for data association.
The map is represented as a list of raw scans. Because there is overlap between scans, the memory requirement for storing the map can be reduced by an integration (merging) process such as [17]. Just as with the grid-based approaches, when loops are detected, additional computation power is needed to run consistent pose estimation algorithms and the previous raw scans are used to generate a globally consistent map.
In terms of uncertainty management and sensor characteristics, very little work addresses how to quantify the uncertainty of the transformation estimate from the registration process. Uncertainty arises mainly from outliers, wrong correspondences, and measurement noise. Without taking measurement noise into account, several methods to estimate the covariance matrix of the pose estimate have been proposed, such as [18] and [19].
Compared to indoor applications, the distances between objects and sensors in outdoor environments are usually much longer, which makes measurements more uncertain and sparse. By assuming measurement noise is Gaussian, Pennec and Thirion used the extended Kalman filter to estimate both the rigid transformation and its covariance matrix in [20]. But their approach is very sensitive to correspondence errors, and the assumption that the uncertainty of the pose estimate from registration processes can be modelled by Gaussian distributions is not always valid.

D. Comparison
To summarize, we show the comparison of different representations in Table I. With regard to uncertainty management and loop-closing mechanism, feature-based approaches offer an elegant solution. Regarding sensor characteristics, grid-based approaches are the easiest to implement and the most suitable for imprecise sensors such as sonar and radar. With respect to environment representability, feature-based approaches are limited to indoor or structured environments in which features are easy to define and extract.

E. Hierarchical Object Based Representation
Because none of these three main paradigms is sufficient for large, outdoor environments, we present a hierarchical object based representation to integrate these paradigms and to overcome their disadvantages.
In outdoor or urban environments, features are extremely difficult to define and extract because neither stationary nor moving objects have specific sizes and shapes. Therefore, instead of using an ad hoc approach to define features in specific environments or for specific objects, free-form objects are used.
At the preprocessing stage, scans are grouped into segments using a simple distance criterion. The segments over different time frames are integrated into objects after the localization and mapping processes. Registration of scan segments over different time frames is done by using a direct method, namely the ICP algorithm. Because range images are sparser and more uncertain in outdoor applications than in indoor applications, the pose estimate and the corresponding distribution from the ICP algorithm are not reliable. To deal with the sparse data issues, a sampling-based approach is used to estimate the uncertainty from correspondence errors. To deal with the uncertain data issues, a correlation-based approach is used with the grid-based method, along with the sampling-based approach, to estimate the uncertainty from measurement noise. For loop closing in large environments, the origins of the object coordinate systems are used as features with the mechanism of the feature-based approaches.
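The preprocessing step can be illustrated with a minimal sketch. The function name `segment_scan`, its parameters, and the choice of comparing consecutive points against a fixed `gap_threshold` are assumptions made for illustration; the text only states that a simple distance criterion is used.

```python
import math

def segment_scan(ranges, angle_min, angle_step, gap_threshold):
    """Group consecutive range readings into segments (hypothetical helper).

    A new segment starts whenever the Euclidean distance between two
    consecutive scan points exceeds gap_threshold (meters).
    """
    # Convert polar readings to Cartesian points in the sensor frame.
    points = [
        (r * math.cos(angle_min + i * angle_step),
         r * math.sin(angle_min + i * angle_step))
        for i, r in enumerate(ranges)
    ]
    segments = []
    for p in points:
        if segments and math.dist(segments[-1][-1], p) <= gap_threshold:
            segments[-1].append(p)   # close to the previous point: same segment
        else:
            segments.append([p])     # large gap: start a new segment
    return segments
```

With a 0.5 degree angular step, readings on the same surface stay within a few tens of centimeters of each other, so a threshold on the order of a meter separates objects at clearly different depths; the right value depends on the sensor and scene.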
Our approach is hierarchical since these three main representation paradigms are used on different levels. In the remaining sections, we will demonstrate that city-sized SLAM is feasible by using the hierarchical object based approach, where SLAM is accomplished locally using direct and grid-based approaches and globally using feature-based approaches.

III. OUTDOOR DATA
This section describes the difficulties of processing outdoor data.

A. Sparse and Featureless Data
Compared to indoor applications, the distances between objects and sensors in outdoor environments are usually much longer, which makes measurements more uncertain and less dense. Sparse data causes problems in correspondence finding, which directly affect the accuracy of direct methods. In the computer vision and indoor SLAM literature, the assumption that corresponding points represent the same physical point is valid because data is dense. If a point-to-point metric is used in the ICP algorithm, one-to-one correspondence is not guaranteed with sparse data, which decreases the accuracy of transformation estimation and slows convergence. Research on ICP algorithms suggests that minimizing distances between points and tangent planes converges faster. But because of sparse data and irregular surfaces in outdoor environments, secondary information derived from raw data, such as surface normals, can be unreliable and too sensitive.
The other issue is featureless data, which causes correspondence ambiguity as well. We illustrate this correspondence ambiguity issue with an example. Fig. 3 shows two scans, A and B, from a static environment and the segmentation results. In this example, we assume that motion measurement is unavailable and the initial guess of the relative transformation is zero. Fig. 4 shows the registration results using the ICP algorithm, in which range images A and B are aligned using the same initial relative transformation guess but different scan segments: one matches only segment 1 of scan A with segment 1 of scan B; the other matches the whole scans of A and B. It seems that the ICP algorithm provides satisfactory results in both cases and it is hard to quantify which result is better. However, by comparing the results with the whole scans in Figure 5, it is easy to see that registration using only scan segment 1 of A and B provides a local minimum solution instead of the global one because of featureless data.

B. Uncertain Data
It is well known that several important physical phenomena such as the material properties of an object, the sensor incidence angle, and environmental conditions affect laser measurements. According to the manual of SICK laser scanners, the spot spacing of SICK LMS 211/221/291 is smaller than the spot diameter for an angular resolution of 0.5 degree. This means that footprints of consecutive measurements overlap each other. The photo in Fig. 6, taken from an infrared camera, shows this phenomenon. A red rectangle indicates the footprint of one measurement point.
With regard to range measurement error, we conservatively assume the error to be 1% of the range measurement because of outdoor physical phenomena. The uncertainty of each measurement point is described in the polar coordinate system and can also be described in the Cartesian coordinate system. Fig. 7 shows the SICK LMS 211/221/291 noise model.
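One common way to move a polar measurement uncertainty into Cartesian coordinates is first-order (Jacobian) propagation, sketched below. Only the 1% range error comes from the text; the angular standard deviation `sigma_theta`, the diagonal polar covariance, and the function name are assumptions for illustration and are not the noise model of Fig. 7.

```python
import numpy as np

def polar_to_cartesian_cov(r, theta, range_err_ratio=0.01,
                           sigma_theta=np.deg2rad(0.25)):
    """First-order propagation of polar noise to Cartesian (illustrative).

    range_err_ratio: range standard deviation as a fraction of r (1% in the text).
    sigma_theta:     ASSUMED angular standard deviation in radians.
    """
    sigma_r = range_err_ratio * r
    cov_polar = np.diag([sigma_r ** 2, sigma_theta ** 2])
    # Jacobian of (x, y) = (r cos(theta), r sin(theta)) w.r.t. (r, theta).
    J = np.array([[np.cos(theta), -r * np.sin(theta)],
                  [np.sin(theta),  r * np.cos(theta)]])
    mean_xy = np.array([r * np.cos(theta), r * np.sin(theta)])
    return mean_xy, J @ cov_polar @ J.T
```

Under these assumptions the Cartesian covariance grows with range in both the radial and the tangential direction, which matches the observation that distant outdoor measurements are more uncertain.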
In most indoor applications, it is assumed that a horizontal range scan is a collection of range measurements taken from a single robot position. When the robot is moving at high speeds, this assumption is invalid. We use the rotation rate of the scanning device and the velocity of the robot to correct the errors from this assumption.
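A minimal sketch of such a correction follows, assuming constant forward velocity `v` and yaw rate `omega` during one sweep and a fixed time `beam_dt` between consecutive beams (derived from the scanner rotation rate). These names and the constant-twist motion model are illustrative assumptions, not the procedure used in the experiments.

```python
import numpy as np

def deskew_scan(ranges, angles, beam_dt, v, omega):
    """Express every beam in the robot frame at the time of the first beam.

    beam_dt: ASSUMED constant time between consecutive beams (s).
    v, omega: ASSUMED constant forward speed (m/s) and yaw rate (rad/s).
    """
    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)
    t = np.arange(len(ranges)) * beam_dt      # acquisition time of each beam
    th = omega * t                            # robot heading change at each beam
    if abs(omega) < 1e-9:                     # straight-line motion
        x, y = v * t, np.zeros_like(t)
    else:                                     # motion along a circular arc
        x = v / omega * np.sin(th)
        y = v / omega * (1.0 - np.cos(th))
    px, py = ranges * np.cos(angles), ranges * np.sin(angles)
    gx = x + np.cos(th) * px - np.sin(th) * py
    gy = y + np.sin(th) * px + np.cos(th) * py
    return np.column_stack([gx, gy])
```

When the robot is stationary the function returns the raw polar-to-Cartesian points, so the correction only matters at speed.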

IV. THE SAMPLING AND CORRELATION BASED RANGE IMAGE MATCHING ALGORITHM
In this section, we present the sampling and correlation based range image matching (SCRIM) algorithm for taking correspondence errors and measurement noise into account.

A. The Sampling-based Approach
Because of sparse and featureless data issues, precisely estimating the relative transformation and its corresponding distribution is difficult and ambiguity is hard to avoid in practice. However, as long as the ambiguity is modelled correctly, it can be reduced properly when more information or constraints become available. If the distribution does not describe the situation properly, data fusion cannot be done correctly even if the incoming measurements contain rich information or constraints to disambiguate the estimates. Therefore, although more computational power is needed, a sampling-based approach is applied to deal with the ambiguity of correspondence finding.
Instead of using only one initial relative transformation guess, the registration process is run N times with randomly generated initial relative transformations. Figure 8 shows the sampling-based registration of scan segment 1 in the previous example. 100 randomly generated initial relative transformation samples are shown in the left figure and the corresponding registration results are shown in the right figure. Figure 8 shows that one axis of translation is more uncertain than the other translation axis and the rotation axis. Figure 9 shows the corresponding sample means and covariances using different numbers of samples. The covariance estimates from the sampling-based approach describe the distribution correctly.
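The sampling idea can be sketched as follows. This is a simplified illustration, not the implementation used for Figures 8 and 9: the 2D point-to-point ICP with brute-force nearest neighbours, the Gaussian sampling of initial guesses, and the names `icp_2d`, `sample_registration`, `trans_sigma` and `rot_sigma` are all assumptions.

```python
import numpy as np

def icp_2d(src, dst, init, iters=30):
    """Point-to-point ICP; returns (tx, ty, theta) mapping src onto dst."""
    tx, ty, th = init
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    t = np.array([tx, ty])
    for _ in range(iters):
        moved = src @ R.T + t
        # Brute-force nearest neighbour in dst for every moved source point.
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        # Closed-form rigid alignment of src to the matched points via SVD.
        cs, cm = src.mean(axis=0), matched.mean(axis=0)
        H = (src - cs).T @ (matched - cm)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = cm - R @ cs
    return np.array([t[0], t[1], np.arctan2(R[1, 0], R[0, 0])])

def sample_registration(src, dst, n_samples=100, trans_sigma=1.0,
                        rot_sigma=0.1, seed=0):
    """Run ICP from n_samples random initial guesses (ASSUMED Gaussian)."""
    rng = np.random.default_rng(seed)
    inits = rng.normal(0.0, [trans_sigma, trans_sigma, rot_sigma],
                       size=(n_samples, 3))
    poses = np.array([icp_2d(src, dst, init) for init in inits])
    # Plain averaging of angles is only reasonable for small angular spreads.
    return poses, poses.mean(axis=0), np.cov(poses, rowvar=False)
```

The spread of the returned poses is what carries the ambiguity: for a featureless segment the samples tend to scatter along the direction in which the data does not constrain the alignment.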

B. The Correlation-based Approach
Because the sampling-based approach does not handle the measurement noise issues, the grid-based method [10] and the correlation-based method [14] are applied and integrated with the sampling-based approach for taking measurement noise into account.
First, measurement points and their corresponding distributions are transformed into occupancy grids using the SICK noise model. Let $g_A$ be an object-grid built using the measurement $A$ and $g_A^{xy}$ be the occupancy of a grid cell at $(x, y)$. The grid-based approach decomposes the problem of estimating the posterior probability $p(g_A \mid A)$ into a collection of one-dimensional estimation problems, $p(g_A^{xy} \mid A)$. A common approach is to represent the posterior probability using log-odds ratios:

$$l_A^{xy} = \log \frac{p(g_A^{xy} \mid A)}{1 - p(g_A^{xy} \mid A)} \qquad (3)$$

Because the posterior probability is represented using log-odds ratios, multiplication of probabilities can be done using additions. In the previous section, the sampling-based approach treats the samples equally. Now the samples are weighted with their normalized correlation responses. Figure 11 shows the normalized correlation responses.
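A minimal sketch of the log-odds bookkeeping and of weighting samples by normalized correlation responses is given below. The helper names, the clipping constant, the use of a plain sum of cell-wise probability products as the correlation, and the weighted mean and covariance are assumptions for illustration; how each sampled pose is rendered into a grid before correlation is not shown.

```python
import numpy as np

def to_log_odds(p):
    """Log-odds of an occupancy probability (clipped to avoid infinities)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1.0 - p))

def to_prob(l):
    """Inverse of to_log_odds."""
    return 1.0 / (1.0 + np.exp(-l))

def fuse(log_odds_grid, p_meas):
    """Bayesian update in log-odds form: products of odds become sums.
    ASSUMES a uniform prior of 0.5, whose log-odds is zero."""
    return log_odds_grid + to_log_odds(p_meas)

def correlation(p_a, p_b):
    """ASSUMED correlation: sum of cell-wise products of two occupancy grids."""
    return float(np.sum(p_a * p_b))

def normalized_weights(responses):
    r = np.asarray(responses, dtype=float)
    return r / r.sum()

def weighted_mean_cov(poses, weights):
    """Mean and covariance of pose samples under normalized weights."""
    mean = weights @ poses
    diff = poses - mean
    return mean, (weights[:, None] * diff).T @ diff
```

In use, each registration sample would produce one correlation response, and `normalized_weights` followed by `weighted_mean_cov` would replace the equally weighted statistics of the sampling-based approach.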

V. LOCAL MAPPING USING GRID-BASED APPROACHES
Since feature extraction is difficult and problematic in outdoor environments, we apply grid-based approaches for building the map. However, the grid-based approaches need extra computation for loop closing and all raw scans have to be used to generate a new globally consistent map, which is not practical for online city-sized mapping. Therefore, the grid map is only built locally.
After localizing the robot using the sampling and correlation based range image matching algorithm, the new measurement is integrated into the grid map. Practically, there are two requirements for selecting the size and resolution of grid maps: one is that a grid map should not contain loops, and the other is that the quality of the grid map should be maintained at a reasonable level. For solving the example in Fig. 2, the width and length of the grid map are set to 160 meters and 200 meters respectively, and the resolution of the grid map is set to 0.2 meter. When the robot arrives within 40 meters of the boundary of the grid map, a new grid map is initialized.
The global pose of the map and the corresponding distribution are computed according to the robot's global pose and its distribution. Figure 12 shows the boundaries of the grid maps generated along the trajectory using the described parameters. Figure 13 shows the details of the grid maps, which contain information from both stationary objects and moving objects. The details of dealing with moving objects are addressed in [4].
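The map-switching rule with the parameters quoted above can be sketched as follows. The `GridMap` class, its axis-aligned layout centered at `(cx, cy)`, and the method names are hypothetical; only the 160 m by 200 m size, the 0.2 m resolution and the 40 m margin come from the text.

```python
from dataclasses import dataclass

@dataclass
class GridMap:
    """Hypothetical axis-aligned local grid map centered at (cx, cy)."""
    cx: float
    cy: float
    width: float = 160.0      # meters, along x
    length: float = 200.0     # meters, along y
    resolution: float = 0.2   # meters per cell

    def shape(self):
        """Number of cells as (rows along length, columns along width)."""
        return (int(round(self.length / self.resolution)),
                int(round(self.width / self.resolution)))

    def near_boundary(self, x, y, margin=40.0):
        """True when (x, y) is within `margin` meters of the map boundary."""
        dx = self.width / 2 - abs(x - self.cx)
        dy = self.length / 2 - abs(y - self.cy)
        return min(dx, dy) <= margin
```

With these numbers each local map holds 1000 by 800 cells, and a mapping loop would start a new `GridMap` around the robot as soon as `near_boundary` returns true.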

VI. GLOBAL MAPPING USING FEATURE-BASED APPROACHES
The first step in solving the loop-closing problem is to robustly detect loops or recognize previously visited areas. This is called the revisiting problem [21]. Figure 14 shows that the robot entered the explored area. Because of accumulated pose error (see Fig. 15(a)), temporary stationary objects, occlusion, and unmodelled uncertainty, the current grid map is not consistent with the pre-built map. In this section we assume that loops are correctly detected. The issues and solutions of the revisiting problem are addressed in [8].
For closing loops in real time, feature-based approaches are applied. Because the occupancy grid approach is used for local mapping, we would have to develop a method to transform or decompose the occupancy grid map into stable regions (features) and a covariance matrix containing the correlation among the robot and the regions. Unfortunately, this is still an open question. Therefore, instead of decomposing the grid maps, we treat each grid map as a three degree-of-freedom feature directly. Fig. 15(b) shows the raw scans from the SICK LMS221 scanner, in which there are about 36,500 scans. Fig. 16 shows the result using the feature-based EKF algorithm for loop closing, where information from moving objects is filtered out. The covariance matrix for closing this loop contains only 14 three degree-of-freedom features.
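To make the size of this problem concrete, the sketch below stacks 14 map poses $(x, y, \theta)$ into one state and applies a single Kalman update for a loop-closure constraint. It is a deliberately simplified linear illustration: the chained prior built from independent increments, the measurement modelled as a plain difference of two map poses in a common frame, the absence of angle wrapping, and the names `chain_prior` and `close_loop` are all assumptions, not the EKF used to produce Fig. 16.

```python
import numpy as np

def chain_prior(n_maps, q):
    """Covariance of n_maps poses built by summing independent increments,
    each with covariance q * I (ASSUMED odometry-like prior)."""
    L = np.tril(np.ones((n_maps, n_maps)))
    return np.kron(L @ L.T, q * np.eye(3))

def close_loop(x, P, i, j, z, R):
    """Kalman update for a measured pose difference z = pose_j - pose_i."""
    n = x.size
    H = np.zeros((3, n))
    H[:, 3 * i:3 * i + 3] = -np.eye(3)
    H[:, 3 * j:3 * j + 3] = np.eye(3)
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(n) - K @ H) @ P
    return x_new, P_new
```

Because the prior correlates all maps along the chain, the single constraint between the first and last map also shifts the intermediate maps, while the state stays at 42 dimensions regardless of how many raw scans each grid map contains.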
Since we set the whole grid maps as features in the feature-based approaches for loop closing, the uncertainty inside the grid maps is not updated with the constraints from detected loops. Although Figure 16 shows a satisfying result, the coherence of the overlay between grid maps is not guaranteed. Practically, the inconsistency between the grid maps will not affect the robot's ability to perform tasks. Local navigation can be done with the currently built grid map, which contains the most recent information about the surrounding environment. Global path planning can be done with the globally consistent map from feature-based approaches in a topological sense. In addition, the quality of the global map can be improved by using smaller grid maps to smooth out the inconsistency between grid maps.
At the same time, the grid-maps should be big enough to have high object saliency scores in order to reliably solve the revisiting problem.

VII. CONCLUSION
In this paper, we compared different representations and presented the hierarchical object based representation that integrates direct, grid-based and feature-based approaches. The sampling and correlation based range image matching algorithm is used for dealing with the problems arising from uncertain, sparse and featureless data in outdoor environments. The experimental results using data collected from the Navlab11 vehicle demonstrated the feasibility of city-sized SLAM with the use of the described algorithms.

Fig. 5. Registration results of Fig. 4 are shown with the whole scans. Left: registration using segment 1 of scan A and segment 1 of scan B. Right: registration using the whole scans of A and B.

Figure 10(a) and Figure 10(b) show the corresponding occupancy grids of segment 1 of scan A and scan B. After the grid maps $l_A$ and $l_B$ are built, the correlation of $l_A$ and $l_B$ is used to evaluate how strongly the grid maps are related. The correlation is computed as: $\sum_{x,y} p(A^{xy})\, p(B^{xy})$

Fig. 13. Grid map details. Gray denotes areas which are occupied by neither moving objects nor stationary objects, whiter than gray denotes the areas which are likely to be occupied by moving objects, and darker than gray denotes the areas which are likely to be occupied by stationary objects.
