Recent Results in Extensions to Simultaneous Localization and Mapping

. We report experimental results with bearings-only and range-only Si-multaneous Localization and Mapping (SLAM). In the former case, we give the initial results from a new method that extends optimal shape-from-motion to incorporate angular rate and linear acceleration data. In the latter case, we have formulated a version of the SLAM problem that presumes a moving sensor able to measure only range to landmarks in the environment. Experimental results for both are presented.


Introduction
A moving robot can be localized given a map of landmarks and an onboard sensor to sense the landmarks.Conversely, given accurate localization of the sensor, it is possible to build a map of the landmarks.A question that has recently intrigued our community is how well it is possible to do both (localize and map) if neither exist a priori.Research in Simultaneous Localization and Mapping (SLAM) makes one of two assumptions.First, and commonly, that the moving sensor can measure both range and bearing to landmarks in its vicinity [4], or, that the moving sensor can detect only bearing to or projections of the landmarks [1] [9].Interestingly enough, little attention has been paid to the complementary case, in which the moving sensor can only detect range to landmarks.
For the case in which projections are detected, we have extended optimal methods for shape-from-motion to incorporate angular rate and linear acceleration data from rate gyros and accelerometers, respectively.In addition to the obvious advantage of redundant measurements, the complimentary nature of visual and inertial data provides the following qualitative advantages over using visual or inertial data alone: • Visual information can correct the drift that results from integrating inertial sensor data • Inertial information can disambiguate the camera's motion when visual data is scarce or when the scene viewed by the camera is degenerate • Visual information can be used to determine the gravity direction required to correctly extract accelerations from accelerometer readings • Conversely, because the accelerometer readings are affected by gravity, two absolute components of the sensor's orientation can be determined if the contribution of gravity to the reading can be separated from the contribution of acceleration Here we present an algorithm for estimating camera motion using both visual and inertial data.
For range-only SLAM we have adapted the well-known estimation techniques of Kalman filtering, Markov methods, and Monte Carlo localization to solve the problem of robot localization from range-only measurements [3].All three of these methods estimate robot position as a distribution of probabilities over the space of possible robot positions.In the same work we presented an algorithm capable of solving SLAM in cases where approximate a priori estimates of robot and landmark locations exist.However, a solution to the range-only SLAM problem with no prior information remains to be found.The primary difficulty stems from the annular distribution of potential relative locations that results from a range only measurement Since the distribution is highly non-Gaussian, SLAM solutions based on Kalman filtering falter.In theory, Markov methods (probability grids) and Monte Carlo methods (particle filtering) have the flexibility to handle annular distributions.Unfortunately, the scaling properties of these methods severely limit the number of landmarks that can be mapped.
In truth, Markov and Monte Carlo methods have much more flexibility than we need; they can represent arbitrary distributions while we need only deal with well structured annular distributions.What is needed is a compact way to represent annular distributions together with a computationally efficient way of combining annular distributions with each other and with Gaussian distributions.In most cases, we expect the results of these combinations to be well approximated by mixtures of Gaussians so that standard techniques such as Kalman filtering or multiple hypothesis tracking could be applied to solve the remaining estimation problem.
Here we present new results in our ongoing efforts towards the solution to the general range-only SLAM problem.The key is a computationally efficient method of representing annular distributions and approximating their multiplication.This makes it possible for the robot to estimate locations of new landmarks that are completely unknown to start.

Optimal motion estimation using image and inertial data
In this section we present a method for estimating camera motion and sparse scene structure using image, rate gyro, and accelerometer measurements.The method is a batch algorithm that finds optimal estimates by minimizing a total error with respect to all of the unknown parameters simultaneously.
Error function.The error we minimize is E = E visual + E inertial , where and E visual specifies an image reprojection error given the six degree of freedom camera positions and three-dimensional point positions.This error function is similar to those used in bundle adjustment [11] and nonlinear shape-frommotion [10].In this error, the sum is over i and j, such that point j was observed in image i. x ij is the observed projection of point j in image i. ρ i and t i are the camera-to-world rotation Euler angles and camera-to-world translation, respectively, at the time of image i, and C ρi,ti is the world-tocamera transformation specified by ρ i and t i .X j is the world coordinate system location of point j, so that C ρi,ti (X j ) is location of point j in camera coordinate system i.π gives the image projection of a three-dimensional point specified in the camera coordinate system.In our current implementation, π may be either a conventional (i.e., perspective or orthographic) or an omnidirectional projection.The details of our omnidirectional projection model and its use with nonlinear shape-from-motion are given in [9] [8].
E inertial gives an error between the estimated positions and velocities and incremental positions and velocities predicted by the inertial measurements.Here, f is the number of images, and τ i is the time image i was captured.ρ i and t i are the camera rotation and translation, just as in the equation for E inertial above.v i gives the camera's linear velocity at time τ i , and g is the direction of gravity relative to the camera rotation at time τ 0 .I ρ , I v , and I t integrate the inertial observations to produce estimates of ρ i , v i , and t i from initial values ρ i−1 , v i−1 , and t i−1 , respectively.
We assume that all of the individual error functions D are Mahalanobis distances.The image error covariances may be assumed uniform and isotropic (e.g., with a standard deviation of one pixel) or determined by the tracking algorithm using image texture [2].In the experiments below, we have used isotropic densities as tuning parameters to specify relative confidences in the image, rate gyro, and accelerometer observations.Estimation.We use Levenberg-Marquardt [6] to minimize the combined error with respect to the camera rotations ρ i , translations t i , velocities v i , the gravity direction g, and the world point locations X j .The inverse of the Hessian matrix from the Levenberg-Marquardt method provides an estimate of the covariance of the recovered parameters.

Geometrically Inspired Methods for Non-Gaussian Distibutions
In [3] we presented a Kalman filter based algorithm capable of solving the range-only SLAM problem in the case where the approximate locations of the landmarks are known a priori.In this approach, the system state vector was defined to contain robot pose together with the positions of all of the landmarks.At each time step, range measurements were "linearized" about the current state estimate to produce and approximation of the relative positions between the robot and each of the landmarks.These approximations were then fed into a Kalman filter to improve the state estimate.Here we extend these results to allow a robot to initialize new landmarks that are completely unknown.The basic idea is to store the robot locations and measured ranges the first few times a landmark is encountered and then estimate of its position by intersecting circles on the plane.Once an estimate of a new landmark is produced, it is added to the Kalman filter where its estimate is then improved along with the estimates of the other (previously seen) landmarks.
The keys to this idea are to (1) find the intersection points of two circles and (2) estimate a distribution about the intersection points.Because it takes advantage of the special structure of the problem, the resulting approach is less computationally cumbersome and avoids the local extrema problems associated with standard batch optimization techniques.
Merging Annular Distributions.Given two circles with centers p 1 = [x1, y1] T and p 2 = [x2, y2] T , and radii r 1 and r 2 , respectively,1 their two points of intersection are given by: where where Next we approximate the distribution around each point with a Gaussian.At each intersection point, we obtain two Gaussian approximations, one for each annulus, following the procedure outlined in [3].We then merge the two approximations into a single Gaussian using standard Kalman gain formulas (see eg. Smith and Cheeseman [7]).This process is depicted graphically in Figure 1.Since there will usually be two intersection points, this process usually results in two Gaussian distributions, one for each intersection point.In the absence of any other information, we weight each Gaussian equally.The result is that this pair of Gaussians forms an approximation of the distribution that results from multiplying two annular distributions.To approximate the distribution that results from multiplying three annular distributions, we iterate pairwise intersection in the following manner.An example of this process is depicted in Figure 2. Let (p 1 , C 1 ) and (p 2 , C 2 ) be the means and covariance matrices associated with the two Gaussians used to approximate the multiplication of the first two annuli.Likewise, let (q 1 , D 1 ) and (q 2 , D 2 ) denote the means and covariances associated with multiplication of the first and third annuli.We then create four new distributions, (p 11 , C 11 ), (p 12 , C 12 ), (p 21 , C 21 ), and (p 22 , C 22 ), where (p ij , C ij ) are the mean and covariance matrix that result from merging (p i , C i ) and (q j , D j ).We assign a weight w ij to each of these four distributions that is inversely proportional to the distance between p i and q j .We then eliminate distributions whose weights are below some threshold and rescale the remaining weights.Now we rename the remaining distributions to be (p 1 , C 1 ) through (p n1 , C n1 ), where n 1 is the number of remaining distributions.We then introduce the Gaussians that result from merging the second and third annuli, and we name their means and covariances (q 1 , D 1 ) and (q 2 , D 2 ).We then follow a similar process of creating 2n 1 new distributions by merging each (p i , C i ) with each (q j , D j ), weighting each new distribution inversely with the distance between p i and q j , throwing out distributions whose weights are below a certain threshold, and renaming the remaining distributions (p 1 , C 1 ) through (p n2 , C n2 ), where n 2 is the number of remaining distributions.The resulting set of distributions together with their weights give an approximation of the distribution that results from multiplying three annular distributions.Often n 2 is equal to one, and the approximation is a single Gaussian.The procedure for approximating the multiplication of four or more annuli begins by finding the approximation for the first three then follows a similar pattern of finding pairwise approximations, merging, weighting, and truncating.
Estimating Landmark Location.Recall that our approach to estimating the location of a newly acquired landmark is to store the robot locations and range measurements the first few times the landmark is encountered.With this in mind, let n be the number of measurements to the new landmark and for i ∈ {1, 2, . . ., n} let ((x i , y i ), P i ) be the robot location estimate (mean and covariance matrix) and let r i be the range measurement the ith time that the new landmark is encountered.The best estimate for the location of the new landmark is given by the location of the peak of the distribution that results from multiplying the n annuli with centers (x i , y i ) and radii r i , i = 1, 2, . . ., n.To get an approximation of this estimate, we use the algorithm described above to approximate the distribution that results from multiplying the n annuli with a weighted collection of Gaussians and choose the estimate to be the mean of the Gaussian with the highest weight.
This approach requires estimates of the robot positions corresponding to the measurements to the new landmarks.In the experimental results presented in Section 3.2, we obtain these estimates by using odometry measurements together with measurements to other landmarks that have already been mapped.In the case where none of the landmark locations are known in advance, the method presented here can still be used to initialize new landmarks using only odometry data.

Optimal motion estimation using image and inertial data
In this section we present an experimental result from our algorithm for optimal motion estimation using both image and inertial measurements.In this experiment, we have mounted a conventional perspective camera augmented with rate gyros and an accelerometer on a five degree of freedom manipulator, which provides repeatable and known motions.More details on the configuration and observations are given below, along with an analysis of the resulting motion estimates.
Configuration.Our sensor rig consists of an industrial vision camera paired with a 6 mm lens, three orthogonally mounted rate gyros, and a three degree of freedom accelerometer.The camera exposure time is set to 1/200 second.The gyros and accelerometer measure motions of up to 150 degrees per second and 4 g, respectively.
Images were captured at 30 Hz using a conventional frame grabber.To remove the effects of interlacing, only one field was used from each image, producing 640 × 240 pixel images.Voltages from the gyros and the accelerometer were simultaneously captured at 200 Hz.
The camera intrinsic parameters (e.g., focal length and radial distortion) were calibrated using the method in [5].This calibration also accounts for the reduced geometry of our one-field images.The accelerometer voltageto-acceleration calibration was performed using a field calibration that accounts for non-orthogonality between the individual accelerometers.The individual gyro voltage-to-rate calibrations were determined using a turntable with a known rotational rate.The fixed gyro-to-camera and accelerometerto-camera rotations were assumed known from the mechanical specifications of the mount.
Observations.To perform experiments with known and repeatable motions, we mounted our rig on a robotic arm.In our experiment, the camera points toward the ceiling, and translates in x, y, and z through seven pre-specified points, for a total distance traveled of about 2.0 meters.Projected onto the (x, y) plane, these points are located on a square, and the camera moves on a curved path between points, producing a clover-like pattern in (x, y).The camera rotates through an angle of 270 degrees about the camera's optical axis during the course of the motion.
We tracked 23 features though a sequence of 152 images using the Lucas-Kanade algorithm.A few images from the sequence are shown in Figure 3.As shown in the figure, only 5 or 6 of the 23 features are typically visible in any one image.Because the sequence contains repetitive texture and large interframe motions, mistracking was common and was corrected manually.
Estimated motion.Some aspects of the estimated structure and motion are shown graphically in Figures 4-6. Figure 4 shows the (x, y) translation estimated using image measurements only (the erratic solid line) and both image and inertial measurements (the dash-dotted line).This figure also shows the motion estimate that results from integrating the inertial data only, assuming zero initial velocity and using the optimal gravity estimate (the diverging dashed line).Random errors in the measured accelerations and a small error in the estimated gravity cause this path to diverge almost immediately from the correct path, and accumulated error in the integrated velocity soon causes gross errors in the motion estimate.
Figure 5 shows the estimated error covariances for the (x, y) translations for every fifteenth image of the sequence.The covariances resulting from using image measurements only are shown as the large dotted ellipses, which show the one standard deviation error boundaries.The error covariances resulting from using both image and inertial measurements are shown as the solid ellipses, which show five standard deviation error boundaries.To provide a direct comparison, the covariances for both cases are estimated at the solution found using both image and inertial measurements.This solution, shown as a dash-dotted line, is the same as in Figure 2.
Similarly, Figure 6 shows the estimated error covariances for the (x, y) point locations.As in Figure 5, the estimated error covariances were evaluated at the solution found using both image and inertial measurements, which is shown as a dash-dotted line for reference.

Initializing Unknown Landmarks
In this section we present the results of a range-only tracking experiment during which a new, completely unknown landmark is initialized and estimated.
Experimental Configuration.This experiment employed an electric allterrain vehicle (ATV) equipped with an inertial navigation system (INS) and a range finding radio-frequency identification (RFID) system.The INS system consists of a fiber optic gyro to provide the vehicle's angular velocity together with an odometer to measure distance traveled.Outputs from the gyro and odometer are integrated to give a dead-reckoning estimate of position.
In the system used, a cell controller attached to an antenna continually sends queries out into the world.When the query is received by a credit-cardsized radio transponder (tag), the tag responds with its unique identification number.When the tag response is received back at the antenna, the cell controller uses round trip time of flight to estimate the range between the antenna and tag.In our experiment, a cell controller and a collection of antennas configured to provide a 360 degree coverage pattern are mounted onboard the ATV.Ten tags, which serve as landmarks for range-only tracking and SLAM, are distributed over a planar, unobstructed environment.The layout of the tags is depicted in Figure 8.
Tracking Results. Figure 7 shows the results of a tracking experiment where range-only measurements are used to correct for errors in the INS dead-reckoning position estimate.At the beginning of the experiment, the Range data is received somewhat infrequently.During the 225 second experiment, a total of 107 range measurements were received from the nine tags.This explains the erratic nature of the improved estimate; there are long periods of time during which no range measurements are received and hence the estimate error grows.When a measurement is received, the resulting correction can appear drastic Mapping Results.During the tracking experiment described above, the position of a tenth, completely unknown tag was initially estimated using the algorithm described in Section 2.2.Once an initial estimate was produced, it was improved using the range-only SLAM algorithm presented in [3].The results are shown in Figure 7.The plot shows both the initial estimate generated using circle intersection (solid ellipse) and the final estimate improved by application of Kalman filter based SLAM (dotted ellipse).Fig. 6.The estimated error covariances of the (x, y) point positions.As in Figure 5, the one standard deviation boundaries that result from using only the image measurements and the the five standard deviation boundaries that result from using both image and inertial measurements are shown as dotted and solid ellipses, respectively.

Fig. 1 .
Fig. 1.Approximating intersections of annular distribution as Gaussian.First, a Gaussian approximation is determined for each annulus about the intersection point.Then the two individual approximations are merged into one.

Fig. 2 .
Fig.2.Gaussian approximation for three intersecting annuli.In (a) Gaussian approximations resulting from the intersection of the first and second annuli (top) and the first and third annuli (bottom) are combined to form the distributions in (b, bottom).All possible combinations of individual Gaussians are considered and unlikely combinations are thrown away.In this example two Gaussians remain after the first step, but there could be as many as four or as few as one.In (b) the Gaussian approximations resulting from the intersection of second and third annuli are combined with the result of the first step.The final result, shown in (c), is a single Gaussian for this example.

Fig. 3 .
Fig. 3. Images 16, 26, 36, and 46 from the test sequence are shown clockwise from the upper left.As described in section 3.1, the images are one field of an interlaced image, so their height is half that of the full image.

Fig. 4 .
Fig.4.The (x, y) camera translations estimated using only image measurements, only inertial measurements, and both image and inertial measurements are shown as the erratic solid path, smooth diverging dashed path, and dash-dotted path, respectively.Seven known points on the camera's actual path are shown as squares.

Fig. 5 .
Fig.5.The estimated error covariances of the (x, y) camera translation for every fifteenth image of the sequence.The one standard deviation boundaries that result from using only image measurements and the five standard deviation boundaries that result from using both image and inertial measurements are shown as the dotted and solid ellipses, respectively.

Fig. 7 .
Fig. 7. Actual and estimated robot trajectories for a portion of the range-only tracking and SLAM experiment.In both figures, the actual robot trajectory (ground truth) is plotted with a dotted line.The figure on the left shows the estimate generated using dead-reckoning and INS sensors.The Kalman filter tracking estimate that fuses INS output and radio tag range measurements is shown in the figure on the left.

Fig. 8 .
Fig. 8. Initial and final estimates of unknown landmark.The diamond and associated ellipse (solid line) represent the estimate and covariance resulting from the batch initialization algorithm described in Section 2.2.The circle and associated ellipse (dotted line) represent the final estimate, improved by Kalman filter SLAM algorithm, at the end of the experiment.The squares denote the positions of the previously mapped landmarks.