A multiple-baseline stereo

A stereo matching method is presented which uses multiple stereo pairs with various baselines to obtain precise depth estimates without suffering from ambiguity. The stereo pairs with different baselines are generated by a lateral displacement of a camera. Matching is performed by computing the sum of squared-difference (SSD) values. The SSD functions for individual stereo pairs are represented with respect to the inverse depth (rather than the disparity, as is usually done), and then are simply added to produce the sum of SSDs. This resulting function is called the SSSD-in-inverse-depth. The authors define a stereo algorithm based on the SSSD-in-inverse-depth and then present a mathematical analysis to show how the algorithm can remove ambiguity and increase precision. Experimental results for stereo images are presented to demonstrate the effectiveness of the algorithm.


Introduction
Takeo Kanade, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

* This research was supported by the Defense Advanced Research Projects Agency (DOD) and monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, Ohio 45433-6543 under Contract F33615-87-C-1499, ARPA Order No. 4976, Amendment 20. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the U.S. government.
† This research was performed while the first author was with Carnegie Mellon University.

Stereo is a useful technique for obtaining 3-D information from 2-D images in computer vision. In stereo matching, we measure the disparity $d$, which is the distance between the corresponding points of the left and right images. The disparity $d$ is related to the depth $z$ by

$$d = BF\,\frac{1}{z}, \qquad (1)$$

where $B$ and $F$ are the baseline and focal length, respectively. This equation indicates that for the same depth the disparity is proportional to the baseline, or that the baseline length $B$ acts as a magnification factor in measuring $d$ in order to obtain $z$. That is, the estimated depth is more precise if we set the two cameras farther apart from each other, which means a longer baseline. A longer baseline, however, poses its own problem. Because a longer disparity range must be searched, matching is more difficult and thus there is a greater possibility of a false match. So there is a trade-off between precision and accuracy (correctness) in matching.
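As a hedged numerical illustration of the relation $d = BF/z$ and of the precision argument above, the following Python sketch evaluates the disparity for a few baselines and shows how the same disparity error produces a smaller depth error as the baseline grows. The focal length, depth, baselines, and the half-pixel disparity error are hypothetical values chosen only for illustration; they are not taken from the experiments in this paper.

```python
def disparity(baseline, focal_length, depth):
    """Disparity from depth, d = B * F / z."""
    return baseline * focal_length / depth


def depth_from_disparity(baseline, focal_length, d):
    """Invert the same relation: z = B * F / d."""
    return baseline * focal_length / d


def depth_error(baseline, focal_length, depth, disparity_error):
    """Depth error caused by measuring the disparity with a fixed error."""
    d_true = disparity(baseline, focal_length, depth)
    z_est = depth_from_disparity(baseline, focal_length, d_true + disparity_error)
    return abs(z_est - depth)


# Hypothetical setup: focal length in pixels, depth and baselines in metres.
FOCAL_LENGTH = 500.0
DEPTH = 10.0
BASELINES = (0.1, 0.2, 0.4)

errors = [depth_error(b, FOCAL_LENGTH, DEPTH, 0.5) for b in BASELINES]
for b, err in zip(BASELINES, errors):
    d = disparity(b, FOCAL_LENGTH, DEPTH)
    print(f"B = {b:.1f}: disparity = {d:.1f} px, depth error = {err:.3f}")
```

In this sketch the disparity doubles each time the baseline doubles, so the range that must be searched grows, while the depth error caused by the same disparity error shrinks.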
One of the most common methods to deal with the problem is a coarse-to-fine control strategy. Matching is done at a low resolution to reduce false matches, and then the result is used to limit the search range of matching at a high resolution, where more precise disparity measurements are calculated. Using a coarse resolution, however, does not always remove false matches. This is especially true when there is inherent ambiguity in matching, such as a repeated pattern over a large part of the scene (e.g., a scene of a picket fence).
Another approach to removing false matches and increasing precision is to use multiple images, especially a sequence of densely sampled images along a camera path [BBM87, Yam88, MSK89]. A short baseline between a pair of consecutive images makes the matching or tracking of features easy, while the structure imposed by the camera motion allows integration of the possibly noisy individual measurements into a precise estimate. The integration has been performed either by exploiting constraints on the EPI [BBM87, Yam88] or by a sequential Kalman filtering technique [MSK89, Hee89].
The stereo matching method presented in this paper belongs to the second approach: use of multiple images with different baselines obtained by a lateral displacement of a camera. The matching technique, however, is based on the idea that global mismatches can be reduced by adding the sum of squared-difference (SSD) values from multiple stereo pairs, an idea first exploited by JPL's three-camera stereo system for outdoor navigation [Wil87]. That is, the SSD values are computed first for each pair of stereo images. We represent the SSD values with respect to the inverse depth $\zeta$ (rather than the disparity $d$, as is usually done). The resulting SSD functions from all stereo pairs are added together to produce the sum of SSDs, which we call the SSSD-in-inverse-depth. We show that the SSSD-in-inverse-depth function exhibits a unique and clear minimum at the correct matching position even when the underlying intensity patterns of the scene include ambiguities or repetitive patterns. An advantage of this technique is that we can eliminate false matches and increase precision without any search or sequential filtering.
In the next section we present the method mathematically and show how ambiguity can be removed and precision increased by the method. Section 3 provides a few experimental results with real stereo images to demonstrate the effectiveness of the algorithm.

Mathematical Analysis
The essence of stereo matching is, given a point in one image, to find the most similar point in another image. The sum of squared differences (SSD) of the intensity values (or values of preprocessed images, such as bandpass filtered images) over a window is the simplest and most effective criterion for matching. In this section, we define the sum of SSDs with respect to the inverse depth (SSSD-in-inverse-depth) for multiple-baseline stereo, and mathematically show its advantage in removing ambiguity and increasing precision.
For this analysis, we use 1-D stereo intensity signals, but the extension to two-dimensional images is straightforward.

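SSD Function
Suppose that we have camera positions $P_0, P_1, \ldots, P_n$ and a resulting set of stereo pairs with baselines $B_1, B_2, \ldots, B_n$, as shown in figure 1. Let $f_0(x)$ and $f_i(x)$ be the image pair at the camera positions $P_0$ and $P_i$, respectively. Imagine a scene point $Z$ whose depth is $z$. Its disparity $d_{r(i)}$ for the image pair taken from $P_0$ and $P_i$ is

$$d_{r(i)} = \frac{B_i F}{z}. \qquad (2)$$

The image intensity functions $f_0(x)$ and $f_i(x)$ near the matching positions for $Z$ can be expressed as

$$f_0(x) = f(x) + n_0(x), \qquad f_i(x) = f(x - d_{r(i)}) + n_i(x), \qquad (3)$$

assuming constant distance near $Z$ and independent Gaussian white noise such that

$$n_0(x),\; n_i(x) \sim N(0, \sigma_n^2). \qquad (4)$$

The SSD value $e_{d(i)}$ over a window $W$ at a pixel position $x$ of image $f_0(x)$ for the candidate disparity $d_{(i)}$ is defined as

$$e_{d(i)}(x, d_{(i)}) \equiv \sum_{j \in W} \bigl( f_0(x+j) - f_i(x + d_{(i)} + j) \bigr)^2, \qquad (5)$$

where $\sum_{j \in W}$ means summation over the window. The $d_{(i)}$ that gives a minimum of $e_{d(i)}(x, d_{(i)})$ is determined as the estimate of the disparity at $x$. Since the SSD measurement $e_{d(i)}(x, d_{(i)})$ is a random variable, we will compute its expected value in order to analyze its behavior:

$$E[e_{d(i)}(x, d_{(i)})] = \sum_{j \in W} \bigl( f(x+j) - f(x + d_{(i)} - d_{r(i)} + j) \bigr)^2 + 2 N_w \sigma_n^2, \qquad (6)$$

where $N_w$ is the number of the points within the window. For the rest of the paper, $E[\,]$ denotes the expected value of a random variable. In deriving the above equation, we have assumed that $d_{r(i)}$ is constant over the window. Equation (6) says that naturally the SSD function $e_{d(i)}(x, d_{(i)})$ is expected to take a minimum when $d_{(i)} = d_{r(i)}$, i.e., at the right disparity.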
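The SSD search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it sums the squared differences between `f0` at `x + j` and `fi` at `x + d + j` over a window and keeps the candidate disparity with the smallest value. The synthetic signal `scene`, the true disparity of 5, the window half-width, and the shift convention `fi(x) = f(x - d_r)` are assumptions made for the example, and noise is omitted, so the curve shows the expected behavior without the constant noise offset.

```python
import math


def ssd(f0, fi, x, d, half_window):
    """Sum over the window of (f0[x + j] - fi[x + d + j]) ** 2."""
    return sum(
        (f0[x + j] - fi[x + d + j]) ** 2
        for j in range(-half_window, half_window + 1)
    )


def best_disparity(f0, fi, x, candidates, half_window):
    """Return the candidate disparity with the smallest SSD value."""
    return min(candidates, key=lambda d: ssd(f0, fi, x, d, half_window))


def scene(t):
    """Hypothetical non-repetitive intensity pattern f."""
    return math.sin(0.3 * t) + 0.01 * t


TRUE_DISPARITY = 5
LENGTH = 64
f0 = [scene(k) for k in range(LENGTH)]                   # f0(x) = f(x)
fi = [scene(k - TRUE_DISPARITY) for k in range(LENGTH)]  # fi(x) = f(x - d_r)

estimate = best_disparity(f0, fi, x=30, candidates=range(0, 11), half_window=2)
print(estimate)  # prints 5
```

Because the pattern in this sketch does not repeat within the search range, the SSD has a single zero at the true disparity; the ambiguity discussed next arises when that assumption fails.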
Let us examine how the SSD function $e_{d(i)}(x, d_{(i)})$ behaves when there is ambiguity in the underlying intensity function. Suppose that the intensity signal $f(x)$ has the same pattern around pixel positions $x$ and $x + a$,

$$f(x + j) = f(x + a + j), \quad j \in W, \qquad (7)$$

where $a \neq 0$ is a constant. Then, from equation (6),

$$E[e_{d(i)}(x, d_{r(i)})] = E[e_{d(i)}(x, d_{r(i)} + a)] = 2 N_w \sigma_n^2. \qquad (8)$$

This means that ambiguity is expected in matching in terms of positions of minimum SSD values. Moreover, the false match at $d_{r(i)} + a$ appears in exactly the same way for all $i$; it is separated from the correct match by $a$ for all the stereo pairs. Using multiple baselines does not help to disambiguate.

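SSD with respect to Inverse Depth
Now, let us introduce the inverse depth $\zeta$ such that

$$\zeta = \frac{1}{z}. \qquad (9)$$

Then

$$d_{r(i)} = B_i F \zeta_r, \qquad (10)$$

$$d_{(i)} = B_i F \zeta, \qquad (11)$$

where $\zeta_r$ and $\zeta$ are the real and the candidate inverse depth, respectively. Substituting equation (11) into (5), we have the SSD with respect to the inverse depth,

$$e_{\zeta(i)}(x, \zeta) \equiv \sum_{j \in W} \bigl( f_0(x+j) - f_i(x + B_i F \zeta + j) \bigr)^2, \qquad (12)$$

at position $x$ for a candidate inverse depth $\zeta$. Its expected value is

$$E[e_{\zeta(i)}(x, \zeta)] = \sum_{j \in W} \bigl( f(x+j) - f(x + B_i F (\zeta - \zeta_r) + j) \bigr)^2 + 2 N_w \sigma_n^2. \qquad (13)$$

We define a new evaluation function $e_{\zeta(12 \cdots n)}(x, \zeta)$, the sum of SSD functions with respect to the inverse depth (SSSD-in-inverse-depth) for multiple stereo pairs. It is obtained by adding the SSD functions $e_{\zeta(i)}(x, \zeta)$ for individual stereo pairs:

$$e_{\zeta(12 \cdots n)}(x, \zeta) = \sum_{i=1}^{n} e_{\zeta(i)}(x, \zeta). \qquad (14)$$

Its expected value is

$$E[e_{\zeta(12 \cdots n)}(x, \zeta)] = \sum_{i=1}^{n} \sum_{j \in W} \bigl( f(x+j) - f(x + B_i F (\zeta - \zeta_r) + j) \bigr)^2 + 2 n N_w \sigma_n^2. \qquad (15)$$

In the next three subsections, we will analyze the characteristics of these evaluation functions to see how ambiguity is removed and precision is improved.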

Elimination of Ambiguity (1)
As before, suppose the underlying intensity pattern $f(x)$ has the same pattern around $x$ and $x + a$ (equation (7)).
Then, according to equation (13), we have

$$E[e_{\zeta(i)}(x, \zeta_r)] = E\Bigl[e_{\zeta(i)}\Bigl(x, \zeta_r + \frac{a}{B_i F}\Bigr)\Bigr] = 2 N_w \sigma_n^2.$$

We still have an ambiguity; a minimum is expected at a false inverse depth $\zeta_f = \zeta_r + \frac{a}{B_i F}$. However, an important point to be observed here is that this minimum for the false inverse depth $\zeta_f$ changes its position as the baseline $B_i$ changes, while the minimum for the correct inverse depth $\zeta_r$ does not. This is the property that the new evaluation function, the SSSD-in-inverse-depth (14), exploits to eliminate the ambiguity. For example, suppose we use two baselines $B_1$ and $B_2$ ($B_1 \neq B_2$). From equation (15),

$$E[e_{\zeta(12)}(x, \zeta)] = \sum_{j \in W} \bigl( f(x+j) - f(x + B_1 F (\zeta - \zeta_r) + j) \bigr)^2 + \sum_{j \in W} \bigl( f(x+j) - f(x + B_2 F (\zeta - \zeta_r) + j) \bigr)^2 + 4 N_w \sigma_n^2.$$

The expected values of the SSD measurements $E[e_{\zeta(1)}]$ and $E[e_{\zeta(2)}]$ with baselines $B_1$ and $B_2$ are shown in figures 2 (d) and (e), respectively (the plot is normalized such that $B_1 F = 1$). Note that the minimum at the correct inverse depth ($\zeta = 5$) does not move, while the minimum for the false match changes its position as the baseline changes. When the two functions are added to produce the SSSD-in-inverse-depth, its expected values $E[e_{\zeta(12)}]$ are as shown in figure 2 (f). We can see that the ambiguity has been reduced because the SSSD-in-inverse-depth has a smaller value at the correct match position than at the false match.
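The disambiguation argument can be reproduced with a small noise-free Python sketch. It uses the values quoted for the figure 2 example (true inverse depth 5 with $B_1 F = 1$, a repeated pattern 8 pixels away, and $B_2 / B_1 = 1.5$); the triangular intensity pattern, the candidate grid, and the zero tolerance are assumptions introduced here for illustration, and the constant noise term of the expected value is left out.

```python
def pattern(t):
    """Hypothetical 1-D intensity f: two identical triangles at t = 20 and t = 28."""
    def triangle(u):
        return max(0.0, 1.0 - abs(u) / 3.0)
    return triangle(t - 20.0) + triangle(t - 28.0)


def expected_ssd(x, zeta, zeta_r, bf, half_window=2):
    """Noise-free SSD in inverse depth for one stereo pair with B_i * F = bf."""
    shift = bf * (zeta - zeta_r)
    return sum(
        (pattern(x + j) - pattern(x + shift + j)) ** 2
        for j in range(-half_window, half_window + 1)
    )


def minima(curve, grid, tol=1e-9):
    """Candidates whose value is numerically zero, i.e. perfect matches."""
    return [zeta for zeta, value in zip(grid, curve) if value < tol]


X = 20            # pixel on the first triangle
ZETA_R = 5.0      # true inverse depth, with B_1 * F normalized to 1
BF = (1.0, 1.5)   # B_1 * F and B_2 * F
GRID = [k / 3.0 for k in range(61)]  # candidate inverse depths from 0 to 20

ssd_1 = [expected_ssd(X, z, ZETA_R, BF[0]) for z in GRID]
ssd_2 = [expected_ssd(X, z, ZETA_R, BF[1]) for z in GRID]
sssd = [a + b for a, b in zip(ssd_1, ssd_2)]

print(minima(ssd_1, GRID))  # correct match plus a false match
print(minima(ssd_2, GRID))  # correct match plus a different false match
print(minima(sssd, GRID))   # only the correct match remains
```

Each single-baseline curve has two zeros, but the false one sits at a different inverse depth for each baseline, so only the true inverse depth survives in the sum.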

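Elimination of Ambiguity (2)
An extreme case of ambiguity occurs when the underlying function $f(x)$ is a periodic function, like a scene of a picket fence. We can show that this ambiguity can also be eliminated.
Let $f(x)$ be a periodic function with period $T$. Then each $e_{\zeta(i)}(x, \zeta)$ is expected to be a periodic function of $\zeta$ with the period $\frac{T}{B_i F}$. This means that there will be multiple minima of $e_{\zeta(i)}(x, \zeta)$ (i.e., ambiguity in matching) at intervals of $\frac{T}{B_i F}$ in $\zeta$. When we use two baselines and add their SSD values, the resulting $e_{\zeta(12)}(x, \zeta)$ will still be a periodic function of $\zeta$, but its period $T_{12}$ is increased to

$$T_{12} = \mathrm{LCM}\Bigl( \frac{T}{B_1 F}, \frac{T}{B_2 F} \Bigr),$$

where $\mathrm{LCM}()$ denotes Least Common Multiple. That is, the period of the expected value of the new evaluation function can be made longer than that of the individual stereo pairs. Furthermore, it can be controlled by choosing the baselines $B_1$ and $B_2$ appropriately so that the expected value of the evaluation function has only one minimum within the search range. This means that using multiple-baseline stereo pairs simultaneously can eliminate ambiguity, although each individual baseline stereo may suffer from ambiguity.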
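To make the period argument concrete, here is a small Python sketch that computes the least common multiple of the two single-baseline periods $T/(B_1 F)$ and $T/(B_2 F)$ with exact rational arithmetic. The pattern period of 8 pixels and the values of $B_1 F$ and $B_2 F$ are hypothetical, and the helper assumes the two periods have a rational ratio, since otherwise no common multiple exists.

```python
from fractions import Fraction
from math import gcd


def lcm_of_fractions(p, q):
    """Least common multiple of two positive rationals in lowest terms."""
    numerator = p.numerator * q.numerator // gcd(p.numerator, q.numerator)
    denominator = gcd(p.denominator, q.denominator)
    return Fraction(numerator, denominator)


def combined_period(period, bf_1, bf_2):
    """Period in inverse depth of the two-baseline sum: LCM(T/(B1 F), T/(B2 F))."""
    t_1 = Fraction(period) / Fraction(bf_1)
    t_2 = Fraction(period) / Fraction(bf_2)
    return lcm_of_fractions(t_1, t_2)


# Hypothetical values: pattern period T = 8 pixels, B1*F = 1, B2*F = 3/2.
T_1 = Fraction(8) / Fraction(1)
T_2 = Fraction(8) / Fraction(3, 2)
T_12 = combined_period(8, 1, Fraction(3, 2))
print(T_1, T_2, T_12)  # prints: 8 16/3 16
```

With these values the combined period is twice the period of the shorter baseline and three times that of the longer one. The same helper also shows why the choice of baselines matters: `combined_period(8, 1, 2)` returns 8, so a baseline ratio of exactly 2 does not lengthen the period beyond that of the shorter baseline.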
We illustrate this by using real stereo images. Figure 3(a) shows an image of a sample scene. At the top of the scene there is a grid board whose intensity function is nearly periodic. We took ten images of this scene by shifting the camera vertically as in figure 4. The actual distance between consecutive camera positions is 0.05 inches. Let this distance be $b$. Figure 3 shows the first and the last images of the sequence. We selected a point $x$ within the repetitive grid board area in the images. The SSD values $e_{\zeta(i)}(x, \zeta)$ over 5-by-5-pixel windows are plotted for various baseline stereo pairs in figure 5. The horizontal axis of all the plots is the inverse depth, normalized such that $8bF = 1$.
Figure 5 illustrates the trade-off between precision and ambiguity in terms of baselines. That is, for a shorter baseline, there are fewer minima (i.e., less ambiguity), but the SSD curve is flatter (i.e., less precise localization). On the other hand, for a longer baseline, there are more minima (i.e., more ambiguity), but the curve near the minimum is sharper; that is, the estimated depth is more precise if we can find the correct one.
The inverse of the variance represents the precision of the estimate. Therefore, equation (32) means that by using the SSSD-in-inverse-depth with multiple baseline stereo pairs, the estimate becomes more precise. We can confirm this characteristic in figures 6 and 7 by observing that the curve around the correct inverse depth becomes sharper as more baselines are used.

Experimental Results
This section presents experimental results of the multiple-baseline stereo based on SSSD-in-inverse-depth with real 2-D images. A complete description of the algorithm is in [OK90]. The first result is for the "Town" data set that we showed in figure 3. Figure 8 is the computed depth map with a single long baseline, $B = 9b$. We can see gross errors in matching at the top of the scene because of the repeated pattern. Figure 9, on the other hand, shows the depth map obtained by the new algorithm using three different baselines, $3b$, $6b$, and $9b$. The gross errors are removed in this case.
Figure 10 shows another data set used for our experiment. Figures 12 and 13 compare the isometric plots of the depth maps computed from a short baseline stereo and a long baseline stereo: the long baseline is five times longer than the short one. For comparison, the actual oblique view roughly corresponding to the isometric plot is shown in figure 11. We observe that the depth map computed by using the long baseline is smoother on flat surfaces, i.e., more precise, but has gross errors due to false matching, though no repetitive patterns are apparent in the images. These results illustrate the trade-off between ambiguity and precision. In contrast, the result from the multiple-baseline stereo shown in figure 14 demonstrates both the advantage of unambiguous matching with a short baseline and that of precise matching with a long baseline.

Conclusions
In this paper, we have presented a new stereo matching method that uses multiple baseline stereo pairs. This method can overcome the trade-off between precision and accuracy (avoidance of false matches) in stereo. The method is rather straightforward: we represent the SSD values for individual stereo pairs as a function of the inverse depth, and add those functions. The resulting function, the SSSD-in-inverse-depth, exhibits an unambiguous and sharper minimum at the correct matching position. As a result, there is no need for search or sequential estimation procedures. Furthermore, the algorithm is easily amenable to parallel hardware implementation.
The key idea of the method is to relate SSD values to the inverse depth rather than the disparity. In retrospect, this idea is natural. Whereas disparity is a function of the baseline, there is only one true (inverse) depth for each pixel position for all of the stereo pairs. Therefore there must be a single minimum for the SSD values when they are summed and plotted with respect to the inverse depth.
We have shown the advantage of the proposed method in removing ambiguity and improving precision by analytical and experimental results.

Figure 1: Camera positions for stereo

The camera positions $P_0, P_1, \ldots, P_n$ give a resulting set of stereo pairs with baselines $B_1, B_2, \ldots, B_n$, as shown in figure 1. Let $f_0(x)$ and $f_i(x)$ be the image pair at the camera positions $P_0$ and $P_i$, respectively. Imagine a scene point $Z$ whose depth is $z$. Its disparity $d_{r(i)}$ for the image pair taken from $P_0$ and $P_i$ is $d_{r(i)} = B_i F / z$. The image intensity functions $f_0(x)$ and $f_i(x)$ near the matching positions for $Z$ can be expressed as $f_0(x) = f(x) + n_0(x)$ and $f_i(x) = f(x - d_{r(i)}) + n_i(x)$. The SSD with respect to the inverse depth is summed over $j \in W$ at position $x$ for a candidate inverse depth $\zeta$. The function $e_{\zeta(12 \cdots n)}(x, \zeta)$ is the sum of SSD functions with respect to the inverse depth (SSSD-in-inverse-depth) for multiple stereo pairs. It is obtained by adding the SSD functions $e_{\zeta(i)}(x, \zeta)$ for individual stereo pairs: $e_{\zeta(12 \cdots n)}(x, \zeta) = \sum_{i=1}^{n} e_{\zeta(i)}(x, \zeta)$.

Figure 2 (a) shows a plot of $f(x)$. Assuming that $d_{r(1)} = 5$, $\sigma_n^2 = 0.2$, and the window size is 5, the expected values of the SSD function $e_{d(1)}(x, d_{(1)})$ are as shown in figure 2 (b). We see that there is an ambiguity: the minima occur at the correct match $d_{(1)} = 5$ and at the false match $d_{(1)} = 13$. Which match will be selected will depend on the noise, search range, and search strategy. Now suppose we have a longer baseline $B_2$ such that $B_2 / B_1 = 1.5$. From equations (6) and (10), we obtain $E[e_{d(2)}]$ as shown in figure 2 (c). Again we encounter an ambiguity, and the separation of the two minima is the same. Now let us evaluate the SSD values with respect to the inverse depth $\zeta$ rather than the disparity $d$ by using equations (12) through (15). The expected values of the SSD measurements $E[e_{\zeta(1)}]$ and $E[e_{\zeta(2)}]$ are plotted in figures 2 (d) and (e).

Figure 3: "Town" data set: (a) Image0; (b) Image9
Figure 4: "Town" data set image sequence

Each $e_{\zeta(i)}(x, \zeta)$ is expected to be a periodic function of $\zeta$ with the period $\frac{T}{B_i F}$. This means that there will be multiple minima of $e_{\zeta(i)}(x, \zeta)$ (i.e., ambiguity in matching) at intervals of $\frac{T}{B_i F}$ in $\zeta$. When we use two baselines and add their SSD values, the resulting $e_{\zeta(12)}(x, \zeta)$ is still a periodic function of $\zeta$, but with a longer period.

Figure 8: Computed depth map with a long baseline, $B = 9b$. There are many gross mistakes, especially at the top of the image where, due to a repetitive pattern, the matching is completely wrong.