Abstract

To make a Euclidean reconstruction of the world seen through a stereo rig, we can either use a calibration grid, in which case the results rely on the precision of the grid and of the extracted points of interest, or use self-calibration. Past work on self-calibration has focused on the use of a single camera and sometimes gives very unstable results. In this paper, we use a stereo rig that is assumed to be weakly calibrated using a method such as the one described in [1]. Then, by matching two sets of points of the same scene reconstructed from different points of view, we try to find both the homography that maps the projective reconstruction [2] to Euclidean space and the displacement from the first set of points to the second. We present results of the Euclidean reconstruction of a whole object from uncalibrated cameras using the method proposed here.

1 Introduction

This article is concerned with the following problem. Given a weakly calibrated stereo rig, i.e. a pair of cameras with known epipolar geometry, we know that we can obtain 3-D reconstructions of the environment up to an unknown projective transformation [2, 5]. We call such a reconstruction a projective reconstruction. In particular, no affine or Euclidean information can a priori be extracted from it unless some further information is available [4]. The problem is then to determine what information is missing and how it can be recovered. We provide a very simple answer to both questions: with one rigid displacement of the stereo rig, the three-dimensional structure of the scene can in general be recovered uniquely up to a similitude transformation using some elementary matrix algebra, assuming that reliable correspondences between the two projective reconstructions obtained from the two viewpoints can be established. We call such a reconstruction a Euclidean reconstruction. A similar result was obtained in [8], but the resulting scheme was a closed-form solution computed from two views of the scene, whereas this method can be used with many more views, giving more stability to the solution. This result does not contradict previous results, for example [7, 6], which showed that the intrinsic parameters of a camera can in general be recovered from two displacements of the camera, because we use two cameras simultaneously. The method developed here avoids any reference to the intrinsic parameters of the cameras and does not require solving the nonlinear Kruppa equations defined in those references.

2 Goal of the method

Our acquisition system consists of a pair of cameras. This system can be calibrated using a weak calibration method [1], so that we can make a projective reconstruction [2] of the scene in front of the stereoscopic system by matching features (points, curves, or surfaces) between the two images.

Projective reconstruction roughly consists of choosing five point matches between the two views and using these five points as a projective basis in which to reconstruct the scene. The five point matches can be either real points (i.e. points that are physically present in the scene) or virtual points. A virtual point match is obtained by choosing a point in the first camera and then choosing any point on its epipolar line in the second camera as its correspondent; such points satisfy the epipolar constraint but are not the images of a physical point. Let us call $\mathcal{P}$ the resulting projective basis, which is thus attached to the stereo rig.

Let us now consider a real correspondence $(m_1, m'_1)$ between the two images. We can reconstruct the 3-D point $M_1$ in the projective basis $\mathcal{P}$. Let us now suppose that after moving the rig to another place, the correspondence has become $(m_2, m'_2)$, yielding a 3-D reconstructed point $M_2$ in the projective basis $\mathcal{P}$. We know from the results of [2, 5] that the two reconstructions are related by a collineation of $\mathbb{P}^3$, which is represented by a full-rank $4 \times 4$ matrix $H_{12}$ defined up to a scale factor. We denote by the symbol $\simeq$ the equality up to a scale factor. Thus we have

$$\mathbf{M}_2 \simeq H_{12}\,\mathbf{M}_1$$

where $\mathbf{M}_1$ and $\mathbf{M}_2$ are homogeneous coordinate vectors of $M_1$ and $M_2$ in $\mathcal{P}$.

Let us now imagine for a moment that an orthonormal frame of reference $\mathcal{E}$ is attached to the stereo rig. The change of coordinates from $\mathcal{P}$ to $\mathcal{E}$ is described by a full-rank $4 \times 4$ matrix $H$, also defined up to a scale factor. In the coordinate frame $\mathcal{E}$, the two 3-D reconstructions obtained from the two viewpoints are related by a rigid displacement, not a general collineation. This rigid displacement is represented by the following $4 \times 4$ matrix $D_{12}$:

$$D_{12} = \begin{bmatrix} R_{12} & \mathbf{t}_{12} \\ \mathbf{0}^T & 1 \end{bmatrix}$$

where $R_{12}$ is a rotation matrix and $\mathbf{t}_{12}$ a translation vector. It is well known and fairly obvious that the displacement matrices form a subgroup of $SL(4)$, which we denote by $E(3)$. We can now relate the three matrices $H_{12}$, $H$, and $D_{12}$ (see Figure 1):

$$H_{12} \simeq H^{-1} D_{12} H \qquad (1)$$

Since the choice of $\mathcal{E}$ is clearly arbitrary, the matrix $H$ is defined up to an arbitrary displacement. More precisely, we make no difference between matrix $H$ and matrix $DH$ for an arbitrary element $D$ of $E(3)$. In mathematical terms, this means that we are interested only in the quotient $SL(4)/E(3)$ of the group $SL(4)$ by its subgroup $E(3)$. Therefore, instead of talking about the matrix $H$, we talk about its equivalence class $\mathcal{H}$. The basic idea of our method is to select in the equivalence class a canonical element $\hat{H} = \hat{D}H$, which is the same as selecting a special Euclidean frame $\hat{\mathcal{E}}$ among all possible ones, and to show that equation (1) can in general be solved uniquely for $\hat{H}$ and $D'_{12} = \hat{D} D_{12} \hat{D}^{-1}$.
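Equation (1) and the freedom in the choice of $\mathcal{E}$ can be checked numerically. The sketch below assumes NumPy and synthetic data: a random, well-conditioned matrix stands in for $H$, and the helpers `rotation`, `displacement` and `equal_up_to_scale` are hypothetical names introduced for illustration. It checks that $H_{12} = H^{-1} D_{12} H$ maps the projective coordinates of a point seen from the first viewpoint to those seen from the second, and that replacing $H$ by $DH$ and $D_{12}$ by $D D_{12} D^{-1}$ leaves $H_{12}$ unchanged.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix about `axis` by `angle` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def displacement(R, t):
    """4x4 matrix [[R, t], [0^T, 1]]."""
    D = np.eye(4)
    D[:3, :3] = R
    D[:3, 3] = t
    return D

def equal_up_to_scale(u, v, tol=1e-8):
    """True if the homogeneous vectors u and v are proportional."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return min(np.linalg.norm(u - v), np.linalg.norm(u + v)) < tol

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4)) + 3.0 * np.eye(4)   # stand-in for the P -> E change of frame
D12 = displacement(rotation([1, 2, 3], 0.4), [0.5, -1.0, 2.0])
H12 = np.linalg.inv(H) @ D12 @ H                # equation (1)

X1 = np.array([0.3, -0.2, 4.0, 1.0])            # a point in the Euclidean frame E
X2 = D12 @ X1                                   # the same point after the displacement
M1 = np.linalg.solve(H, X1)                     # its projective coordinates in P
M2 = np.linalg.solve(H, X2)
print(equal_up_to_scale(H12 @ M1, M2))          # M2 ~ H12 M1

# Another Euclidean frame: H becomes D H and D12 becomes D D12 D^{-1}.
D = displacement(rotation([0, 1, 1], -1.1), [2.0, 0.0, 1.0])
H12_other = np.linalg.inv(D @ H) @ (D @ D12 @ np.linalg.inv(D)) @ (D @ H)
print(np.allclose(H12_other, H12))
```

The second check does not depend on the particular $D$: any displacement yields the same $H_{12}$, which is exactly why $H$ can only be determined up to a displacement.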

3 Collineations modulo a displacement

3.1 First method

Finding a unique representative of the equivalence classes of the group $SL(4)$ modulo a displacement in $E(3)$ is equivalent to finding a unique decomposition of a collineation (which depends upon 15 parameters) into the product of a displacement (which depends upon 6 parameters) and a member of a subgroup of dimension $15 - 6 = 9$. In fact, we are looking for something similar to the well-known QR or QL decompositions of a matrix into an orthogonal matrix and an upper or lower triangular matrix, where "orthogonal" would be replaced by "displacement".

Figure 1: Given the collineation $H_{12}$, we want to find the collineation $H$ that maps the projective reconstruction to the Euclidean reconstruction and the displacement $D_{12}$.

Let us thus consider an element $H$ of $SL(4)$ and assume that the element $h_{44}$ is nonzero. We define the $3 \times 1$ vector $\mathbf{t}$ by

$$\mathbf{t} = \begin{bmatrix} h_{14}/h_{44} & h_{24}/h_{44} & h_{34}/h_{44} \end{bmatrix}^T \qquad (2)$$

and write $H$ as

$$H = h_{44} \begin{bmatrix} I_3 & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} A & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} \qquad (3)$$

Note that since $\det H = h_{44}^4 \det A \neq 0$, this implies that $\det A \neq 0$. Then there is a unique QL decomposition of $A$, so that

$$H = h_{44} \begin{bmatrix} Q & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} L & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} \qquad (4)$$

where $Q$ is orthogonal and $L$ is lower triangular with strictly positive diagonal elements. Thus the group $SL(4)$ modulo the displacements $E(3)$ is in one-to-one correspondence with the group of lower triangular matrices with strictly positive diagonal elements. $Q$ is a rotation if $\det H > 0$, or a plane symmetry if $\det H < 0$ (remember that the sign of $\det H$ cannot be changed by rescaling, because $H$ is of dimension 4). If we want to decompose $H$ into a rotation and a translation, we have to remove the constraint on the sign of one of the diagonal elements of $L$, e.g. there is no constraint on the sign of the first element of $L$. In practice, the decomposition is done using a standard QL decomposition; then, if $Q$ is a plane symmetry rather than a rotation, we just have to change the sign of the first element of $L$ and of the first column of $Q$, so that the product of both matrices is unchanged and $Q$ becomes a rotation.
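The first method can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: NumPy's `numpy.linalg.qr` computes a QR decomposition, so the hypothetical helper `ql` obtains the QL decomposition of $A$ from the QR decomposition of $JAJ$, where $J$ is the exchange (anti-identity) matrix: if $JAJ = Q_1R_1$, then $A = (JQ_1J)(JR_1J)$ with $JR_1J$ lower triangular. The hypothetical function `decompose_collineation` then applies equations (2) to (4) and the sign change described above.

```python
import numpy as np

def ql(A):
    """QL decomposition A = Q @ L, Q orthogonal, L lower triangular with positive diagonal."""
    J = np.eye(A.shape[0])[::-1]                 # exchange matrix, J @ J = I
    Q1, R1 = np.linalg.qr(J @ A @ J)             # QR of the row- and column-reversed matrix
    Q, L = J @ Q1 @ J, J @ R1 @ J                # J R1 J is lower triangular
    S = np.diag(np.sign(np.diag(L)))             # flip signs so that diag(L) > 0
    return Q @ S, S @ L

def decompose_collineation(H):
    """Equations (2)-(4): H = h44 [[Q, t], [0^T, 1]] [[L, 0], [l^T, 1]], Q a rotation."""
    h44 = H[3, 3]
    if np.isclose(h44, 0.0):
        raise ValueError("the method assumes h44 != 0")
    G = H / h44
    t = G[:3, 3].copy()                          # equation (2)
    l = G[3, :3].copy()
    A = G[:3, :3] - np.outer(t, l)               # top-left block of equation (3)
    Q, L = ql(A)
    if np.linalg.det(Q) < 0:                     # plane symmetry: move the sign into L
        Q[:, 0] *= -1
        L[0, :] *= -1
    return h44, Q, t, L, l

# Usage on an arbitrary collineation with h44 != 0.
H = np.random.default_rng(0).normal(size=(4, 4))
H[3, 3] = 1.7
h44, Q, t, L, l = decompose_collineation(H)
D = np.eye(4)                                    # displacement factor
D[:3, :3] = Q
D[:3, 3] = t
T = np.eye(4)                                    # lower triangular representative
T[:3, :3] = L
T[3, :3] = l
print(np.allclose(h44 * D @ T, H))
```

Since $\det H = h_{44}^4 \det L$ once $Q$ is a rotation, the sign of the first diagonal element of $L$ follows the sign of $\det H$; the other two stay positive.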

3.2 Second method

Another way to find a unique representative of the equivalence classes of the group of collineations modulo a displacement is to build these representatives by applying constraints on the group of collineations corresponding to the degrees of freedom of a displacement. A simple representative is the one such that the image of the origin is the origin (i.e. the translational term of the collineation is zero), the $z$ axis is globally invariant (i.e. the axis of the rotational term is the $z$ axis), and the image of the $y$ axis is in the $yz$ plane, the sign of the $y$ coordinate being invariant (i.e. the angle of the rotation is zero). These constraints correspond to constraints on the form of matrix $H$. The image of the origin by $H$ is the origin itself iff:

$$H \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & 0 & a \end{bmatrix}^T \qquad (5)$$

The $z$ axis is globally invariant iff:

$$H \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & b & c \end{bmatrix}^T \qquad (6)$$

And the last constraint (the angle of the rotation is zero) corresponds to:

$$H \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}^T = \begin{bmatrix} 0 & d & e & f \end{bmatrix}^T \qquad (7)$$

and $a$, $d$, and $f$ have the same sign. Consequently, $H$ being defined up to a scale factor and non-singular, it can be written as:

$$H = \begin{bmatrix} g & 0 & 0 & 0 \\ h & d & 0 & 0 \\ j & e & b & 0 \\ k & f & c & 1 \end{bmatrix} \qquad (8)$$

with $d > 0$ and $f > 0$. Thus equation (1) becomes:

$$H_{12} \simeq L^{-1} D_{12} L \qquad (9)$$

where $L$ is a lower triangular matrix with the second and third coordinates of the diagonal positive and the last set to 1.

4 Back to the Euclidean world

In this section we show how to recover part of the Euclidean geometry from two projective reconstructions of the same scene. The only thing we have to do is to solve equation (1) for a lower triangular $H$. Let us first establish some properties of the collineation between the two reconstructions.

Proposition 1 Let $A$ and $B$ be two projective reconstructions in $\mathbb{P}^3$, the projective space of dimension 3, of the same scene using the same projection matrices from different points of view. Let $H_{12}$ be the projective transformation (or collineation) from $B$ to $A$. Then the eigenvalues of $H_{12}$ are $\lambda$ (with order of multiplicity 2), $\lambda e^{i\theta}$, and $\lambda e^{-i\theta}$, with $\lambda = (\det H_{12})^{1/4}$, and the last coordinate of $H_{12}$, $h_{44}$, is not zero.

Equation (1) shows that $H_{12}$ and $D_{12}$ are conjugate (up to a scale factor), so $H_{12}/(\det H_{12})^{1/4}$ and $D_{12}$ have the same eigenvalues, which are 1 with order of multiplicity two, $e^{i\theta}$, and $e^{-i\theta}$.

Before continuing, we have to prove the following lemma.

Lemma 2 For each $3 \times 3$ real matrix $A$ whose eigenvalues are $(1, e^{i\theta}, e^{-i\theta})$, there exists a $3 \times 3$ lower triangular matrix $L$ ($l_{ik} = 0$ for $k > i$) with $l_{ii} > 0$, $i = 1, 2, 3$, defined up to a scale factor, and a rotation $R$, satisfying $A = L^{-1} R L$.

Since its eigenvalues are either real or conjugate to each other, a real matrix whose eigenvalues have modulus one can be decomposed in the form $A = P D_{12} P^{-1}$, where $D_{12}$ is a quasi-diagonal matrix of the form:

$$D_{12} = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix} \qquad (10)$$

with $B_i = [1]$ or $B_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix}$. We can then compute the QL decomposition of $P^{-1}$, $P^{-1} = QL$, so that $P = L^{-1}Q^T$, which gives:

$$A = L^{-1} Q^T D_{12} Q L = L^{-1} R L$$

where $L$ is a lower triangular matrix with positive diagonal elements and $R = Q^T D_{12} Q$ is an orthogonal matrix. Since $\det A = \det R = 1$, $R$ is a rotation. □

We now have all the tools needed to prove the following theorem.

Theorem 3 Let $A$ and $B$ be two projective reconstructions of the same scene using the same projection matrices from different points of view. Let $H_{12}$ be the projective transformation (or collineation) from $B$ to $A$. $H_{12}$ can be decomposed in the form $H_{12} \simeq L^{-1} D_{12} L$, where $L$ is lower triangular and $D_{12}$ is a displacement. The set of solutions is a two-dimensional manifold, one dimension being the scale factor on the Euclidean space. If we take three reconstructions from generic points of view, the full Euclidean geometry can be recovered, up to a scale factor.
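The proof of Lemma 2 and the proof of Theorem 3 that follows are constructive, so they translate into a short numerical procedure. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a rotation angle $\theta$ different from 0 and $\pi$ (so that the complex eigenvalue pair is genuine), and a positive $\det H_{12}$; the helpers `ql`, `lemma2`, `null_vector` and `decompose_h12` are hypothetical names. The eigenvector $[\mathbf{l}^T\ 1]^T$ of $H_{12}^T$ is taken here from an SVD of $H_{12}^T - I$ rather than from an eigen-decomposition, a choice made because the eigenvalue 1 is double and may be defective; if $H_{12}$ was scaled by a negative factor, its sign is flipped so that the double eigenvalue is $+1$.

```python
import numpy as np

def ql(A):
    """QL decomposition A = Q @ L, Q orthogonal, L lower triangular with positive diagonal."""
    J = np.eye(A.shape[0])[::-1]                 # exchange matrix
    Q1, R1 = np.linalg.qr(J @ A @ J)
    Q, L = J @ Q1 @ J, J @ R1 @ J                # J R1 J is lower triangular
    S = np.diag(np.sign(np.diag(L)))
    return Q @ S, S @ L

def lemma2(A):
    """Write A, with eigenvalues (1, e^{i theta}, e^{-i theta}), as L^{-1} R L."""
    w, V = np.linalg.eig(A)
    k = np.argmin(np.abs(w - 1.0))               # the real eigenvalue 1
    j = np.argmax(np.abs(w.imag))                # one eigenvalue of the complex pair
    # Real basis in which A is quasi-diagonal: A P = P diag(1, 2x2 rotation).
    P = np.column_stack([V[:, k].real, V[:, j].real, V[:, j].imag])
    _, L = ql(np.linalg.inv(P))                  # P^{-1} = Q L
    R = L @ A @ np.linalg.inv(L)                 # equals Q^T D Q, a rotation
    return L, R

def null_vector(M):
    """Smallest singular value of M and the corresponding right singular vector."""
    _, s, Vt = np.linalg.svd(M)
    return s[-1], Vt[-1]

def decompose_h12(H12):
    """Steps of the proof of Theorem 3, equations (11) to (17)."""
    det = np.linalg.det(H12)
    if det <= 0:
        raise ValueError("H12 must have a positive determinant")
    H = H12 / det ** 0.25                        # now det H = 1
    s_plus, e = null_vector(H.T - np.eye(4))
    s_minus, _ = null_vector(H.T + np.eye(4))
    if s_minus < s_plus:                         # the double eigenvalue was -1
        H = -H
        _, e = null_vector(H.T - np.eye(4))
    l = e[:3] / e[3]                             # eigenvector (l, 1) of H^T, eigenvalue 1
    E = np.eye(4)
    E[3, :3] = l
    G = E @ H @ np.linalg.inv(E)                 # [[A, b], [0^T, 1]], equation (11)
    A, b = G[:3, :3], G[:3, 3]
    L, R = lemma2(A)                             # equation (14)
    t = L @ b                                    # equation (15)
    return H, L, l, R, t

# Usage: a screw motion of angle 0.4 rad seen through a random projective frame.
c, s = np.cos(0.4), np.sin(0.4)
D12 = np.eye(4)
D12[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
D12[:3, 3] = [0.5, -1.0, 2.0]
Hproj = np.random.default_rng(2).normal(size=(4, 4)) + 3.0 * np.eye(4)
Hproj[3, 3] = 2.0                                # last entry of (l, 1) is proportional to it
H12 = 2.5 * np.linalg.inv(Hproj) @ D12 @ Hproj   # arbitrary positive scale factor
H, L, l, R, t = decompose_h12(H12)
T = np.eye(4)
T[:3, :3] = L
T[3, :3] = l
D = np.eye(4)
D[:3, :3] = R
D[:3, 3] = t
print(np.allclose(np.linalg.inv(T) @ D @ T, H))  # equation (17)
```

As Theorem 3 states, one displacement leaves a two-parameter family of solutions, so the $L$ returned here is only one member of that family, not necessarily the one corresponding to the true Euclidean frame; the rotation angle of $R$, however, is fixed by the eigenvalues.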

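Let us suppose that $\det H_{12} = 1$ to eliminate the scale factor on $H_{12}$. Let $[\mathbf{l}^T\ 1]^T$ be an eigenvector of $H_{12}^T$ corresponding to the eigenvalue 1. Since $[\mathbf{l}^T\ 1]\,H_{12} = [\mathbf{l}^T\ 1]$, this implies:

$$\begin{bmatrix} I & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} H_{12} = \begin{bmatrix} A & \mathbf{b} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} \qquad (11)$$

so that $H_{12}$ can be decomposed in the form:

$$H_{12} = \begin{bmatrix} I & \mathbf{0} \\ -\mathbf{l}^T & 1 \end{bmatrix} \begin{bmatrix} A & \mathbf{b} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} \qquad (12)$$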

that is,

$$H_{12} = \begin{bmatrix} A + \mathbf{b}\mathbf{l}^T & \mathbf{b} \\ \mathbf{l}^T (I - A - \mathbf{b}\mathbf{l}^T) & 1 - \mathbf{l}^T\mathbf{b} \end{bmatrix} \qquad (13)$$

The middle factor of (12) is conjugate to $H_{12}$, so its eigenvalues are 1, 1, $e^{i\theta}$ and $e^{-i\theta}$, and $A$ has eigenvalues $(1, e^{i\theta}, e^{-i\theta})$. Using Lemma 2, $A$ can therefore be decomposed into:

$$A = L^{-1} R L \qquad (14)$$

and we can write $\mathbf{b}$ as:

$$\mathbf{b} = L^{-1}\mathbf{t} \qquad (15)$$

Thus,

$$H_{12} = \begin{bmatrix} L^{-1}RL + L^{-1}\mathbf{t}\mathbf{l}^T & L^{-1}\mathbf{t} \\ \mathbf{l}^T (I - L^{-1}RL - L^{-1}\mathbf{t}\mathbf{l}^T) & 1 - \mathbf{l}^T L^{-1}\mathbf{t} \end{bmatrix} \qquad (16)$$

which can be factorized as:

$$H_{12} = \begin{bmatrix} L^{-1} & \mathbf{0} \\ -\mathbf{l}^T L^{-1} & 1 \end{bmatrix} \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} L & \mathbf{0} \\ \mathbf{l}^T & 1 \end{bmatrix} \qquad (17)$$

We showed that this decomposition exists, but it is certainly not unique. If we count the parameters on each side, $H_{12}$ has 16 parameters minus 3, because 2 eigenvalues must be 1 and the two others have one degree of freedom (the angle of the rotation, $\theta$), which makes 13 parameters on the left side of equation (9); on the right side we have 6 parameters for the displacement and 9 for the lower triangular matrix, which makes 15 parameters. The solution to this equation is therefore not unique, and the set of solutions must be a manifold of dimension 2. One of the two remaining parameters is the scale factor on the Euclidean space, which cannot be recovered because we have no length reference. We can eliminate it by setting one of the diagonal entries of $L$ to 1 (they can never be zero because $L$ is non-singular). It was shown clearly in [3, 8] that the other parameter represents the uncertainty in the choice of the absolute conic from $H$, because one displacement does not define that conic uniquely, so that we cannot recover the complete Euclidean structure from one displacement (i.e. two projective reconstructions). One way to deal with it would be to fix one of the intrinsic parameters of the cameras [8], e.g. by stating that the x and y axes of the cameras are orthogonal, but since in our scheme the intrinsic parameters do not appear explicitly, we could not use this. Another way is simply to use more than one displacement, as we demonstrate in the next section.

5 Euclidean reconstruction of a whole object using stereo by correlation

To test this method, we took several stereoscopic pairs of images of an object using a stereo rig (Figure 2). In this experiment, we used a mathematical object (called a cyclide) whose equation is known, but the fact that we know its geometry was not used in the recovery of its Euclidean geometry. We performed weak calibration [1] on these stereo pairs and computed disparity maps using stereo by correlation.

Figure 2: One of the ten stereoscopic pairs used for the example

We can show that a disparity map computed from a pair of rectified images can be considered as a 3-D projective reconstruction:

Proposition 4 Let $d(x, y)$ be a disparity map, where $x$ and $y$ are rectified image coordinates. The projective points formed by using the rectified image coordinates as the first two coordinates, the disparity as the third coordinate, and 1 as the last coordinate form a projective reconstruction, i.e. the 3-D Euclidean coordinates can be recovered by applying a 3-D collineation $H$ to the points $(x, y, d(x, y), 1)$.

Let $P$ and $P'$ be the projection matrices corresponding respectively to the rectified reference image and the other rectified image. Since the projection of a 3-D projective point $M$ has the same $y$ coordinate in both images, only the first rows of $P$ and $P'$ differ:

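$$P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} \quad\text{and}\quad P' = \begin{bmatrix} \mathbf{p}_1'^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} \qquad (18)$$

consequently

$$P\mathbf{M} \simeq \begin{bmatrix} x & y & 1 \end{bmatrix}^T \quad\text{and}\quad P'\mathbf{M} \simeq \begin{bmatrix} x + d(x, y) & y & 1 \end{bmatrix}^T \qquad (19)$$

Dividing $\mathbf{p}_1^T\mathbf{M}$, $\mathbf{p}_2^T\mathbf{M}$ and $(\mathbf{p}_1' - \mathbf{p}_1)^T\mathbf{M}$ by $\mathbf{p}_3^T\mathbf{M}$ gives $x$, $y$ and $d(x, y)$, so that finally

$$\mathbf{M} \simeq H \begin{bmatrix} x & y & d(x, y) & 1 \end{bmatrix}^T \qquad (20)$$

with

$$H = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_1'^T - \mathbf{p}_1^T \\ \mathbf{p}_3^T \end{bmatrix}^{-1} \qquad (21)$$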
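Equation (21) can be checked on a synthetic rectified pair. The sketch below assumes NumPy and known projection matrices, which Proposition 4 does not need (it only asserts that some $H$ exists): here $P = K[I \mid \mathbf{0}]$ and $P' = K[I \mid (-B, 0, 0)^T]$ for a hypothetical intrinsic matrix $K$ and baseline $B$, so that only the first rows differ. The helper names `disparity_collineation` and `disparity_to_points` are hypothetical.

```python
import numpy as np

def disparity_collineation(P, P_prime):
    """H of equation (21), mapping (x, y, d, 1) to homogeneous 3-D points."""
    if not np.allclose(P[1:], P_prime[1:]):
        raise ValueError("rectified projection matrices must share their last two rows")
    rows = np.vstack([P[0], P[1], P_prime[0] - P[0], P[2]])
    return np.linalg.inv(rows)

def disparity_to_points(x, y, d, H):
    """Apply H to the points (x, y, d(x, y), 1); returns an (n, 4) array."""
    return np.column_stack([x, y, d, np.ones_like(x)]) @ H.T

# Hypothetical rectified rig: same intrinsics K, baseline B along the x axis.
f, cx, cy, B = 800.0, 320.0, 240.0, 0.12
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_prime = K @ np.hstack([np.eye(3), np.array([[-B], [0.0], [0.0]])])

X = np.array([[0.2, -0.1, 3.0, 1.0], [-0.5, 0.3, 5.0, 1.0]])   # Euclidean points
m, m_prime = X @ P.T, X @ P_prime.T
x, y = m[:, 0] / m[:, 2], m[:, 1] / m[:, 2]
d = m_prime[:, 0] / m_prime[:, 2] - x                         # disparity x' - x
M = disparity_to_points(x, y, d, disparity_collineation(P, P_prime))
X_rec = M[:, :3] / M[:, 3:]                                   # dehomogenized points
print(np.allclose(X_rec, X[:, :3]))
```

With this particular choice of $P$, the recovered points coincide with the Euclidean ones; in the experiment $P$ and $P'$ are only known up to a collineation, so $(x, y, d, 1)$ is a projective reconstruction and the collineation is part of what must be estimated.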

Thus the disparity map $(x, y, d(x, y), 1)$ is a projective reconstruction. □

As we have seen before, we have 8 unknowns for the matrix $L$ and 6 unknowns for each displacement, which makes $8 + 6(n - 1)$ unknowns if $n$ is the number of stereo pairs. We compute these parameters using a least-squares minimization technique: we match points between the rectified reference images of overlapping stereo pairs¹, and the error to be minimized is the squared distance between the points of reconstruction $i$ transformed by the matrix $L^{-1} D_{ij} L$ and the matched points of reconstruction $j$. This error is measured in (image + disparity) space, which is not the real 3-D Euclidean space, but since image space is almost Euclidean and disparity is bounded, it should work fine. This minimization is done in two steps. First, only the matches between reconstructions $i$ and $i + 1$ are considered, and the minimization is done over $L$ and the $D_{i,i+1}$, $1 \le i < n$, which are represented as a rotation vector and a translation vector. In practice, the error function associated with this minimization is well-conditioned, so that we get rather good estimates whatever the initial point. Second, all the matches between different reconstructions are considered (especially the one between $n$ and 1, which forces the surface to fold over itself), and the minimization is done once again.

In fact, we recovered the complete Euclidean geometry of our object. Figure 3 shows the reconstruction from the first stereo pair, as seen when transformed by matrix $L$, and Figures 4 and 5 show the complete reconstruction of the object from 10 stereo pairs, rendered with lighting or with texture mapping.
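The error function described above can be sketched as a residual vector suitable for a generic nonlinear least-squares solver. This is a minimal sketch under assumptions that are not in the text: $L$ is the $4 \times 4$ lower triangular matrix of equation (9) with its first and last diagonal entries fixed to 1 (our choice of scale normalization), each $D_{i,i+1}$ is parameterized by a rotation vector and a translation, the displacement between non-consecutive reconstructions is obtained by chaining consecutive ones, and all helper names (`FREE_L`, `rodrigues`, `unpack`, `relative`, `residuals`) are hypothetical. The synthetic data only check that the residual vanishes at the true parameters; they do not reproduce the experiment.

```python
import numpy as np

# Free entries of the 4x4 lower triangular L: (0, 0) and (3, 3) are fixed to 1.
FREE_L = [(int(r), int(c)) for r, c in zip(*np.tril_indices(4))][1:-1]

def rodrigues(w):
    """Rotation matrix from a rotation vector w (axis times angle)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def unpack(params, n):
    """params: 8 entries of L, then (rotation vector, translation) per D_{i,i+1}."""
    L = np.eye(4)
    for value, (r, c) in zip(params[:8], FREE_L):
        L[r, c] = value
    Ds = []
    for k in range(n - 1):
        D = np.eye(4)
        D[:3, :3] = rodrigues(np.asarray(params[8 + 6 * k:11 + 6 * k]))
        D[:3, 3] = params[11 + 6 * k:14 + 6 * k]
        Ds.append(D)
    return L, Ds

def relative(Ds, i, j):
    """Displacement from reconstruction i to reconstruction j, chained through D_{k,k+1}."""
    if i > j:
        return np.linalg.inv(relative(Ds, j, i))
    D = np.eye(4)
    for k in range(i, j):
        D = Ds[k] @ D
    return D

def residuals(params, n, matches):
    """matches: list of (i, j, Mi, Mj), Mi and Mj (k, 3) arrays of (x, y, d) points."""
    L, Ds = unpack(params, n)
    L_inv = np.linalg.inv(L)
    res = []
    for i, j, Mi, Mj in matches:
        Hij = L_inv @ relative(Ds, i, j) @ L
        mapped = np.c_[Mi, np.ones(len(Mi))] @ Hij.T
        res.append((mapped[:, :3] / mapped[:, 3:] - Mj).ravel())
    return np.concatenate(res)

# Synthetic check with n = 3 reconstructions (hypothetical values).
n = 3
params_true = np.array([0.1, 1.2, 0.05, -0.1, 0.9, 0.01, 0.02, 0.03,   # L
                        0.0, 0.3, 0.05, 0.5, 0.0, 0.1,                  # D_{1,2}
                        0.02, 0.25, -0.03, 0.4, 0.05, -0.1])            # D_{2,3}
L_true, Ds_true = unpack(params_true, n)
Xh = np.c_[np.random.default_rng(3).uniform(-1, 1, (20, 3)) + [0.0, 0.0, 5.0], np.ones(20)]
recs = []
for i in range(n):
    Mi = Xh @ relative(Ds_true, 0, i).T @ np.linalg.inv(L_true).T   # projective coordinates
    recs.append(Mi[:, :3] / Mi[:, 3:])
matches = [(0, 1, recs[0], recs[1]), (1, 2, recs[1], recs[2]), (2, 0, recs[2], recs[0])]
r = residuals(params_true, n, matches)
print(np.abs(r).max())
```

With SciPy available, `scipy.optimize.least_squares(residuals, x0, args=(n, matches))` would run the minimization from an initial guess `x0`; in the two-step scheme described above, `matches` would first contain only consecutive pairs and then all pairs, including the one closing the loop between reconstructions $n$ and 1.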

6 Conclusion

In this paper we presented a method to recover partly or completely the Euclidean geometry using an uncalibrated stereo rig. All we need to do this is the fundamental matrix of the stereo rig, which can be calculated by a robust method such as [1], and point matches between the different stereo pairs, which could be computed automatically. Using multiple stereo pairs, we increase the stability of the algorithm by adding more equations than unknowns. We presented results on a real object, which was fully reconstructed in Euclidean space using a few stereo pairs.

The possible applications of this method include acquiring 3-D objects easily using any set of uncalibrated stereo cameras, for example to model an object to be used in virtual reality, or autonomous robot navigation. In the near future we plan to enhance the system in order to make it completely automatic: we must have a way to match points automatically (feature tracking would be a good starting point) and to perform fusion and simplification of the 3-D reconstruction once the registration is done. Furthermore, in order to test the accuracy of the Euclidean reconstruction, we can either compare the intrinsic and extrinsic parameters of the cameras computed with a classical camera calibration method with those recovered using this method, or compare the 3-D reconstruction with a mathematical model of the object (which is known in the example presented here).

¹ This process was done manually in our experiment but could be automated.

Figure 3: The Euclidean reconstruction from the first stereo pair

Figure 4: The complete reconstruction of the object, rendered with lighting

Figure 5: The complete reconstruction of the object, rendered with lighting and texture mapping from the original images

References

[1] R. Deriche, Z. Zhang, Q.-T. Luong, and O. Faugeras. Robust recovery of the epipolar geometry for an uncalibrated stereo rig. In J.-O. Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, volume 800-801 of Lecture Notes in Computer Science, pages 567-576, Vol. 1, Stockholm, Sweden, May 1994. Springer-Verlag.
[2] Olivier Faugeras. What can be seen in three dimensions with an uncalibrated stereo rig? In G. Sandini, editor, Proceedings of the 2nd European Conference on Computer Vision, volume 588 of Lecture Notes in Computer Science, pages 563-578, Santa Margherita Ligure, Italy, May 1992. Springer-Verlag.
[3] Olivier Faugeras. Non-metric representations in 3-D artificial vision. Nature, 1993. Submitted.
[4] Olivier Faugeras. Cartan's moving frame method and its application to the geometry and evolution of curves in the Euclidean, affine and projective planes. In Joseph L. Mundy, Andrew Zisserman, and David Forsyth, editors, Applications of Invariance in Computer Vision, volume 825 of Lecture Notes in Computer Science, pages 11-46. Springer-Verlag, 1994.
[5] Richard Hartley, Rajiv Gupta, and Tom Chang. Stereo from uncalibrated cameras. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 761-764, Urbana-Champaign, IL, June 1992. IEEE.
[6] Tuan Luong and Olivier Faugeras. Active stereo with head movements. In 2nd Singapore International Conference on Image Processing, pages 507-510, Singapore, September 1992.
[7] S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123-152, August 1992.
[8] Andrew Zisserman, Paul A. Beardsley, and Ian D. Reid. Metric calibration of a stereo rig. In Proc. Workshop on Visual Scene Representation, Boston, MA, June 1995.