Next, we need to think about multi-view geometry. This arises when we look at the world from multiple directions, either because we are moving through space and seeing the same object from different viewpoints, or because several of our friends are looking at the same scene from different angles. In that setup, we no longer have the luxury of thinking in first-person measurements. We cannot assume we are the center of the universe; we have to share a common representation between all the different views. So we need to define third-person measurements, the world measurements. Here we show points in three-dimensional space measured in a third-person, three-dimensional coordinate system. That coordinate system is invariant no matter how you look at the world. Whether from your view or your friend's view, this three-dimensional vector X remains constant; what changes is how this three-dimensional vector appears in my first-person perspective. It is the same point, but it was written down in a third-person, view-invariant representation, independent of how I am seeing it. What I need to do is take that three-dimensional measurement and convert it into my first-person perspective, where I am the camera center and I have a particular orientation: for an ideal camera, X points to the right, Y points down, and Z points forward. Every camera has its own first-person coordinate system, and this transformation can be easily represented as a rotation, illustrating a change of coordinate system, plus a translation that moves the world center to the camera center. This can be expressed by the equation illustrated here: a rotation followed by a translation. Again, it is a transformation from a third-person three-dimensional coordinate system to a first-person three-dimensional coordinate system.
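The rotation-plus-translation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code; the rotation angle and translation values are made up for the example.

```python
import numpy as np

# Illustrative sketch: transform a point from the third-person (world)
# coordinate system into a camera's first-person coordinate system
# via X_cam = R @ X_world + t. R and t values are invented for the demo.

def world_to_camera(X_world, R, t):
    """Apply the rigid change of coordinates X_cam = R @ X_world + t."""
    return R @ X_world + t

# A 90-degree rotation about the Z axis (camera's forward direction),
# plus a shift of the origin 5 units along Z.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 5.0])

X_world = np.array([1.0, 0.0, 0.0])     # invariant third-person point
X_cam = world_to_camera(X_world, R, t)  # same point, first-person view
print(X_cam)
```

The point itself never changes; only the numbers describing it change, because the axes they are measured against have changed.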
Once a point has been transformed through this coordinate transformation into the first-person camera coordinate system, we can run it through the same camera projection equation we had before, which is modeled by a 3x3 identity matrix followed by a zero column, and then by the K matrix, the 3x3 camera calibration matrix, taking us into a two-dimensional image written down in homogeneous coordinates. Combining those matrices together, we obtain the ideal camera projection matrix, which has the following form. In three-dimensional space we have a point X, written in homogeneous coordinates as (X, Y, Z, 1). This vector is multiplied by a 3x4 matrix consisting of a 3x3 rotation matrix followed by a translation vector t; that 3x4 matrix encodes the transformation from the third-person perspective to the first-person perspective, still in 3D. The 3D point in the first-person world is then projected through the K matrix, as we saw, down to the image in two-dimensional pixel coordinates. Putting it together, the complete form looks like this. We have a point (X, Y, Z) in the world coordinate system, or what I call the third-person perspective, and that perspective is invariant to whatever view you have. This is multiplied by a 3x4 matrix made up of the three columns of the rotation matrix, followed by a fourth column, the translation vector. What it does is take a point measured in the third-person perspective and change it so that the point is now measured in my egocentric, first-person coordinate system. It is still a three-dimensional point; it has just gone through a coordinate transformation into my first-person view. This point, in the first-person view, is then multiplied by the calibration matrix K, forming a two-dimensional image in pixel space. And pixel space is always first-person: everybody sees the world slightly differently from everyone else.
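Putting the pieces together numerically, the full pipeline is the 3x4 matrix P = K [R | t] applied to a homogeneous world point, followed by dividing out the depth. The focal length, principal point, and the world point below are all illustrative values, not anything from the lecture.

```python
import numpy as np

# Sketch of the full projection: P = K @ [R | t] maps a homogeneous
# world point (X, Y, Z, 1) to homogeneous pixel coordinates.
# All numeric values are invented for illustration.

K = np.array([[800.0,   0.0, 320.0],   # focal length, principal point
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera aligned with world axes
t = np.array([[0.0], [0.0], [5.0]])    # world origin 5 units in front

P = K @ np.hstack([R, t])              # the 3x4 projection matrix

X_h = np.array([1.0, 2.0, 5.0, 1.0])   # world point, homogeneous
x_h = P @ X_h                          # homogeneous pixel coordinates
u, v = x_h[0] / x_h[2], x_h[1] / x_h[2]  # divide out the depth
print(u, v)
```

Note that the division by the third homogeneous coordinate is exactly the perspective division: points twice as far away project half as far from the principal point.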
This is my own unique view of the world, where the (x, y) positions change depending on how I am oriented in space. Now we have returned to where we started, the camera projection. As we remember, we have three components. First, the camera body itself, by which I mean the orientation and position of the camera. This is written down in the rotation matrix R and translation T, and we often call these the external parameters of the camera. We also have the K matrix, a 3x3 matrix, which transforms a three-dimensional object in the first-person perspective into a two-dimensional representation in the pixel domain. And what we are not talking about yet is the nonlinear distortion due to the lens, which we will call L. The focal length of the lens, in fact, is folded into the calibration matrix K. So an object in the 3D world, in the third-person perspective, is transformed by these factors: first the camera rotation and translation take the three-dimensional object from the third-person view to the first-person view; then the camera calibration matrix K converts that 3D representation into a 2D one. That is the general camera projection matrix. We will study two special cases. The first special case is when the three-dimensional object we are looking at is, in fact, a planar object. It still has three-dimensional coordinates (X, Y, Z), but all of those points sit in a plane. We call that three-dimensional coordinate system the third-person perspective, and we have the freedom to choose how we measure the three-dimensional points. Because we know the object sits in a plane, we can pick a particular coordinate system such that X and Y lie in the plane of the object and Z is perpendicular to it. Then any point sitting on the plane has the special property that Z equals zero. So for any point on this particular planar object, we can write its position as (X, Y, 0, 1). We take that coordinate and transform it through the rotation and translation.
And through the camera calibration matrix K, we obtain a two-dimensional image of this planar object in the camera, in homogeneous coordinates. Because Z is always zero, the third column of the rotation matrix is never used, so this simplification allows me to take the first two columns of the rotation matrix, plus the translation vector as a third column, and combine them with the calibration matrix into a single 3x3 matrix, which encodes the total transformation of the planar object in 3D onto the image plane. This is written down with this equation. We have the (X, Y) coordinates on the physical three-dimensional plane, measured on the checkerboard; we know exactly the size of the checkerboard and the exact (X, Y) coordinates of each corner on the checkerboard itself. Those positions are transformed by the 3x3 matrix into the image we took of the plane. The image we took of that plane could look quite skewed, depending on the orientation of the camera. But no matter how I orient my camera, I always have a 3x3 matrix of this particular form encoding the transformation of coordinates from the checkerboard on the plane to the checkerboard in the pixel domain. The small caveat here is that we have an unknown factor z. What z tells me is that, looking from the pixel domain, I have a ray going from the optical center through the pixel and extending outward. If I follow the ray by the factor z, which is how far that point is from the optical center, then multiplying by the correct z factor I will hit the plane at that point. So the z in this equation is different for every pixel, for every corner of this checkerboard. But if we are thinking in the image domain, in homogeneous coordinates, we can remove this unknown factor z and simply write everything in homogeneous coordinates: (x, y, 1) in the image equals a 3x3 matrix, the homography, multiplied by (X, Y, 1) in the world coordinate system.
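The collapse from the 3x4 projection to a 3x3 homography can be sketched directly: drop the rotation's third column (it only ever multiplies Z = 0) and keep the translation as the third column. The calibration values and the fronto-parallel pose below are invented for the demo; a real checkerboard pose would have a general rotation.

```python
import numpy as np

# Sketch of the planar special case: for points with Z = 0 on the plane,
# the projection collapses to a 3x3 homography H = K @ [r1 r2 t], built
# from the first two rotation columns and the translation.
# K, R, and t are illustrative values (a simple fronto-parallel setup).

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])          # plane 2 units in front of camera

H = K @ np.column_stack([R[:, 0], R[:, 1], t])  # the 3x3 homography

XY1 = np.array([0.1, 0.2, 1.0])  # checkerboard corner (X, Y, 1) on the plane
x_h = H @ XY1                    # homogeneous pixel coordinates
u, v = x_h[0] / x_h[2], x_h[1] / x_h[2]
print(u, v)
```

The per-point depth factor z mentioned above is exactly the third homogeneous coordinate `x_h[2]` being divided out here.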
And this is often called the homography transformation. The second special case is when the camera is just rotating about its optical center. In this case the t vector is zero; I am no longer moving the optical center anywhere. In fact, this is really hard to do unless you know exactly where the optical center is, and earlier in this lecture we talked about how to identify the optical center for exactly this reason. Once I am able to identify the optical center, I can try to pivot my camera around that point, such that the motion of the camera consists only of rotations. With no translation, given t = 0, we can further simplify the camera projection matrix: it becomes the calibration matrix K times simply the rotation matrix itself, and the last column of the transformation, t, disappears. Along with it goes the last element of the homogeneous coordinates (X, Y, Z, 1). So now we have this transformation, where the camera calibration K, which is 3x3, times a rotation matrix, also 3x3, multiplies (X, Y, Z) in the world space, and that point is taken by this 3x3 matrix into homogeneous coordinates in the image space itself. The interesting fact about this transformation is that we can use it to create interesting images. Every image is a rotation of every other, so going from one to another is again a simple homography transformation, a 3x3 transformation. Given one picture and another picture, two pictures related by a pure rotation can be described by this 3x3 homography transformation. Then, given a set of images I have taken in different directions, I can align them in one space, where each image goes through a 3x3 transformation such that all the objects in the pictures are aligned.
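The image-to-image alignment described above follows from composing the two projections: if both views share the optical center, then x2 ~ K R_rel K⁻¹ x1, a 3x3 homography that is independent of scene depth. Here is a small sketch verifying that identity; K and the rotation angle are made-up values.

```python
import numpy as np

# Sketch: with zero translation, two images taken from the same optical
# center are related by the homography H = K @ R_rel @ inv(K),
# independent of scene depth -- the basis of panorama stitching.
# K and the 10-degree rotation are illustrative.

K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def rot_y(a):
    """Rotation about the camera's Y (down) axis by angle a."""
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(a), 0.0, np.cos(a)]])

R_rel = rot_y(np.deg2rad(10.0))      # second view, rotated 10 degrees
H = K @ R_rel @ np.linalg.inv(K)     # image-to-image homography

# Project the same 3D point through both views and check that H maps
# the first image point exactly onto the second.
X = np.array([0.5, -0.2, 4.0])
x1 = K @ X
x1 = x1 / x1[2]
x2 = K @ (R_rel @ X)
x2 = x2 / x2[2]
x1_mapped = H @ x1
x1_mapped = x1_mapped / x1_mapped[2]
print(np.allclose(x1_mapped, x2))
```

Because the depth of X cancels out of H, the same 3x3 matrix aligns every pixel of the two images at once, which is what makes panorama stitching from a rotating camera possible.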
And this is called a panorama. Another interesting consequence of the translation being zero is that there is no longer any motion parallax. Meaning that if I purely rotate the camera, there is no relative change of displacement between two objects, in terms of visual angles. Take two points in space subtending a visual angle: no matter how I rotate, if the camera center is not translating, that visual angle stays constant. This also tells us that we no longer have a three-dimensional sensation of the world; in order to create that sensation, we have to translate the camera center, with nonzero T, and we will continue that discussion next time.
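The absence of parallax can be checked numerically: two points lying on the same ray through the optical center project to the identical pixel under any pure rotation, but separate as soon as the camera translates. The calibration matrix and geometry below are invented for the illustration.

```python
import numpy as np

# Sketch of the parallax argument: points on the same ray through the
# optical center stay coincident in the image under pure rotation
# (no parallax), but separate once the camera translates.
# K, the rotation, and the point positions are illustrative.

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, R, t, X):
    """Project a 3D point via K @ (R @ X + t), then normalize."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

a = np.deg2rad(15.0)
R = np.array([[ np.cos(a), 0.0, np.sin(a)],   # rotation about Y axis
              [ 0.0,       1.0, 0.0      ],
              [-np.sin(a), 0.0, np.cos(a)]])

near = np.array([0.2, 0.1, 2.0])
far  = near * 3.0        # same ray from the optical center, 3x deeper

# Pure rotation: both points land on the identical pixel -- no parallax.
same_pixel = np.allclose(project(K, R, np.zeros(3), near),
                         project(K, R, np.zeros(3), far))

# Sideways translation: the two points now separate -- parallax appears.
t = np.array([0.5, 0.0, 0.0])
separated = not np.allclose(project(K, R, t, near),
                            project(K, R, t, far))
print(same_pixel, separated)
```

This is the computational version of the statement above: depth only becomes visible once the optical center moves.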