So let's go back. We have a first-person camera world, where an image plane sits in front of us and the focal length is f. A point (x, y, z) is measured in the camera frame, relative to myself. I simply take the x measurement, divide it by z, which is how far the point is from me along the viewing direction, and scale it by the factor f onto the image plane. Similarly, we take y, shrink it by a factor of z, and magnify it by the factor f onto the image plane. This simple transformation from 3D to 2D gives the camera projection equations, which we will write down in matrix form. The way we do this is to take the coordinate system of the image plane, defined here as x prime and y prime, and lift it into homogeneous coordinates. The reason we go to homogeneous coordinate space is that we can represent points at infinity and lines at infinity quite conveniently, and we can write everything down as simple linear equations. So the first step is to take the image point (x', y') and re-express it in image space as (x', y', 1), the homogeneous coordinates of that point. If we multiply this vector by z, the distance to the camera, we get exactly (f x, f y, z): the focal length times x, the focal length times y, and z itself. The two three-dimensional vectors lie on the same ray; in fact, they are equal to each other. This allows us to rewrite the equation in homogeneous coordinates. In 3D space, we take the point (x, y, z) and write it in homogeneous coordinates with a 1 attached, giving a four-dimensional vector. We multiply that vector by a three-by-four transformation matrix whose first three columns form a diagonal three-by-three matrix, with the focal length f in the first two diagonal entries and 1 in the last.
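The projection just described can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the lecture: the helper names (`mat_vec`, `ideal_projection_matrix`, `project_to_image_plane`) and the 35 mm focal length are assumptions chosen for the example. It builds the three-by-four matrix (diagonal f, f, 1, with a zero fourth column), multiplies a homogeneous point by it, and divides by the last coordinate, which is z.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ideal_projection_matrix(f):
    """Hypothetical helper: first-person 3x4 matrix, diag(f, f, 1) plus a zero column."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project_to_image_plane(P, point):
    """Map a camera-frame point (x, y, z) with z > 0 to (x', y') on the image plane."""
    X = [*point, 1.0]            # homogeneous 4-vector (x, y, z, 1)
    a, b, w = mat_vec(P, X)      # homogeneous 3-vector (f*x, f*y, z)
    return (a / w, b / w)        # x' = f*x/z, y' = f*y/z


f = 0.035                        # assumed focal length: 35 mm, written in meters
P = ideal_projection_matrix(f)
near = project_to_image_plane(P, (1.0, 0.5, 10.0))
# Doubling the point along its ray through the camera center gives the same image point.
far = project_to_image_plane(P, (2.0, 1.0, 20.0))
print(near, far)
```

Plain lists keep the sketch dependency-free; with NumPy the same computation would be a single matrix-vector product followed by the division by z.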
The fourth column of this three-by-four transformation matrix is all zeros. The multiplication produces a three-dimensional vector that corresponds to the homogeneous coordinates (x', y', 1) of a two-dimensional point on the image plane. In image space, (x', y', 1) multiplied by z forms a ray into three-dimensional space, and that ray in fact passes through the three-dimensional point. This gives rise to an equation we will use quite often in the next few lectures: the image point, little x, equals (up to the scale factor z) the camera projection matrix P times big X, the three-dimensional point written in homogeneous coordinates as a four-dimensional vector. In the second step, we convert optical measurements. So far we have talked about a first-person camera world where everything is measured in meters or millimeters, relative to the camera center, that is, centered on myself. Now we need to take those optical measurements, in millimeters, and convert them to pixel coordinates, the image as it is stored in the camera itself. So this conversion goes from millimeters to pixels, and there are two factors to consider. The first is simply that the image is represented in the computer as a matrix. As such, the (0, 0) origin of the matrix is typically at the top-left corner of the image: rows and columns are counted from there, starting at zero or at one depending on the convention. In the optical image, by contrast, the center of the image plane is the (0, 0) origin, with x pointing to the right and y pointing down. In fact, the center of the optical world is the optical axis, going from our eye to the image plane. It is perpendicular to the image plane and hits it at a particular point, which is called the principal point.
Those two factors let us convert the optical image, the one on the image plane in front of me, into pixel coordinates, which are defined by the image matrix. The first thing we need to do is shift the origin: move it from the principal point of the optical image to the top-left corner, where the (0, 0) entry of the image matrix lives. That is simply a translation. Then, in that measurement space, we need to convert millimeters, or in this case micrometers, into pixels. This is done with a factor defined by the pixel size: each pixel has a certain width and height, so we take measurements in the optical world and divide them by the pixel size. That tells us where the point falls in the image matrix, in pixel coordinates. So the conversion is a simple linear transformation, a shift from the principal point to the upper-left corner followed by a scale factor, and it can again be written as a matrix transformation, in this case from the two-dimensional optical image plane to the two-dimensional pixel plane. The particular transformation we will use is a three-by-three upper triangular matrix with three pieces to it. The first piece is the scaling factor, which has to do with how big a pixel is: how we convert millimeters or micrometers into pixels. The second piece has to do with the principal point, where the optical axis hits the image. Ideally it should be in the middle of the image. Unfortunately, because of the camera mount or the sensor mount, in practice it is not exactly in the middle; it may drift down or up, depending on how the lens is oriented. If the lens is tilted downward, the optical axis will hit the image below the image center. We will study how to calculate this point precisely when we cover calibration.
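The shift-and-scale conversion can be sketched as a three-by-three matrix acting on homogeneous image-plane coordinates. This is a minimal illustration under assumed numbers (5 micrometer square pixels, principal point at column 320 and row 250 of a 640 x 480 image), and `optical_to_pixel_matrix` and `optical_to_pixel` are hypothetical helper names. Because the principal point is given directly in pixels, the matrix divides by the pixel size and then adds the offset in one step; the slant factor is left out at this point.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def optical_to_pixel_matrix(pixel_w, pixel_h, pu, pv):
    """Hypothetical helper: 3x3 map from image-plane millimeters to pixels.

    pixel_w, pixel_h: width and height of one pixel, in mm
    pu, pv: principal point in pixels, measured from the top-left corner
    """
    return [
        [1.0 / pixel_w, 0.0, pu],
        [0.0, 1.0 / pixel_h, pv],
        [0.0, 0.0, 1.0],
    ]


def optical_to_pixel(M, x_mm, y_mm):
    """Convert an image-plane point (x', y') in mm to pixel coordinates (u, v)."""
    u, v, w = mat_vec(M, [x_mm, y_mm, 1.0])
    return (u / w, v / w)        # w stays 1 because the last row is (0, 0, 1)


# Assumed sensor: 0.005 mm square pixels, principal point at column 320, row 250,
# a little below the center of a 640 x 480 image.
M = optical_to_pixel_matrix(0.005, 0.005, 320.0, 250.0)
center = optical_to_pixel(M, 0.0, 0.0)      # lands on the principal point
offset = optical_to_pixel(M, 0.5, -0.25)    # 0.5 mm right, 0.25 mm up (y points down)
print(center, offset)
```

With these numbers, a point 0.5 mm to the right of and 0.25 mm above the principal point moves 100 pixels right and 50 pixels up, landing at column 420, row 200.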
The third and last piece comes from the lens mount or the sensor mount: the sensor in the camera might not be exactly parallel to the ideal image plane. It could be slightly slanted, and this is captured by a factor s in an off-diagonal entry. In practice, we often combine this internal camera conversion, from optical coordinates to pixel coordinates, with the focal length matrix, which is also three by three, into a single combined calibration matrix. The combined calibration matrix contains both the magnification due to the focal length and the magnification due to the pixel-size conversion. Again, this matrix is three by three and upper triangular. Its diagonal elements are the scale factors; the off-diagonal entry s is the slant factor, which accounts for the image plane not being exactly fronto-parallel to the ideal image plane; and the last column holds the principal point, where the optical axis hits the image. We call this three-by-three matrix K, which stands for calibration matrix. Putting everything together, this is the camera projection equation for the first-person camera configuration. We have points in 3D space measured in the first-person camera view, where I am the origin, x points to the right, y points down, and z points forward. The point (x, y, z) is expressed as a four-dimensional vector in homogeneous coordinates. It is then multiplied by a very simple three-by-four matrix: the three-by-three identity matrix followed by a column of three zeros. This takes the four-dimensional homogeneous coordinate down to a three-dimensional vector, which in fact lives on a two-dimensional space, the ideal image plane.
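Combining the two matrices can be sketched numerically: K is the millimeter-to-pixel matrix (now including the slant s) multiplied by diag(f, f, 1), and the full projection matrix is K times [I | 0]. This is an illustrative sketch with assumed values (35 mm lens, 5 micrometer pixels, principal point at (320, 250)); `calibration_matrix` and `IDENTITY_ZERO` are hypothetical names, not the lecture's.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def calibration_matrix(f, pixel_w, pixel_h, pu, pv, s=0.0):
    """Hypothetical helper: K = (mm-to-pixel matrix) times diag(f, f, 1).

    f, pixel_w, pixel_h share one length unit (mm here); pu, pv are in pixels;
    s is the slant factor of the sensor.
    """
    to_pixels = [[1.0 / pixel_w, s, pu], [0.0, 1.0 / pixel_h, pv], [0.0, 0.0, 1.0]]
    focal = [[f, 0.0, 0.0], [0.0, f, 0.0], [0.0, 0.0, 1.0]]
    return mat_mul(to_pixels, focal)


IDENTITY_ZERO = [[1.0, 0.0, 0.0, 0.0],   # [I | 0]: first-person camera,
                 [0.0, 1.0, 0.0, 0.0],   # no change of reference frame
                 [0.0, 0.0, 1.0, 0.0]]

# Assumed camera: 35 mm lens, 0.005 mm pixels, principal point (320, 250).
K = calibration_matrix(35.0, 0.005, 0.005, 320.0, 250.0)
P = mat_mul(K, IDENTITY_ZERO)            # full 3x4 projection matrix K [I | 0]

# Point in meters: only the ratios x/z and y/z matter, so the units cancel.
u_h, v_h, w = mat_vec(P, [0.1, 0.05, 10.0, 1.0])
pixel = (u_h / w, v_h / w)
print(K)
print(pixel)
```

The combined diagonal entries are f divided by the pixel width and height. With a nonzero s, the product places s times f in the off-diagonal slot, which is why K is usually written directly with that combined entry.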
The point on the ideal image plane is then transformed by the camera calibration matrix, a three-by-three matrix, into the pixel domain, which lives on the left-hand side. The left-hand side holds (u, v, 1): the u image coordinate, the v image coordinate, and 1, the homogeneous coordinates of the pixel. The output is two-dimensional, but we represent it as a three-dimensional vector in homogeneous coordinates. When that vector is multiplied by the depth z, the two sides of the equation are exactly equal. Again, this first-person camera projection matrix transforms the three-dimensional world, measured relative to me, down from three dimensions to two. The transformation is made of two components. One is the camera calibration: the scaling factors, the principal point where the optical axis hits the ideal image plane, and the slant factor. It is followed, in this case, by a simple matrix that encodes the identity followed by a zero column. As we move to a different reference frame, this identity-and-zero part will start changing. Because I have used the first-person geometric representation of the 3D world, I can simplify the equation further: take the (x, y, z) coordinates of a point in my own reference frame and multiply them directly by the camera calibration matrix. The result is a three-dimensional ray, represented by homogeneous coordinates on the image plane. Again, inside the camera calibration matrix K there are really just three factors. The first is the scaling factor, due both to the focal length and to the pixel-size conversion. The second has to do with where the optical axis hits the image: it hits the middle of the image if the camera's lens is mounted correctly, but if the lens is tilted, for example by gravity, it might hit a point further down or further up from the image center.
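The first-person shortcut, multiplying the camera-frame (x, y, z) directly by K, can be checked with a short sketch. The K values below are assumed for illustration. `pixel_to_ray` is our own illustrative addition, not a method from the lecture: it reads a pixel back as a three-dimensional ray by back-substitution through the upper triangular K, and scaling that ray by the depth z recovers the original point.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project_with_K(K, point):
    """First-person shortcut: pixel (u, v) from K (x, y, z), divided by its last entry."""
    u_h, v_h, w = mat_vec(K, list(point))   # w equals z since K's last row is (0, 0, 1)
    return (u_h / w, v_h / w)


def pixel_to_ray(K, u, v):
    """Hypothetical helper: solve K r = (u, v, 1) by back-substitution.

    Returns the ray direction through the pixel, scaled so that its z entry is 1.
    """
    (fu, s, pu), (_, fv, pv), _ = K
    ry = (v - pv) / fv
    rx = (u - pu - s * ry) / fu
    return (rx, ry, 1.0)


K = [[7000.0, 0.0, 320.0],   # assumed calibration matrix: scale factors,
     [0.0, 7000.0, 250.0],   # zero slant, principal point (320, 250)
     [0.0, 0.0, 1.0]]

point = (0.1, 0.05, 10.0)
u, v = project_with_K(K, point)
ray = pixel_to_ray(K, u, v)
recovered = [point[2] * r for r in ray]     # scale the ray by the depth z
print((u, v), recovered)
```

Back-substitution is used instead of a general matrix inverse because K's zero entries below the diagonal make the solve a two-line computation.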
And third, the optical axis might be tilted; the lens, for example, can be pulled down by gravity. The image that is formed can then be a slanted version of the ideal image, and this is represented by the factor s. So this is our calibration matrix, and we will have separate lectures on how to estimate those three factors given a set of images.
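Writing out the product term by term shows what each of the three factors does to a pixel. The symbols f_x and f_y (the combined scale factors, focal length divided by pixel width and height) and p_x and p_y (the principal point in pixels) are our own names for the entries of K; s is the slant entry from the text.

```latex
z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\quad\Longrightarrow\quad
u = \frac{f_x\, x + s\, y}{z} + p_x, \qquad
v = \frac{f_y\, y}{z} + p_y .
```

The scale factors stretch x/z and y/z, the principal point adds a constant offset, and s makes u depend on y as well as on x, which is exactly the slant.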