In this lecture, we're going to talk about rotations and translations. The focus of the lecture will be on transformations between coordinate systems. When we talked about the camera model, we talked about the camera coordinate system, which is centered at the center of projection of the lens. And we also talked about the world coordinate system, which can be a reference frame fixed in the real world. In this picture here, we see the Bebop, one of the commercial quadrotors. And we see a camera coordinate frame which is centered at the lens, in the front of the Bebop, with the optical axis going outwards, the X axis to the right, and the Y axis downwards. We're going to find out how to write coordinate transformations, built out of rotations and translations, between these two coordinate systems. We're going to use the convention that we always have red for the X axis, green for the Y axis, and blue for the Z axis. We remember RGB as a standard color convention, so RGB will stand for XYZ.
In computer vision, we project points onto image planes: we take a picture of the point with a camera. This point, however, might be known in a world coordinate system. We might know the GPS coordinates of these points. So on one hand we will have the coordinates of the point in GPS coordinates, and we will call these coordinates world coordinates. And on the other hand, we're going to have the coordinates of the point in the camera coordinate system, exactly along the ray that goes out from the projection center to the point. We are going to use w and c as prescripts on the vectors for the point P. So we're going to write cP, with c as a prescript, for the coordinates of the point with respect to the camera, and wP for the coordinates with respect to the world. Now, we're going to write the rotation from the camera to the world coordinate system as R with prescript c and subscript w, and the translation as T with prescript c and subscript w. How can we find these out if all we see is a snapshot of the scene?
How is the camera aligned to the world coordinate system? How can we find out what its rotation and translation are? The trick to find this out is always to look at this equation: the point with respect to the camera equals the rotation times the point with respect to the world, plus the translation. Let's look first at the translation. In this equation, if we set the world coordinates equal to 0, then we have the origin of the world. In this case, the vector cP starts from the origin of the camera coordinate system and goes to the origin of the world coordinate system. Because wP is equal to 0, this vector is equal to the translation. So when we write camera point equals rotation times world point plus translation, this translation is always the vector from the camera origin to the world origin.
Let us now find out how we can write the rotation. The rotation is always an orthogonal matrix. Orthogonal matrix means that it has 3 mutually orthogonal unit column vectors, r1, r2, r3. You can use the fingers of your right hand to denote r1, r2, and r3. And in our case, we will have red, green, and blue in the picture. Now, we follow the same trick we did with the translation. We set the translation momentarily equal to 0, so we take it out of the picture. And if we replace the world point with 1, 0, 0, we see that multiplying the matrix with columns r1, r2, r3 by 1, 0, 0 gives r1. In the same way, if we multiply by 0, 1, 0 we get r2, and by 0, 0, 1 we get r3. But what actually is 1, 0, 0 in the world? This is the red vector, the x axis of the world coordinate system. So the meaning of the rotation matrix is the following. The first column is the x axis of the world with respect to the camera. The second column is the y axis of the world with respect to the camera. And the third column is the z axis of the world with respect to the camera, the way we see it in the picture: the red, the x axis of the world, is r1, the green is r2, and the blue is r3. The best way to understand this simple interpretation of the rotation and translation is with an example.
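Before that example, the two tricks above (a zero world point, then unit world points with the translation switched off) can be sketched in a few lines of code. This is only a minimal illustration: the function name `to_camera` and the numeric values of `R` and `t` are made up here and are not taken from the lecture.

```python
def to_camera(R, t, p_w):
    """Compute c_P = R * w_P + t; R is a 3x3 list of rows, t and p_w are 3-vectors."""
    return [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]

# Illustrative values only (not from the lecture):
# a 90 degree rotation about z and an arbitrary translation.
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
t = [1, 2, 3]
no_translation = [0, 0, 0]

# Trick 1: the world origin maps to the translation vector.
world_origin_in_camera = to_camera(R, t, [0, 0, 0])

# Trick 2: with the translation set to zero, the world unit vectors
# map to the columns r1, r2, r3 of R.
world_axes_in_camera = [to_camera(R, no_translation, e)
                        for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
columns_of_R = [[R[i][j] for i in range(3)] for j in range(3)]

print(world_origin_in_camera == t)           # world origin lands on t
print(world_axes_in_camera == columns_of_R)  # world axes land on the columns
```

Reading the columns of `R` therefore tells us directly how the world axes look from the camera, and reading `t` tells us where the world origin is.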
Let's look at this example, where we see a camera coordinate system with the x axis, the red vector, going into the slide, and the world coordinate system with its red vector coming out of the slide. So how do we write the rotation matrix? What is the X axis of the world with respect to the camera? This is a vector which is coming out this way, while the camera X axis is going into the slide. So they are parallel, but in opposite directions. This means that r1 is -1, 0, 0. What is the second column, the green vector of the world? It is parallel to the Z axis of the camera but in the opposite direction, so it is 0, 0, -1. What is the third column, the blue of the world with respect to the camera? It is parallel to the green of the camera, the Y axis, but again in the opposite direction, so it is 0, -1, 0. So in this very easy way we have found exactly the transformation, actually the rotation, from world to camera.
How can we find the translation? That's the easy part: this is just the vector from the origin of the camera to the origin of the world. This vector is inside the YZ plane of the camera, so it doesn't have any X coordinate. So it starts with 0, and then it goes 5 down and 10 towards the world coordinate system.
We want to make sure that the rotation matrix is always special orthogonal: not only that the vectors are orthogonal to each other, but also that the determinant is equal to 1. And indeed, in this case, if we compute the determinant, it is -1 times the subdeterminant of the block 0, -1, -1, 0. That subdeterminant is -1, which makes the determinant 1. This is the final verification we always have to do, so that we can be sure that we're working with right-handed coordinate systems. Now, let's assume that we have one more coordinate frame, and see how we can relate three coordinate frames to each other.
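Before adding the third frame, the example just worked out can be checked numerically. The sketch below builds the rotation from the three columns stated above and runs the two checks: orthonormal columns and determinant equal to 1. The helper names `dot` and `det3` are illustrative, and the translation is written as (0, 5, 10) on the assumption that "5 down" is along the camera Y axis and "10 towards the world" is along the camera Z axis.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def from_columns(c1, c2, c3):
    """Stack three column vectors into a 3x3 matrix stored as rows."""
    return [[c1[k], c2[k], c3[k]] for k in range(3)]

# World axes as seen from the camera, as read off in the example.
r1 = [-1, 0, 0]   # world x is opposite to camera x
r2 = [0, 0, -1]   # world y is opposite to camera z
r3 = [0, -1, 0]   # world z is opposite to camera y
R = from_columns(r1, r2, r3)
t = [0, 5, 10]    # assumed reading of "0, then 5 down and 10 towards the world"

cols = [r1, r2, r3]
orthonormal = all(dot(cols[i], cols[j]) == (1 if i == j else 0)
                  for i in range(3) for j in range(3))
determinant = det3(R)

# Flipping one axis keeps the columns orthonormal but gives a
# left-handed frame; the determinant check is what catches this.
mirrored_determinant = det3(from_columns(r1, r2, [0, 1, 0]))

print(orthonormal, determinant, mirrored_determinant)
```

The mirrored case shows why orthogonality alone is not enough: only the sign of the determinant distinguishes a proper rotation from a reflection.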
The coordinate frame we're going to add is something very common in quadrotors. It is a body frame with the X axis going forward, the Y axis to the right, and the Z axis going downwards. Again, always make the check: it is X, Y, and Z, and this is a right-handed coordinate system. We still have the camera coordinate system and the world coordinate system. The names we use for the rotations about the body axes are these: for the X axis it is the roll angle, for the Y axis it is the pitch, and for the Z axis it is the yaw angle.
The best way to concatenate transformations and to relate multiple transformations to each other is to write them as 4x4 matrices, by concatenating the rotation and the translation and just adding a row which is 0 0 0 1. Then by simple 4x4 matrix multiplication we can find the transformation from the world to the body frame, by writing the transformation from the world to the camera times the transformation from the camera to the body. This is an operation which is very common in computer graphics.
Now, we have seen the transformation from world to camera, or T with prescript w and subscript c. This is actually the inverse of the transformation from the camera to the world. What does this inverse transformation look like? We know that the inverse of a rotation matrix is the rotation matrix transposed. What about the inverse translation? It is easy to find by taking the inverse of the 4x4 matrix. Then we will see that in the upper right-hand corner we have minus the rotation transposed times the translation. This is really the inverse translation, which is the translation from the origin of the world to the origin of the camera: the orange vector we see there. This also answers the question: if we have this transformation, where exactly in world coordinates, let's say in GPS coordinates, is the camera? Its position is minus R transposed times the translation. Now, is there any alternative interpretation of these transformations?
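Before turning to that question, the 4x4 construction, its inverse, and the chaining of frames can be sketched in code. This is a minimal sketch, not a definitive implementation: the helper names (`homogeneous`, `inverse_rigid`, `matmul`) are illustrative, the camera-to-world values are the ones from the earlier example with the translation read as (0, 5, 10), and the camera-to-body transform `c_T_b` is purely hypothetical. It assumes the camera looks straight along the body X axis and that the body origin sits one unit behind the lens.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def homogeneous(R, t):
    """Build the 4x4 matrix [R t; 0 0 0 1] from a 3x3 R and a 3-vector t."""
    return [R[i] + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]

def inverse_rigid(T):
    """Invert [R t; 0 1] as [R^T, -R^T t; 0 1] without a general matrix inverse."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return homogeneous(Rt, t_inv)

# Camera-to-world transform from the earlier example (prescript c, subscript w).
c_T_w = homogeneous([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], [0, 5, 10])

# Its inverse (prescript w, subscript c); the last column is the
# camera position in world coordinates, i.e. -R^T t.
w_T_c = inverse_rigid(c_T_w)
camera_position_in_world = [row[3] for row in w_T_c[:3]]

# Hypothetical camera-to-body transform: the columns are the body axes
# seen from the camera (forward = camera z, right = camera x, down = camera y).
c_T_b = homogeneous([[0, 1, 0], [0, 0, 1], [1, 0, 0]], [0, 0, -1])

# World-to-body = (world-to-camera) times (camera-to-body).
w_T_b = matmul(w_T_c, c_T_b)

identity_check = matmul(c_T_w, w_T_c)
print(camera_position_in_world)
print(w_T_b)
```

The adjacent indices cancel in the product (subscript c meets prescript c), which is a convenient way to check that a chain of transformations is written in the right order.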
Is there another way, other than the interpretation of the columns and the vector between the origins, to find this transformation? And indeed there is one, which is by rotating and translating the actual coordinate system. At the end of these slides we're going to show an actual demonstration of how this works. But let's now consider again these two coordinate frames, the camera and the world, again with the convention that red is always x, green is y, and blue is the z axis, and see how we can find the transformation between the two. This involves three steps.
First, we can move the camera coordinate system on top of the world coordinate system. This is a 4x4 matrix where there is no rotation, which is why we write the identity, and in the last column we have the translation vector. To avoid mixing up the coordinate systems, we just take this translation vector out of the picture and show the two coordinate systems.
The second step is a rotation around the x axis, which will bring the two z axes into alignment. So the x axis is this one, and it is just a rotation by 90 degrees, which is not in the positive direction; it is in the negative direction. So this can be written as a matrix with a rotation only in the upper 3x3 block, and the last column here is 0 0 0 1. Now, why does it have the rows 1 0 0, then 0 0 1, then 0 -1 0? It is because the angle is -90 degrees. In the diagonal elements (2,2) and (3,3) we have the cosine of -90, which is 0. And then we have minus sine of -90, which makes 1, and sine of -90, which makes -1.
At this point we need one more rotation: you see that the coordinate systems are not yet aligned. We have the z axes aligned, but we still need to rotate to align the red and the green axes. If you watch carefully, this is a rotation of 180 degrees around the z axis, and it can be written this way. Again, we embed the rotation matrix inside the 4x4 transformation, with the cosine of 180 degrees being minus 1 and the sine terms 0. We have applied three steps: a translation and two rotations.
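The three steps can be multiplied out and compared with the matrix obtained from the column interpretation. The sketch below assumes the standard right-handed rotation matrices (cosine on the diagonal, minus sine above it) and the example translation read as (0, 5, 10); the function names are illustrative and not from the lecture. Under these assumed definitions, post-multiplying translation, x rotation, and z rotation reproduces the column-derived matrix when the x angle is +90 degrees. With -90 degrees the same matrix comes out only if the two rotations are multiplied in the opposite order. The sign and ordering conventions used on the slide are not visible in the transcript, so this is best treated as a check to run against the slide.

```python
import math

def matmul(A, B):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def cos_sin(deg):
    """Cosine and sine rounded to integers; exact only for multiples of 90 degrees."""
    return round(math.cos(math.radians(deg))), round(math.sin(math.radians(deg)))

def translation(t):
    return [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]], [0, 0, 0, 1]]

def rot_x(deg):
    c, s = cos_sin(deg)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_z(deg):
    c, s = cos_sin(deg)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

t = [0, 5, 10]  # assumed reading of the example translation

# Matrix from the column interpretation of the earlier example.
column_derived = [[-1, 0, 0, 0],
                  [0, 0, -1, 5],
                  [0, -1, 0, 10],
                  [0, 0, 0, 1]]

def moving_frame(x_deg):
    """Translate, rotate about the new x, rotate about the new z (post-multiply)."""
    return matmul(matmul(translation(t), rot_x(x_deg)), rot_z(180))

# Same two rotations in the opposite order, with the x angle at -90 degrees.
reversed_order = matmul(translation(t), matmul(rot_z(180), rot_x(-90)))

print(moving_frame(90) == column_derived)
print(moving_frame(-90) == column_derived)
print(reversed_order == column_derived)
```

Whichever convention is in use, comparing the product against the column-derived matrix is a quick way to catch a flipped sign or a swapped multiplication order.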
When these motions are always with respect to the coordinate frame of the last pose, we always post-multiply. This is the golden rule of moving coordinate frames. So when we first translate, then rotate around the X axis, then rotate around the Z axis, and this is always the last X axis or the last Z axis, then we always post-multiply. And we see that if we post-multiply these matrices, we are going to get exactly the same matrix we had from the interpretation of the orthogonal column vectors. So we have seen two ways to find the transformation between two coordinate systems. One was by the interpretation of the columns and the translation vector, and the other is by actual motion.
And I am going to show you now with actual motions how this happens, so that you understand better this concatenation of the three motions. So I'm going to show you, in a real situation where I am going to use pencils for the coordinate axes, what these coordinate system transformations look like. We have a camera here and you see the lens of the camera. And I'm going to put the camera coordinate system on top of the lens, so that it is almost aligned with the actual coordinate system of the camera. So we have the X axis, and I'm going to show you like this the X axis and the Y axis, always with the convention red and green for the X and the Y, and blue for the Z. And as you have seen in MATLAB, the X axis of the image always goes this way and the Y axis this way. So it's perfectly aligned with what you're doing when you program.
Now we also have a checkerboard, which is usually used in calibration scenarios, when we want to calibrate. And we know every point of this checkerboard with respect to its origin, because it is something that we printed. And let's have the following situation, where the camera coordinate system is oriented in a way that the blue Z axis is parallel to the calibration pattern, as well as the X axis. And we have the Y axis looking downwards. So you see roughly what it looks like.
If we write the world coordinate system with respect to the camera coordinate system, you will see that the X axis of the world, which is actually this red axis here, is in the opposite direction of the X axis of the camera. So it is -1, 0, 0. The Y axis of the world is in the opposite direction of the Z axis of the camera, so it is 0, 0, -1. And the Z axis of the world is in the opposite direction of the Y axis of the camera, so it will be 0, -1, 0. So this is the interpretation we have shown, with the columns as the axes of the world with respect to the camera.
Let's see how we can establish the second interpretation of the same transformation, as a sequence of motions. The first one was a translation, and this was pretty much moving the two coordinate systems one onto the other. So I can do it, for example, by moving the calibration pattern and just aligning them almost like this. I don't have space to do this, but imagine that I have done this as a first step. Then what I can do to align the Z axes is to rotate around the X axis of the camera by minus 90 degrees, which will be this way. And then the only thing I have to do is to rotate around the Z axis by 180 degrees, which will be this way. So this way, the two coordinate systems are perfectly aligned. And I have written the transformation as a concatenation of these three motions.