Today, we'll start with the technical lectures on perception. The first object of our study is the camera. What is a camera, like the phone camera? What is a camera, like the human eye? What is a camera on an animal?

Let's go back to nature. We have studied quadrotors in aerial robotics. We have studied other animals in locomotion. Here, I want to show you an extreme example of an amazing perception capability. These birds are called gannets, and they are famous for diving at extreme speeds into the water without crushing their wings. How can they do that? If you observe the video, you will see that they have to keep their wings open in order to have a stable trajectory. But at some point, a few yards before the water, they have to close them in order to avoid being crushed. How do they estimate this distance? Do they have a laser? Do they have a GPS? Or do they just have eyes?

This is what the head of a diving gannet looks like. Indeed, there is a pioneering article by David Lee called "Plummeting gannets: a paradigm of ecological optics". It was the first theory of how a bird like the gannet, with two eyes that are not even looking forward but laterally, can estimate its distance from the water. As a matter of fact, what they found is that the birds estimate the time to collision rather than the distance. Let's see how those eyes work, and we will start by looking at all possible kinds of cameras. At the end, after a few lectures, you are going to prove that, as a matter of fact, you can estimate this time to collision just from the visual optical flow field.

This is an example of a camera on a quadrotor. This is an example of a camera called a Ladybug, in two versions, the black one and the red one. It is an array of cameras capturing panoramic views and producing panoramic videos. This is an example of the Kinect.
The Kinect is a camera that you have seen on the Xbox 360. It captures a depth [INAUDIBLE] and can estimate the skeleton of your body. This is something called a laser scanner, a Hokuyo, which is just a very highly accurate version of a radar, and such devices exist in today's driverless cars. This is another example, a stereo camera, with one camera on the left and one on the right. Stereo cameras are probably the cameras most similar to the human stereoscopic visual system.

So what is a camera? A camera has two basic elements. One element is the imaging chip, which these days is usually what we call a CMOS chip, and on top of that we put a lens. We will start with the description of the lens, because it is very important for how the image is formed on the imaging chip.

Where else have you seen lenses? You have seen lenses in magnifying glasses. When you were young, you might have used a magnifying glass to burn a piece of paper. This works because of a unique property of lenses: they can collect all the rays from the sun and concentrate them at one single point, creating a very high temperature that facilitates the burning. We don't want to burn the imaging chip. We want to collect these rays in order to get an image.

How does this work geometrically? Imagine that we have an object, which is just a cylinder or some human, drawn here as an abstraction. It is an orange line segment. This orange line segment emits rays and is projected onto the image as the very small orange line segment on the image plane. Between the object on the left, in the yellow oval, and the yellow oval on the right, which is on the image plane, we have a lens in the middle. This is a model of what we call a thin lens.

What does this thin lens mean regarding the geometry? The thin lens has the following properties. Take all the rays that travel parallel to the axis of the lens, the big horizontal axis called the optical axis.
When they go through the lens, all of them go through a point called the focus of the lens. After they go through the focus of the lens, they can hit an image plane. The second yellow ray, which goes directly from a point in the world through the center of the lens, passes straight through without being refracted at all. When these two rays intersect at the image plane, we have the perfect image of a point. This is the sharpest image of a point.

There is also an algebraic characterization. Let a be the distance from the object to the lens, b the distance from the image plane to the lens, and f the focal length, which is where the focus is with respect to the lens. We get this very sharp image when the two rays that have passed through the lens intersect at one point on the image plane. We can prove very easily, just through the similarity of triangles, that 1 over f is equal to 1 over a plus 1 over b (1/f = 1/a + 1/b).

Again, f is our focal length. This is an intrinsic property of a lens. When you go to a shop and ask for a lens, you really say, for example, "I want a lens with a 50 millimeter focal length." a is the distance of the object. This is something that you don't have control of, because it depends on where your object is. And b is the distance of the image plane on which you get the image of the object. This is what you can actually control.

So let us see what controlling the image plane means. In practice, controlling the image plane is called focusing, and you usually achieve it by turning a ring that you see on the lens when you have your lens in manual mode. In modern cameras, focusing is done automatically by a control system called autofocus. What really happens there is that if you move the image plane, you will see that instead of getting a point as the projection of a point in the world, you get the small red line segment. This is exactly the blur you see on many points.
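The thin lens relation above can be tried out numerically. The following is a minimal sketch in Python, not something shown in the lecture: the function names are made up for illustration, and the blur estimate additionally assumes a lens opening of a given diameter (the `aperture` parameter), which the lecture does not mention. It solves 1/f = 1/a + 1/b for the image plane distance b that gives a sharp image, and estimates how wide the image of a point becomes when the image plane sits somewhere else.

```python
def image_distance(f, a):
    """Solve the thin lens equation 1/f = 1/a + 1/b for b.

    f: focal length, a: object-to-lens distance (same units).
    Assumes a > f, so that a real image forms behind the lens.
    """
    if a <= f:
        raise ValueError("object must be farther from the lens than the focal length")
    return 1.0 / (1.0 / f - 1.0 / a)


def lens_equation_holds(f, a, b, tol=1e-9):
    """Check whether 1/f = 1/a + 1/b holds, i.e. whether the image is in focus."""
    return abs(1.0 / f - (1.0 / a + 1.0 / b)) < tol


def blur_diameter(f, a, b_plane, aperture):
    """Width of the blurred image of a point when the image plane is at b_plane.

    Assumption (not from the lecture): rays enter through a lens opening of
    diameter `aperture` and converge at the in-focus distance b. By similar
    triangles, the blur width is aperture * |b_plane - b| / b.
    """
    b_focus = image_distance(f, a)
    return aperture * abs(b_plane - b_focus) / b_focus


if __name__ == "__main__":
    f = 50.0      # focal length in millimeters (the 50 mm lens from the example)
    a = 2000.0    # hypothetical object distance: 2 meters
    b = image_distance(f, a)
    print("in-focus image plane distance (mm):", b)
    print("lens equation holds at b:", lens_equation_holds(f, a, b))
    print("lens equation holds at b + 1 mm:", lens_equation_holds(f, a, b + 1.0))
    print("blur width at b + 1 mm (mm):", blur_diameter(f, a, b + 1.0, aperture=25.0))
```

Focusing, in this picture, amounts to choosing the image plane distance returned by `image_distance` for the object you care about.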
When the image plane is moved like this, you get a blurred picture, which means a picture that is not in focus. In this case, you can very easily prove that the lens equation is not valid: 1 over f is no longer equal to 1 over a plus 1 over b.

Let us now see how the size of the image is affected by the distance of the object from the lens and by the distance of the image plane. If you look at the length of the orange segment on the image plane, you will see that you can directly relate it, by similarity of triangles, to the height of the object. You get that big Y, the height of the object (it can be the height of a human), divided by a, the distance of the object from the lens, is equal to small y, the size of the object in the image, divided by b, the distance of the image plane from the lens (Y/a = y/b). This is a very simple similarity of triangles, but it tells us a lot. It tells us that if the object moves away from the lens, the image becomes smaller: if a increases, then small y decreases. This is what happens when somebody moves away from the camera.

What happens when b is changing? Changing b really means that I change the distance of the image plane from the lens. We see the plane moving, and when this happens, the distance of the object in the world changes relative to where we started. You also see a very particular property: when the point moves along the ray, farther away, the perspective projection does not change. The point is always projected onto the same image point, which means that, for a motion like the one here, we cannot disambiguate from one image point where the point really lies in the scene.
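Both observations, that the image shrinks as the object moves away and that a point sliding along its ray keeps the same projection, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the coordinate convention (Z measured along the optical axis from the lens center, image inversion ignored) are chosen here for illustration and are not from the lecture.

```python
def image_height(Y, a, b):
    """Image size from similar triangles: Y / a = y / b, so y = Y * b / a.

    Y: object height, a: object-to-lens distance, b: image-plane-to-lens distance.
    The sign flip of the real (inverted) image is ignored here.
    """
    return Y * b / a


def project(X, Y, Z, b):
    """Project a world point through the lens center onto the image plane.

    Assumed convention: Z is the distance along the optical axis, X and Y are
    offsets from the axis, and b is the image plane distance.
    """
    return (b * X / Z, b * Y / Z)


if __name__ == "__main__":
    b = 50.0  # hypothetical image plane distance in millimeters

    # A 1.8 m tall person at 2 m, then at 4 m: the image height halves.
    print(image_height(1800.0, 2000.0, b))
    print(image_height(1800.0, 4000.0, b))

    # A point and the same point pushed twice as far along its ray
    # land on the same image location.
    print(project(1.0, 2.0, 4.0, b))
    print(project(2.0, 4.0, 8.0, b))
```

The second pair of calls shows the ambiguity directly: scaling X, Y, and Z by the same factor leaves the projection unchanged, so a single image point cannot tell us how far along the ray the world point is.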