This video is about C strings. We'll start with a quick refresher on what a C string is and which functions we generally use to manipulate one. After that, we'll go into more detail about the problems with these functions, then show how these problems can cause security issues. Finally, we'll take a look at output formatting in C and also at input scanning in C.
What is a C string? A C string is simply an array of char, nothing more. The end of the string is indicated by a character with a value of zero, which we generally call the end marker or the end-of-string marker. When we manipulate strings in C, we use functions that have been created especially for that purpose. They are declared in the stdio.h and string.h header files.
Functions for manipulating strings. The source code provided with this course contains an example called string that shows you, as a reminder, how to declare string constants and variables and how to use the C library to manipulate strings. When building this example on Windows, we have to define a constant just to keep the Visual Studio compiler from outputting many warnings about unsafe C APIs. In the next video, we'll talk about the security-enhanced versions of these CRT functions that Microsoft provides and that help you avoid some of the problems we look at in this video. Let's take a look at the string example right now.
Here is the code of the string example. First, you see the define with the underscores here, _CRT_SECURE_NO_WARNINGS, just to make Visual Studio not warn about unsecure APIs. After that, we include the two header files that contain the declarations of the functions we use to manipulate strings. I also added some comments. I'll try to show all the code, so if you need more time to read it, just pause the video; it will be easier for you.
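As a small illustration of the definition given above (this is not the course's string example), here is a minimal sketch of how a function finds the end of a C string. The helper name CountChars is hypothetical; it does roughly what strlen does, walking the array until it meets the end marker.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: counts the characters that
   come before the end marker (the char with a value of zero). */
size_t CountChars(const char * aText)
{
    size_t lResult = 0;

    /* The only way to know where the string stops is to look for the zero. */
    while ('\0' != aText[lResult])
    {
        lResult++;
    }

    return lResult;
}
```

For an array declared as char lText[] = "Hi", sizeof(lText) is 3 because the end marker takes one byte, while CountChars(lText) is 2. If the marker were missing, the loop would simply keep reading past the end of the array.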
I want to proceed fairly quickly, because I know that a lot of people taking this course already program in C or know how to use the string manipulation functions. Here, we define string constants, first using a define and second using a constant variable. We also declare a variable, and we indicate the maximum length we want for this variable. This is a global variable. In the example, we declare a local variable, and after that we use some of the string manipulation functions. Here we have fgets, which makes it possible to read a string from a file or from the standard input. puts does the reverse by displaying a string on the standard output. strlen retrieves the length of a string. strcmp compares two strings. strcpy copies one string into another. You also have strncpy, which copies with a limited length. It's a better way to copy strings because it avoids some possible overflows. When using it, we need to know that the resulting string may not be null-terminated, so it may not have the end-of-string marker in it. I present here a better way to use it, in order to always make sure that the string is correctly terminated. The last one is strcat, which concatenates two strings.
The C standard library provides more functions than what we see in the string example. I made a small table here to list some of them, and there are still more than that. I indicated in green the functions that take the size of the string as an argument, so they are able to avoid buffer overflows by themselves. I also indicated in red two functions that are very dangerous and that must be used with care.
The problems with these functions. Many of the functions we saw in the last slides have a major security flaw. Many manipulate strings without knowing the maximum size of the string, and many assume the string always ends with an end-of-string marker, has a [inaudible].
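The strncpy caveat mentioned above can be sketched as follows. This is a minimal illustration, not the code shown in the video: the helper name CopyTerminated and its parameters are assumptions. The idea is to let strncpy fill at most size minus one bytes and then write the end marker explicitly.

```c
#include <string.h>

/* Hypothetical helper: copies aIn into aOut, which holds aOutSize bytes,
   and guarantees that the result is terminated. */
void CopyTerminated(char * aOut, size_t aOutSize, const char * aIn)
{
    if (0 == aOutSize)
    {
        return; /* no room at all, not even for the end marker */
    }

    /* strncpy does NOT add the end marker when aIn is too long... */
    strncpy(aOut, aIn, aOutSize - 1);

    /* ...so we always write it ourselves in the last byte. */
    aOut[aOutSize - 1] = '\0';
}
```

With a 4-byte destination, copying "abcdefgh" leaves "abc" in the buffer: the string is truncated, but it is still a valid C string.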
If the string is not correctly terminated, they can go past the end of the string and keep working through all the bytes in the computer's memory that are not zero.
The string_0 example shows how not to use the common string manipulation functions. Here is the code for the string_0 example. We first include the needed headers. This example takes two arguments on the command line, the first one being the username and the second one being the password. The main function verifies the number of arguments. After that, based on the username, we retrieve the password using the retrieve password function. We will see that the retrieve password function is a little bit dumb and always returns the same password, but you understand that, in a real program, it would retrieve the password from a file or a database. After that, we copy the password into a local variable and compare the expected password with the received password in order to know if access is granted. If yes, we print hello with the name of the user; if no, we print an error message. The retrieve password function always returns apple as the password.
If we want to try this program, it is easier to do it from the command line, because we need to pass arguments to the program. I'll change folder to the Debug one, and I can execute it and give a username and a password. If I give a wrong password, I get an error. But if some hacker looks at the code, he can see that, when we copy the password, we don't take into account the length of the password the user specified, so it is possible to cause a buffer overflow by typing a password that is longer than the length of the lPassword variable. We can try it. We are in Debug, so most probably the memory guard will catch the problem. Let's try it and type a longer password. Yes. The problem has been detected because the memory guard has been corrupted. If we [inaudible] the memory guard is only checked when the function returns.
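Before continuing with the attack, here is a sketch of the kind of check the copy step described above is missing. It is not the string_0 code: the function name VerifyPassword and the limit PASSWORD_MAX_LEN are assumptions chosen for illustration. The point is only that the length of the user's input is verified before anything is copied into the fixed-size local variable.

```c
#include <string.h>

#define PASSWORD_MAX_LEN 8 /* assumed limit, not taken from the course code */

/* Hypothetical helper: returns 1 when aReceived matches aExpected,
   0 otherwise. Both arguments must be terminated strings. */
int VerifyPassword(const char * aExpected, const char * aReceived)
{
    char   lPassword[PASSWORD_MAX_LEN + 1];
    size_t lLen = strlen(aReceived);

    /* Reject input that does not fit instead of letting it overflow. */
    if (PASSWORD_MAX_LEN < lLen)
    {
        return 0;
    }

    /* The copy is now bounded: lLen + 1 bytes, end marker included. */
    memcpy(lPassword, aReceived, lLen + 1);

    return (0 == strcmp(aExpected, lPassword)) ? 1 : 0;
}
```

In this sketch the local copy is kept only to mirror the structure of the example; comparing aReceived directly would avoid the copy altogether.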
It's not really a problem for the hacker that the buffer overflow is detected, because he will have had time to overwrite some other local variable and maybe find a way to get the password wrongly verified as correct. Just to make things simpler for the hacker, we'll switch to the release version of the compiled program. Just check that it's working correctly. Yes. This time the buffer overflow is not detected, but we still get the incorrect password message, because the password is not matching. But, as you see, the expected password is no longer the same. If we can overwrite the expected value when we write into the password, we can create a situation where the compare does not use the real expected password but compares the password with whatever value the hacker successfully put in the lExpected variable. Let's try. I made an error. What I want is two passwords with eight letters each. Yes, the program detects the overflow, but before detecting it, it passed the password verification and let the hacker do what he wanted to do. It is detected only because the end-of-string marker here happened to corrupt the memory guard that sits between the local variables and the return address.
This is the code for the example. Again, we use the define because we don't want to use the security-enhanced functions provided by Microsoft for now; they are the subject of the next video. The program takes one argument, which is a file name, and the program simulates a program that encrypts this file using a secret key. The user is not expected to [inaudible]. The program first verifies the command line argument, then retrieves the secret key. [inaudible]. We'll see in a few seconds that the function just initializes a variable and returns the address of this allocated memory here. This memory then contains the secret key. The program opens the file.
If it cannot open the file, it just prints an error message and displays the name of the file, so the user is able to see what he typed wrong in the file name. If it successfully opens the file, it reads it block by block, encrypts it, and dumps it to the file. The function that retrieves the key just creates a buffer to contain the key. This is a very important secret key. The encrypt function does absolutely nothing in this dumb sample.
Does someone here see a bug, or rather not a bug but a security issue, in this code? Because we are in the output formatting section of this video, it is no surprise that the security issue is about output formatting, and the problem is here. The code displays a string that has been entered by the user, and it uses that string as the format in the printf function. If the user puts some formatting command in the file name, he will be able to display variables that are on the stack. Most probably he will be able to display the secret key.
Let's just try it. We'll compile it. Note that the compiler did not display a compilation warning about the issue. Go back to the Debug folder. I'll give the name of a file which is in the [inaudible] repository. Yes, it just reads the readme and outputs it. If, rather than giving a real file name, I use a percent s, we see that the program cannot open the file, as it is not a valid file name, and it outputs something that it took from the stack. Again, I go a bit further and add more percent s. I'm able to see a little bit [inaudible]. My last percent s creates a problem, because it was not an address that can be displayed as a string. We'll now switch to the release version, because I want to avoid seeing the CC-initialized local variables. Just switch to release. Make sure the program is correctly built. Do the same thing, the same percent s. Yes, we have some other noise here, but with the third percent s we successfully display the secret key.
Here the programmer made a few errors.
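Before going through the errors one by one, the core of the problem can be sketched in a few lines. This is not the code from the video: the helper name FormatOpenError and the message text are assumptions. It contrasts passing the user's string as the format with passing it as data to a constant format.

```c
#include <stdio.h>

/* Hypothetical helper: builds the message displayed when a file cannot be
   opened. Returns the value returned by snprintf. */
int FormatOpenError(char * aOut, size_t aOutSize, const char * aFileName)
{
    /* Unsafe variant, shown as a comment only:
           snprintf(aOut, aOutSize, aFileName);
       Here the user's text IS the format, so a file name such as
       "%s%s%s" makes the function read values from the stack. */

    /* Safe variant: the format is a constant string and the user's text is
       only an argument, so any percent sign in it is copied as-is. */
    return snprintf(aOut, aOutSize, "Cannot open %s\n", aFileName);
}
```

Called with the file name "%s%s%s", the safe variant produces the text "Cannot open %s%s%s" followed by a newline, and nothing from the stack is displayed.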
The first is to directly use a string provided by the user as the format for a function of the printf family. It's clearly an error and must not be done. On Linux, this would cause a compilation warning indicating that the format for the [inaudible] should be a static string.
Here is my advice for output formatting. Always use a constant string as the format for any function of the printf family. Never pass a string retrieved from user input, a file, or a request as a format to printf. If you absolutely need to do that, verify the string: scan for the percent signs, and verify that the number is a correct number and that the letter is the expected letter for the format.
Input scanning. We all know, and some hate, the input scanning functions of the scanf family. We already saw that many of them may cause a buffer overflow, because the variable we use to receive the data is not long enough for the data we can receive. Do you think these functions present other, more specific risks? We'll look at the scan_0 example to see.
First, the code includes the [inaudible] file. This small program takes one argument, which is the name of a file; the program reads commands from the file and tries to execute them. We'll see that in a few moments. Here we open the file, we call the function that processes the file, and after that we close the file. If we look at the function that processes the file, the file can be composed of many commands, so we have a loop. Until we get an error, we execute the commands from the file. The first thing we read from the file is a command and an address. After that, we read the file name. The command is get or post: we can get the data from an address and put it in a file, or get it from the file and post it to the address. We have functions to validate the address and validate the file name, which is really important. We don't want to let the program access any address or access any file on the filesystem.
Based on the command name: if the command name is END, we've reached the end of the file, so we just do a break to get out of the loop. If the command is get or post, we call the get or the post function, which do absolutely nothing. The validation functions do nothing, and the get and post functions, for this sample, do absolutely nothing except print a message indicating that we run a get command or run a post command.
I created some input files here to do some operations. We'll just compile it, in debug, and give the file name, the first text file here. Yes, the program runs the get, runs the post, all just fine. Now imagine another file, containing commands but not ending with the END command. What we would expect to happen is simply that it runs the first command and, after that, just stops because there are no more commands. Let's look if that is what happens. No, it runs forever.
Let's look at why this happens. We see that the return value of fscanf is not checked, so if fscanf fails in the loop, the code uses the same command as before and goes through the same path. That is what happened. The first time, it works fine. Back here, at the top of the loop, the variables are not reinitialized, so they still contain valid information: a valid address, a valid command, and a valid file name. Here fscanf fails, because the file has been completely read, but the address and the command are still there. It can run again and again. This is clearly a problem. It's clearly a security issue, because this way someone could completely stall a complex system, since this part of the system keeps executing the same command. It's a point of attack for a denial-of-service attack.
In the last file [inaudible] I put a very long file name, so we'll just look at what happens. We get an error message that says that the command is not valid, because the command is now ong.xml. We clearly have a buffer overflow in the code here.
When we scan for the address, the address is too long, and part of what was expected to go into the address local variable happens to go into the command variable. We could have some issues, and it's clearly not acceptable. Here again, we have a programmer who made many errors. First, he should initialize local variables, as we saw in the video on local variables. He should also verify the return value of fscanf to know whether the function really scanned correctly, as expected.
With time, I developed a way to avoid problems with scanf. First step, I do not use scanf by itself. I always use fgets first to read the input from the user, and after that I use sscanf to parse it. Let's take a look at the scan_0_a example to see how I use this technique.
This is the improved version of the process file function. You'll notice that I added a local variable. This is a line, so I read one line at a time using fgets, as I explained in the slide. I also increased the size of the lCommand local variable to be the same size as the line. First thing, I read the line; if the read fails, I just report the error. After that, I use the sscanf function to extract the command and the address from the line. Very important, I take the return value into account. If the return value is one, it may still be valid, because if the command is END, no address is needed. So it's okay, and I verify that further down. For any other unexpected value, I go to the error path and report it; we clearly have an error. [inaudible]. Note that here, the scan is no longer able to cause a buffer overflow, because the lCommand and lAddress strings are large enough to contain whatever the line can contain. I already know that fgets uses the size of the line and truncates the line if it is too long, so it will not cause any problem. Going on, I read the second line, which contains the file name.
I extract the file name from the line, the same way I did before. Here the step may look a little bit overkill, but I have to do it, because fgets returns a string with the carriage return and newline characters at the end. This line removes them from the input line and puts only the valid data in the file name. Again, no buffer overflow can occur, because the line and the file name variables are the same size. I validate the address and the file name. You'll notice that I do it at the very end, whereas in the previous version the address was validated as soon as it was read from the file. Here I put the two validation function calls at the end, just to be sure that even if the part here could cause an overflow and overwrite the address, the command, or the file name in some way, I validate what I have in the variables just before executing the command. It's better than validating it earlier and letting some other code corrupt the content of the address or the file name in some cases. For the rest, the functions are exactly the same as before.
We'll just try it. With the first sample file, it works fine. With the second one, it detects that the file wasn't formatted correctly. With the last one, we see that it does execute the command, but using a file name that has been truncated. After that, the rest of the line becomes a format error, because it is not a valid command.
That's the end of this video. The next video will be about the security-enhanced versions of the CRT functions that Microsoft provides with Visual Studio and that help avoid some of the problems we saw in this video.
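The technique described in this video, reading a whole line with fgets and then parsing it with sscanf, can be sketched as follows. This is not the scan_0_a code: the names LINE_SIZE, TrimEndOfLine, ParseCommand, and ReadCommand are assumptions, and the explicit field widths in the sscanf format are an extra precaution that the video does not mention.

```c
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64 /* assumed size, not taken from the course code */

/* Hypothetical helper: removes the "\r\n" or "\n" that fgets keeps. */
void TrimEndOfLine(char * aLine)
{
    aLine[strcspn(aLine, "\r\n")] = '\0';
}

/* Hypothetical helper: extracts a command and an optional address.
   aCommand and aAddress must each hold LINE_SIZE bytes.
   Returns 2 or 1 (fields extracted), or 0 on a format error. */
int ParseCommand(const char * aLine, char * aCommand, char * aAddress)
{
    int lRet;

    /* Start from known values so stale data is never reused. */
    aCommand[0] = '\0';
    aAddress[0] = '\0';

    /* The width 63 is LINE_SIZE - 1; sscanf cannot take it from the macro. */
    lRet = sscanf(aLine, "%63s %63s", aCommand, aAddress);

    /* sscanf returns EOF on an empty line; treat it as a format error. */
    return (lRet < 1) ? 0 : lRet;
}

/* Hypothetical helper: reads one line from aFile, then parses it.
   Returns -1 when no line could be read (end of file or read error). */
int ReadCommand(FILE * aFile, char * aCommand, char * aAddress)
{
    char lLine[LINE_SIZE];

    /* fgets never writes more than sizeof(lLine) bytes; long lines are cut. */
    if (NULL == fgets(lLine, sizeof(lLine), aFile))
    {
        aCommand[0] = '\0';
        aAddress[0] = '\0';
        return -1;
    }

    TrimEndOfLine(lLine);
    return ParseCommand(lLine, aCommand, aAddress);
}
```

A caller would loop on ReadCommand, stop on -1 or 0, accept a return value of 1 only when the command is END, and validate the address and file name just before executing the command, as explained above.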