This is the login program design, and what we're going to do here is look at how HDM, which is a formal methodology, works and apply that informally to login. So, let's take a look first at HDM. It's a hierarchical decomposition methodology, meaning you start big and then you break down into smaller pieces, essentially a form of top-down programming. It was developed at SRI International for the provably secure operating system, PSOS. The design of PSOS was formally verified, and in fact, there's a very thick technical report that you can find on the web that covers the design and validation of PSOS. Unfortunately, for various reasons the sponsor never funded the implementation, so it's still simply a design. Now, the way HDM works is first of all, you define the interface. This is essentially the system requirements. How will people use it? How will the system react to certain things? And so forth. You then carry out a hierarchical decomposition or top-down decomposition, if you like. As you break it down into modules, for each module, you develop a formal specification. That way, you can verify consistency across the modules so that you will always have the case of, if module A calls module B or depends upon the results of module A, the assumptions module A makes about module B will be correct, and vice versa. Then you continue refining the layering. You break it down, break each module down further and further, and it's done in terms of interfaces. You define interfaces, and so that way you can change the insides of the modules as long as the interfaces are the same, as long as the interfaces don't change, because changing the interface affects other modules. Changing the internals but not the interface doesn't. Then you implement the system. So, that's how HDM works. This brings up a couple of lessons. The first one is top-down programming, otherwise known more formally as stepwise refinement, is good. You break the task into parts, break the parts into parts, break the parts into parts, into more parts and so forth. Ultimately, each module should carry out exactly one task. This is high cohesion. This is the KISS principle or economy of mechanism that we referred to earlier. For each module, the inputs and the outputs, you know the assumptions that it's going to make. This leads to a layering process. By having layers of abstraction of your program, you can validate that in fact, the modules that are built on top of one another have inputs and outputs they reflect consistently. It simplifies checking, it simplifies debugging, and you can change the internals of the routines without affecting anything at a higher or lower layer because the only thing those layers see are the interfaces. Okay. So, how does this apply to the login program? Well, login program, as we said earlier, has four goals. So, let's take these sequentially. At the highest level, we're going to authenticate the user, change the group user and group information from the system administrator to correspond to the user, update the log files, and then start up a command interpreter that the user can use. So, let's break these down in the first step. To authenticate the user, we need to get the user's authenticator, the information that the user will supply to authenticate themselves, and then get the corresponding authentication data. We then transform these into whatever form is appropriate for comparison to validate that the user's authenticator matches the user's authentication data and it is the correct user. So, that's three steps. Get the authenticator from the user, get the authentication data that the system has associated with the user, and compare, check for a match. We then get the UID in the group information. We need to change that to the user's UID. So, we get the user's identification, the group and all the groups that they're in, and then we're going to change the process so that its identification is no longer the root's, and its groups are no longer those of root or of login, but now they are the user's UID in groups. Updating the log file is relatively simple. Open the log file, write the data to it, close the log file. Three steps. Then we have to give the user the command interpreter that will give them an interface to the system. On Windows, for example or Mac's, Mac, this would be the Finder; on Windows this would be the Windows Explorer, the user interface. On Linux and Unix systems, this is the shell. So, first, we obtain the shell or the command interpreter. I'm going to use the term shell for all three of those. We obtain the specific shell that the user will use. We verify that in fact it's a valid shell. Then we go ahead and start it up. So, now we've done some stepwise refinement. Now, let's look at how we might instantiate this. Well, it's going to depend a lot on libraries. For example, libraries to read input, libraries to pull authentication data, and so forth. So, we're going to build it on the standard libraries, I/O and C libraries. We have to look for their assumptions. For example, copying a string using strcpy assumes that the second buffer is no longer than the first, so it'll fit completely in there. We use strncpy so we can actually check that. The problem is if the data being copied from the second buffer into the first is exactly as long as the first, there'll be no NUL byte added. If we get input from the user and we use the gets function, it doesn't do any bounce checking; F gets does, and so forth. So, the library functions themselves will have assumptions. So, we need to know what they're going to output in weird cases or prevent the weird cases from happening. For example, if I say, give me negative 2048 bytes of space malloc, how much storage space will I actually get? What happens actually is that the minus 2048 is interpreted as an unsigned number. So, I'll get two to the 64th minus 2048 bytes of space, or two to the 32nd minus 2048 bytes of space, depending on my word size. What happens if strncpy is called with a negative number? So, copy minus five bytes from b to a. Well, again, that list argument is unsigned, and so it's the same as two to the 64th minus five bytes from b to a, or two to the 32nd minus five bytes from b to a. If you're lucky, b will end in the NUL byte and there won't be any damage. If you're not, you're going to be messed up. All right. The next element of the checklist is structure. Is your program appropriately structured? One of the truisms of security and secure programming is, put as all your security-related relevant elements into one place and the non-security relevant elements into another place. The reason for this division is that way, we know where the security is, and we know when something goes wrong it's in that area. It's not scattered throughout the program. This by the way, mimics the old notion of a trusted computing base or TCB, where all security-relevant functions were within that TCB. Each function, each element performs one task, high cohesion, again. This ties in also with the KISS principle or the economy of mechanism, keep it simple. If each module does one thing, each element does one thing, they'll be much simpler than when you have one element doing many things, or you have the elements bound in with the non-security relevant parts. Then the last one is check my interfaces. Checklist item number four. We talked earlier about passing pointers. Don't do it unless you cannot avoid it. Typically, with library functions in C or C++, you can't. But if you can, do it. Is the calling structure, is the way your program is put together easy to follow? If it's very convoluted, that may lead to problems as well. Try to make it as simple as you can and as easy to follow, and you check everything. You never know what can go wrong.