Data scrubbing is the idea of taking data and creating a subset of it, if you will, but with the unique identifiers, or at least many of them, removed. We often do this when we replicate data from production to a test environment. Development has to occur in order for us to figure out what we need to do to offer goods and services, and we usually have to use live data that approximates what the system will see in order to test its functionality. So we typically take production data, dump it into a test database, and tell the developers to run their tests there. The problem is that production data is live; it has uniquely identifying characteristics associated with it for our customers, what we would call PII, Personally Identifiable Information, P-I-I. We have to take care with that data, because there are confidentiality concerns we have to address in safeguarding it, privacy concerns, and perhaps even regulatory concerns about how it is managed. In the United States, for instance, we potentially have to deal with HIPAA, Sarbanes-Oxley, or perhaps FERPA in the education space. These are all concerns we have to be aware of as we look to move data from production into test.

As a result, what we often do is scrub the data, because when we move the data over, we don't tend to move all of the security controls that protect and safeguard it. We don't move over the constrained user interfaces and the menus that are tailored to remove access to certain features of the data, and we don't tend to move over the permissions and the control elements that go with them. So because the data is now, in theory, fully exposed and potentially compromisable, we have to come up with a way to limit access to the information that could be problematic for us: user identities, Social Security numbers, account numbers, things of that nature. You can imagine what that list might look like, depending on the kind of data we have. So we go through and effectively sanitize, or scrub, the data. We get rid of that PII, leaving behind the actual data sets the developers need to work on, minus the information that uniquely identifies customers, employees, patients, or whoever it may be. So data scrubbing is the idea of cleaning up the data, removing the uniquely identifying characteristics, while keeping the integrity of the data set itself intact so we can still test with it. This is a very important thought process for us to engage in.
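To make that a little more concrete, here is a minimal sketch of what one scrub step might look like as records are copied from production toward a test database. The table layout, the column names (name, ssn, account_number, email), and the choice to mask with a one-way hash are all assumptions for illustration; a real scrub job would be driven by your own data classification and tooling.

```python
import hashlib

# Columns we treat as PII in this hypothetical customer table (assumption).
PII_COLUMNS = {"name", "ssn", "account_number", "email"}

def scrub_record(record: dict) -> dict:
    """Return a copy of a production record with PII masked.

    Non-PII fields (totals, dates, status codes, and so on) pass through
    untouched so the test data still behaves realistically.
    """
    scrubbed = {}
    for column, value in record.items():
        if column in PII_COLUMNS and value is not None:
            # Replace the value with a short one-way hash so rows stay
            # distinct (referential integrity holds) but no longer
            # identify a real person.
            scrubbed[column] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            scrubbed[column] = value
    return scrubbed

# Example: one production row on its way into the test environment.
production_row = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "account_number": "ACCT-0042",
    "email": "jane@example.com",
    "order_total": 129.95,
    "order_status": "shipped",
}

print(scrub_record(production_row))
```

The point of the sketch is the split: identifying fields get replaced, while the fields the developers actually need to exercise the system are left alone.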
Data deduplication is the idea of getting rid of sameness. If I have ten items that are all identical, but they are all being stored individually in ten separate folders on a system, I'm wasting nine copies' worth of space. Why not keep one master copy and have nine pointers in those other folders that point back to the original document when somebody needs it? This is called data deduplication; in messaging systems it is also referred to as single instance storage. Generically, though, the idea behind single instance storage is data deduplication: if we have a lot of stuff that's the same and we can reduce it down, we reduce the volume of data and the volume of storage necessary to deal with that data.

We also reduce the complexity of securing the data, because we no longer have ten copies of the same thing requiring ten identical sets of permissions, ten individual processes to manage the application of those permissions, and ten monitoring processes to verify they're being applied correctly. Instead, we have one master copy, one set of permissions, one access control process to manage it, and one monitoring process to keep an eye on it. Then all we do is reference that one copy time and time again whenever somebody needs to access it. So data deduplication is just the idea of getting rid of sameness, and we do it to reduce the amount of storage, to reduce the footprint. We also do it, as I said, because it becomes a lot easier for us to manage and secure that data when there's only one copy instead of 10 or 12 or 100 or whatever it may be.
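Here is a minimal sketch of that single-instance idea, assuming a simple content-addressed store: files with identical contents hash to the same key, so only one copy is ever kept and every other "file" is just a pointer back to it. The DedupStore class and its put/get methods are made up for illustration, not any particular product's API.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical content is stored exactly once."""

    def __init__(self):
        self.blocks = {}    # content hash -> the single stored copy
        self.pointers = {}  # file path -> content hash (the "pointer")

    def put(self, path: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = content  # first time we see it: store it
        self.pointers[path] = digest       # duplicates only add a pointer

    def get(self, path: str) -> bytes:
        return self.blocks[self.pointers[path]]

store = DedupStore()
attachment = b"quarterly report, imagine 10 MB of identical bytes..."

# Ten folders "hold" the same document, but only one copy actually exists.
for i in range(10):
    store.put(f"/folder{i}/report.pdf", attachment)

print(len(store.pointers), "pointers,", len(store.blocks), "stored copy")
```

Notice the security payoff described above: permissions, access control, and monitoring only have to be applied to the one stored copy, not to ten scattered duplicates.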
When we think about managing encryption keys, we also have to think about something we already talked about: the fact that keys can become compromised. They're sensitive, they have to be stored securely, they have to be used securely, and we have to make sure they don't become corrupt or lost. This is part of what we would call key management. Key management refers to all the systems, procedures, and processes, all the things we do to securely generate, store, distribute, use, archive, revoke, and delete keys. That's a mouthful, right? Securely generate, store, distribute, use, archive, revoke, and delete: seven individual things we may have to do under key management in order to deal with the key life cycle. Key management is all about the life cycle of keys, birth to death: when we create a key, when we distribute it, when we assign it to a user, when we securely store it, when we securely use it, and ultimately when we revoke it or end-of-life it. That's the whole process we go through, and we have to think about the policies, the roles, the procedures, all the things that have to happen for that to be done securely. There is a small sketch of what that lifecycle can look like in code at the end of this discussion.

This is one of the places where we either get it right, and everything is good, confidentiality is assured, availability is assured, the two are working in lockstep, or where we get it horribly wrong and separate confidentiality and availability without meaning to, and we really don't have a clue what's happening or why. Because if we don't get key management working and we don't understand how to do it properly, we're going to mess things up; it's only a matter of time. Somebody's going to lose a key, somebody's going to expose it, somebody's going to forget to store it securely, and somebody else is going to come along and take it away from us and get our data, breaching confidentiality and effectively undermining availability, because they're going to remove the data or modify it in some way and touch integrity as well. However that happens, that's bad, right? We call those kinds of activities that go horribly wrong incidents, and ultimately we have to identify them, investigate them, manage them, and walk through a process of addressing them. We're going to talk about those in one of our later conversations on incident management. We also refer to those things as risks, and we're going to talk all about risk in one of our later conversations as well.

We don't want to have a lot of risks, and we don't want to have a lot of incidents. We want systems that are working well and doing what they need to do with minimal impact to our security posture. That is what good security and good planning are all about, and as an SSCP, this is something you need to be focused on and really be thinking about. Getting things like encryption key management correct is part of that process. It's not the only thing, but it's a very important component of that overall thought process.
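To close with something concrete, here is a minimal sketch of a few of those lifecycle steps, generate, store, use, and revoke, assuming Python's cryptography package (its Fernet class) for the actual symmetric encryption. The ToyKeyManager class, its names, and its in-memory storage are purely illustrative; real key management would live in an HSM or a dedicated key management service, with secure distribution, archival, deletion, and audit logging layered on top.

```python
from datetime import datetime, timezone
from cryptography.fernet import Fernet  # pip install cryptography

class ToyKeyManager:
    """Toy illustration of generate / store / use / revoke for symmetric keys."""

    def __init__(self):
        # key_id -> {"key": Fernet, "revoked": bool, "created": datetime}
        self._keys = {}

    def generate(self, key_id: str) -> None:
        """Generate: create a new key and store it (in memory, for the demo only)."""
        self._keys[key_id] = {
            "key": Fernet(Fernet.generate_key()),
            "revoked": False,
            "created": datetime.now(timezone.utc),
        }

    def encrypt(self, key_id: str, plaintext: bytes) -> bytes:
        """Use: refuse to use a key that has been revoked."""
        entry = self._keys[key_id]
        if entry["revoked"]:
            raise PermissionError(f"key {key_id} is revoked")
        return entry["key"].encrypt(plaintext)

    def decrypt(self, key_id: str, token: bytes) -> bytes:
        entry = self._keys[key_id]
        if entry["revoked"]:
            raise PermissionError(f"key {key_id} is revoked")
        return entry["key"].decrypt(token)

    def revoke(self, key_id: str) -> None:
        """Revoke: mark the key unusable at the end of its life."""
        self._keys[key_id]["revoked"] = True

km = ToyKeyManager()
km.generate("customer-db-2024")
ciphertext = km.encrypt("customer-db-2024", b"account 0042, balance 129.95")
print(km.decrypt("customer-db-2024", ciphertext))
km.revoke("customer-db-2024")  # any further use of this key now fails loudly
```

The sketch only shows that the lifecycle has enforceable checkpoints; the hard parts of key management, secure storage, distribution, and the policies and roles around them, are exactly the things discussed above.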