So we've learned a lot of little bits and pieces of representation that are used to put together a graphical model. Now let's take a big step back and figure out how you might put these all together if you actually wanted to build a graphical model for some application that you care about. Let me start by saying that this is really not a science. Like any other design problem, it's much closer to an art, or even a black magic, than a scientific endeavor, and so the only thing one can do here is provide hints about how one might go about it. So let's first identify some important distinctions, and then we'll get concrete about particular examples. There are at least three main classes of design choices that one needs to make: whether you have a template-based model versus a specific model for a concrete, fixed set of random variables; whether the model is directed or undirected; and whether it's generative versus discriminative. These are all terms that we've seen before, and we'll talk about them in just a moment. But before we go into the trade-offs between each of these, let me emphasize this last point, which is probably the most critical thing to remember: it's often not the case that you just go in one direction or the other. In many models you're going to have, for example, template-based pieces as well as pieces that aren't at the template level; you might have directed as well as undirected components; and so on. These are not sharp boundaries, and it's useful to keep in mind that in a real problem you don't have to go only one direction versus the other. Now, the first important distinction is template-based versus specific. What are some examples of specific models? Medical diagnosis is usually a specific model: you have a particular set of symptoms, diseases, and so on that you want to encode in your model. So that's one example.
On the other side, the template-based side, you have things like image segmentation, where you really want to deal with drastically different images within the same model, and that's going to be [INAUDIBLE]. There are all sorts of applications that sit in between those two, and an application can go either way, or incorporate elements of both. Take fault diagnosis. You can think of it as a specific model, in that you might write a diagnostic model for one particular type of printer. But if you're inside a company that's writing a diagnostic tool for its line of fifteen different printers, those printers are going to have shared components, and if a component inside printer one also appears inside printer two, chances are it's going to have the same fault model. So you're going to have elements that are unique and elements that are shared, and once again it's something that sits at the intersection between the two. That said, once you've decided where on this spectrum you sit, it really changes the way in which you tackle the knowledge-engineering problem. Template-based models usually (not always, but usually) have a fairly small number of variable types. For example, in our image segmentation setting, the class label is one variable type. Nevertheless, we manage to construct very richly expressive models, because of the interesting interactions between the class labels for different pixels in the image. But it's a very small number of variable types, and most of the effort goes into figuring out things like which features are most predictive. So it's really about feature engineering, as opposed to complicated model design. Not entirely, of course, but the features turn out to play a very big role.
On the specific-model side you usually have a large number of unique variables, because unless you build small models, each variable is going to be unique, and each requires its own model. So the effort there goes much more into model design: deciding on the dependencies and the parameters for each variable separately. A second important distinction is between generative and discriminative. On the discriminative side, you really should consider that direction when you have a particular prediction task in mind. Such a prediction task is often better solved by having richly expressive, richly discriminative features; modeling this as a discriminative model allows me to avoid dealing with the correlations between those features, which usually gives me a higher-performance model. So you might wonder, when would I ever use a generative model, if the discriminative one gets high performance by using richly expressive features? There are multiple answers to that. One answer is when I don't have a predetermined task, that is, when the task shifts. For example, in medical diagnosis, every patient presents differently. In each patient's case I have a different subset of things that I happen to know about that patient: the symptoms they present with and the tests that I happened to perform. So I don't want to train a discriminative model that uses a predetermined set of variables as inputs and a predetermined set of diseases as outputs. Rather, I want something that gives me the flexibility to measure different variables and predict others. The second reason for using a generative model, and this is looking way forward in the class, is that it turns out that generative models are easier to train in certain regimes.
Specifically, just to say it out loud: in the case where the data is not fully labeled, it turns out that sometimes you can't train a discriminative model, but you can train a generative one. We'll definitely see that when we get to that part of the course. Okay, so having talked about these different regimes, let's think about the key decisions we have to make when designing a graphical model. First of all, what variables are we going to include in the model? Regardless of whether we have a fixed or a varying task in hand, we usually have a set of target variables. These are the ones we care about. So even in the medical-diagnosis setting, you have a set of disease variables, which are the ones we care to predict. You might not care to predict all of them in any given setting, but they're usually the targets. We also have the set of observed variables. Again, they might not always be observed, but you don't necessarily care about predicting them; in the medical setting, these might be things like symptoms and test results. And then the third category might be a little bit surprising: we might have variables that are latent, or hidden. These are variables that we don't observe, nor do we necessarily care about predicting them; they're just there. Why would we ever model variables that we neither observe nor care to ever look at? Let's look at an example. Imagine that I asked all of you in this class: what time does your watch show? So each of these Wi's is the time on the watch of one of you in the class; we have W1 up to Wk. Now, these variables are all correlated with each other. But really, they're not correlated with each other directly, unless we all had, like, a watch-setting party just before class.
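To make this concrete, here is a small Python simulation (my own sketch, not part of the lecture): many noisy "watch" readings of one hidden common quantity are strongly correlated with each other, yet become uncorrelated once you account for that hidden quantity.

```python
import random
import statistics

# Hypothetical illustration: each watch shows a hidden "true time" plus
# its own independent noise.
random.seed(0)
n = 50_000
true_time = [random.gauss(0, 10) for _ in range(n)]   # hidden common cause
w1 = [t + random.gauss(0, 1) for t in true_time]      # watch 1 reading
w2 = [t + random.gauss(0, 1) for t in true_time]      # watch 2 reading

def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Marginally the two watches are strongly correlated ...
print(round(corr(w1, w2), 2))        # close to 1

# ... but once we condition on (subtract off) the hidden cause,
# the leftover noise terms are uncorrelated.
r1 = [w - t for w, t in zip(w1, true_time)]
r2 = [w - t for w, t in zip(w2, true_time)]
print(round(corr(r1, r2), 2))        # close to 0
```

The specific numbers (noise scales, sample size) are invented for illustration; the qualitative pattern is what matters.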
Really, what they're all correlated with is Greenwich Mean Time. So you have a model, in this case a naive Bayes model, where Greenwich Mean Time influences a bunch of random variables that are conditionally independent given it. Now, Greenwich Mean Time is latent unless we actually end up calling Greenwich to find out the current time there, which I don't think any of us really cares about. So why would we want to include Greenwich Mean Time in our model? Because if we don't, that is, if we eliminate Greenwich Mean Time from our model, what happens to the dependency structure? We end up with a model that is fully connected. So sometimes latent variables can simplify our structure, and they're useful to include even in cases where we don't really care about them, just because not including them gives us much more complicated models. Which brings us to the topic of structure. When we think about Bayesian networks specifically, the question that comes to mind is: do the arrows, given that they are directed, correspond to causality? That is, is an arrow from X to Y indicative of a causal connection from X to Y? The answer to that is yes and no. Very satisfactory. So what does "no" mean in this case? Consider a model where we have X pointing to Y; we'll just do the two-variable case. Any distribution that I can represent in this graphical model, where X is a parent of Y, I can equally well represent in the Bayes net where I invert that edge and have Y pointing to X. So in this example, as in many others, I can reverse the edges and have a model that's equally expressive. In fact, I can do this in general: you can give me any ordering that you want on the random variables, and I can build you a graphical model that can represent them.
That is, any distribution consistent with that ordering on the variables: you want X1 to come before X2, and X2 to come before X3, and you want to represent the distribution P? Fine, no problem, I can give you a graphical model that will do that. But that model might be very nasty. We've already seen an example of that, in the case where X1 and X2 were both parents of Y, a simple v-structured model. If I invert the directionality of the edges and put Y as a parent of, say, X2, then, in order to capture the distribution for which the original v-structure was the graph, I end up having to add a direct edge between X1 and X2. So what happens is that the causal directionality is often simpler. To drive this home even further, let's go back to our Greenwich Mean Time example, where Greenwich Mean Time is in some sense the cause, the parent, of the different watch times that we see on different individuals. Now imagine that I force you to invert the edges, so that Greenwich Mean Time is the child of all the watch variables. What's that going to look like? Is it the correct model as drawn? No, because it says that all of the watch times are independent, which we know is not the case. So what we're going to end up with is the same horrific model that I showed before, where everything is connected to everything else. So a causal ordering, although it's not more correct than a non-causal ordering, is sparser, as well as more intuitive and easier to parameterize. So again, you're not forced to use it, and sometimes there are good reasons not to, but it's generally a good tip to follow. Now, how does one actually construct a graphical model? Do we have in our minds some monolithic P over a set of variables, X1 up to Xn, and we just need to figure out how to encode it using a graph?
Well, maybe implicitly, but certainly not in any explicit form. The way one typically constructs a graphical model in practice is by starting with some variable, or sometimes a set of variables, that we wish to reason about. For example, we might care about the variable cancer, or maybe even lung cancer. Well, what influences whether somebody is going to get lung cancer? If we go and ask a doctor, "What is the probability for someone to get lung cancer?", the doctor is going to say, "Well, that depends." And you might say, "What does it depend on?" And the doctor will say, "Well, whether they smoke, for example." At which point you're likely to add the variable smoking as a parent of the lung-cancer variable. The doctor might say, "Well, that's not the only thing; the probability of cancer also depends, for example, on the kind of work that you do, because some kinds of work involve more dust particles getting into your lungs." So again, here's another variable which you would add as a parent. I might even go and ask, either the same doctor or an expert in a different domain, "What is the probability that somebody smokes?" If they think about it, they're likely to say, "That depends." On what? Well, maybe their age, their gender, maybe the country that they live in, because different countries have different smoking frequencies. And so, once again, we're going to extend the conversation backward to include more variables, up to the point where we can stop. Because if you now ask, for example, "What is the probability of gender being male versus female?", well, anybody can answer that one, and at that point one can stop, because there's no need to extend the conversation further backward. Is that enough? Usually not, because we also need to consider, for example, factors that might indicate to us whether somebody has cancer or not.
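The backward-construction process described above can be sketched in code. This is purely my own notation (the variable names are invented for illustration): the network-so-far is just a map from each variable to its parents, grown one "that depends on ..." answer at a time.

```python
# Sketch of "extending the conversation backward" as incremental
# construction of a Bayesian-network parent map (hypothetical names).
parents = {'LungCancer': []}                  # start from the variable we care about
parents['LungCancer'].append('Smoking')       # "well, that depends on whether they smoke"
parents['LungCancer'].append('WorkType')      # "... and on the kind of work they do"
parents['Smoking'] = ['Age', 'Gender', 'Country']   # extend backward again
for root in ('Age', 'Gender', 'Country', 'WorkType'):
    parents.setdefault(root, [])              # priors we can state directly: stop here

def is_dag(parents):
    """Sanity check: the parent map must be acyclic to be a valid Bayes net."""
    in_progress, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in in_progress:    # back edge found: a cycle
            return False
        in_progress.add(v)
        ok = all(visit(p) for p in parents.get(v, []))
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

print(is_dag(parents))   # True
```

The acyclicity check is worth running every time an edge is added, since "extending backward" from several query variables can accidentally close a cycle.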
So we might go and ask the doctor what pieces of evidence might be indicative here, and the doctor would tell us, for example, coughing, or maybe bloody sputum, and various other potential indicators. At that point, one would ask, "What is the probability of coughing given lung cancer?", and again one would extend the conversation backward: well, other things may cause coughing, for example having allergies. And so once again we would extend backward from there, to construct a graphical model that captures all the relevant factors for answering the queries that we care about. So that's the structure of a graphical model. Now let's talk a little bit about parameters: the values of these parameters, and what makes a difference here. Certain things really do make a difference. Zeros make a big difference. When we talked about diagnosis, we saw that many of the mistakes made in early medical expert systems derived from the fact that people gave probability zero to things that were unlikely, but not actually impossible. So zeros are something to be very, very careful about: you should only give probability zero to something that is truly impossible, perhaps because it's impossible by definition. Otherwise, things really shouldn't have probability zero. Other things that make a difference are weaker versions of that. For example, order-of-magnitude differences: the difference between a probability of 1/10 versus 1/100 makes a big difference, whereas small differences, like 0.54 versus 0.57, are unlikely to make a difference to most queries. Finally, it turns out that the relative values between conditional probabilities make a much bigger difference to the answer than the absolute probabilities.
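To see why zeros are so dangerous, here is a tiny Bayes-rule computation (my own illustration, with made-up numbers): evidence can raise a small prior by orders of magnitude, but a prior of exactly zero can never be revived.

```python
# Two-hypothesis Bayes' rule: how evidence updates P(disease).
def posterior(prior, like_if_disease, like_if_healthy):
    """P(disease | evidence); like_if_* are the probabilities of the
    observed evidence under each hypothesis."""
    num = prior * like_if_disease
    denom = num + (1.0 - prior) * like_if_healthy
    return num / denom

# Rare but possible: strong evidence raises the probability about 100-fold.
print(posterior(1e-4, 0.99, 0.01))   # roughly 0.0098

# A hard zero stays zero forever, no matter how strong the evidence:
print(posterior(0.0, 0.99, 0.01))    # 0.0
```

This is exactly the failure mode of the early expert systems mentioned above: an "unlikely" disease encoded as impossible can never be diagnosed, regardless of the symptoms observed.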
That is, comparing different entries in the same CPD relative to each other is a very useful way of evaluating a graphical model and seeing whether the values that you used for those relative ratios really make sense. Finally, full conditional probability tables are actually quite rare, except in small applications. In most cases one would use structured CPDs, of the forms that we've discussed as well as a variety of other forms. So let's talk a little bit about structured CPDs, because those are actually quite important. We can break up the types of CPDs that we've talked about along two axes. One is whether they're intended to deal primarily with discrete or with continuous variables. The other is the type of structure that they encode: context-specific, where a variable might make a difference in some circumstances and not in others, versus aggregating, which combines multiple weak influences. Let's give an example of each of these categories. For discrete and context-specific, we had tree CPDs as an example. For discrete and aggregating, we had sigmoid CPDs, as well as noisy-or, noisy-max, or any other member of that family. For continuous variables we didn't actually talk about context-specific representations, but one can take the continuous version of a tree CPD, called a regression tree, where one breaks up the context based on thresholds on the continuous variables; that is a context-specific version of a continuous CPD. And finally, for the aggregating version of a continuous CPD, the linear Gaussian, or conditional linear Gaussian, is an example. By the way, note that the conditional linear Gaussian that we talked about is context-specific on the discrete variables. Finally, it's important to realize that a model is rarely done the first time you write it. Just like any code design, model design is an iterative process, where one starts out somewhere, tests it, and then improves it over time.
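As an example of the aggregating, discrete family, here is a minimal noisy-or CPD, a sketch of the standard construction (the trigger strengths and leak value are invented): each active parent independently fails to trigger the child, and the child fires unless everything fails.

```python
def noisy_or(leak, strengths, parent_values):
    """P(Y = 1 | parents) under a noisy-or CPD: each active parent i
    independently fails to trigger Y with probability 1 - strengths[i];
    leak is the probability that Y turns on by itself."""
    p_fail = 1.0 - leak
    for s, x in zip(strengths, parent_values):
        if x:
            p_fail *= 1.0 - s
    return 1.0 - p_fail

# Two causes with trigger strengths 0.8 and 0.6, and a 1% leak:
print(round(noisy_or(0.01, [0.8, 0.6], [0, 0]), 4))  # 0.01   (leak only)
print(round(noisy_or(0.01, [0.8, 0.6], [1, 0]), 4))  # 0.802
print(round(noisy_or(0.01, [0.8, 0.6], [1, 1]), 4))  # 0.9208
```

Note the payoff: a full table over k binary parents needs 2^k entries, while the noisy-or needs only k strengths plus a leak.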
Importantly, once one constructs a model, the first thing to do is to test it: ask it queries and see whether the answers coming out are reasonable. There's also a suite of tools for what's called sensitivity analysis, which means that one can look at a given query and ask which parameters have the biggest effect on its value; those are probably the parameters we should fine-tune in order to get the best results for the queries that we care about. Finally, any iterative refinement process usually depends extensively on error analysis: once we have identified the errors that our model makes, we go back and try to see which improvements to the model will make those errors go away. That could mean adding features; for example, in some of the image segmentation work that we did, there are features that might help eliminate certain errors that we see in our segmentation results. Or it could mean adding dependencies to the model that can capture the kind of structure present in the errors.
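The idea of sensitivity analysis can be sketched with a toy finite-difference computation. This is my own illustration with invented numbers, not one of the dedicated tools the lecture refers to: a two-node model Smoking -> Cancer, with the query P(Cancer), where we bump each parameter slightly and see how much the query moves.

```python
# Toy sensitivity analysis for the query P(Cancer) in a hypothetical
# two-node network Smoking -> Cancer (all numbers are made up).

def query(params):
    """P(Cancer) = P(S) P(C|S) + P(~S) P(C|~S)."""
    p_s = params['P(S)']
    return p_s * params['P(C|S)'] + (1 - p_s) * params['P(C|~S)']

base = {'P(S)': 0.3, 'P(C|S)': 0.1, 'P(C|~S)': 0.01}
eps = 1e-6
for name in base:
    bumped = dict(base)
    bumped[name] += eps                      # perturb one parameter
    sens = (query(bumped) - query(base)) / eps
    print(name, round(sens, 3))
# Here P(C|~S) has the largest sensitivity (0.7): most people are
# non-smokers, so that parameter is the one worth fine-tuning first.
```

The same bump-and-recompute loop works for any query on any model; real sensitivity-analysis tools compute these derivatives far more efficiently, but the quantity being measured is the same.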