Welcome back. This is the fourth module in the fourth week of the course. Last time we talked about the Human Genome Project, and I told you that 99.9% of our DNA is the same from person to person and 0.1% is different. A course in behavioral genetics is really a course in how that 0.1% of our genomes might affect our behavior. So, the point of this module is to ask: what is that 0.1% of the human genome where we differ from one another, and what does it look like? Recall from last time that we might expect, on average, about 6 million base differences between any two random individuals. This time we're going to talk about the nature of those differences. I'm sorry, there's a little terminology here, but if you're feeling overwhelmed by it, please go to the course website, where we have various aids to help you. Let's start with the terms geneticists use when they talk about genetic differences. One term I've already introduced is the notion of an allele. Genes have specific locations in the genome, and a gene's location is called its locus. Alternative forms of a gene at that location are called alleles. For example, we talked about the ABO locus, which is on chromosome 9, and the alleles at the ABO locus are A, B and O. A polymorphism is a genetic difference that is common in the population: the allele frequency is greater than 1%. Geneticists usually reserve the term mutation for genetic differences that are rare, that is, for alleles with frequencies less than 1%. Polymorphism: common. Mutation: rare. I'll come back to this distinction between common and rare alleles, because it will be a very important one when we talk about genes that might affect your risk for schizophrenia. Finally, the term genetic variant is just a generic term.
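The 1% convention for polymorphisms versus mutations is a simple threshold on allele frequency, so it can be written down directly. Below is a minimal Python sketch, assuming allele frequency is computed as the fraction of all sampled copies of a locus that carry the allele. The function names and the allele counts are hypothetical and chosen only for illustration; they are not real population data.

```python
# Minimal sketch of the 1% allele-frequency convention described in the lecture.
# The counts used below are invented for illustration, not real data.

POLYMORPHISM_CUTOFF = 0.01  # alleles more common than 1% count as polymorphisms


def allele_frequency(allele_copies: int, total_copies: int) -> float:
    """Fraction of all sampled copies of a locus that carry this allele."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    if not 0 <= allele_copies <= total_copies:
        raise ValueError("allele_copies must be between 0 and total_copies")
    return allele_copies / total_copies


def classify_allele(frequency: float) -> str:
    """Label an allele 'polymorphism' (common) or 'mutation' (rare)."""
    # The lecture gives "> 1%" and "< 1%"; treating exactly 1% as rare is our choice.
    return "polymorphism" if frequency > POLYMORPHISM_CUTOFF else "mutation"


# 1,000 people carry 2,000 copies of an autosomal locus (two copies each).
for copies in (300, 4):
    freq = allele_frequency(copies, 2000)
    print(f"{copies} of 2000 copies -> frequency {freq:.3f} -> {classify_allele(freq)}")
```

Note that the denominator counts copies of the locus rather than people, because each person carries two copies of each autosomal locus.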
A genetic variant encompasses both polymorphisms and mutations. So, how might we differ? The way I like to think of it, there are three ways our genomes might differ. I know this is an extremely busy slide, but we're going to walk through each one. First, we differ in the actual bases of DNA we have at a particular location in the genome. I'm going to call that a sequence difference. Second, and I think this is remarkable, we differ in how much DNA each of us has. We don't all have the same number of bases of DNA. I'm going to call those structural genomic differences. Sequence differences: the actual bases at a particular location. Structural differences: how much DNA each of us has. And finally, and this is the one I'm going to talk least about, the way our DNA is packaged or organized might vary among us; the location of certain things in our genome might differ between you and me. I'll call those organizational differences. Let's go through each of these in turn and try to illustrate them. We'll begin with sequence differences. For a sequence difference, you go to a specific location in the genome (and, of course, the Human Genome Project identified all those locations), and people differ in the base they have at that location. This is illustrated here: a sequence of DNA for six different individuals. Most of these bases, like most of the bases in our genomes, don't differ across the six individuals. All of them have an A here, all of them have a G here, and so on. But at this fifth base, some people have a G and some people have a C. That's a genetic variant. And as we go along, there's another one here, where some people have an A and some have a C.
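To make the idea of a sequence difference concrete, here is a minimal Python sketch that scans aligned sequences column by column and reports the positions where people carry different bases. The six sequences are invented for illustration and are not the ones on the slide, although they are chosen so that, as in the lecture, the fifth base varies between G and C and a later base varies between A and C. The function name `variable_sites` is hypothetical.

```python
from collections import Counter

# Six hypothetical aligned sequences, one per person (invented for illustration).
sequences = [
    "ATCGGTACCA",
    "ATCGCTACCA",
    "ATCGGTAACA",
    "ATCGCTAACA",
    "ATCGGTACCA",
    "ATCGGTAACA",
]


def variable_sites(aligned):
    """Return {0-based position: Counter of bases} for every column where people differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos in range(length):
        bases = Counter(seq[pos] for seq in aligned)
        if len(bases) > 1:  # more than one base seen in this column: a variant
            sites[pos] = bases
    return sites


for pos, bases in variable_sites(sequences).items():
    print(f"position {pos + 1}: {dict(bases)}")
```

Every column that the function does not report holds the same base in all six people.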
Variants like these are called single nucleotide polymorphisms, or SNPs (pronounced "snips"). They're locations in the genome where people in the population differ in the nucleotide base they have. Locations that don't vary across individuals, the ones that aren't SNPs, are sometimes called monomorphic: they're not polymorphic, and the base is constant across people. It turns out that SNPs are an extraordinarily important source of genetic variation within that 0.1% of our genomes. Numerically, they're the most common type of genetic variant. One of the big efforts in the middle stages of the Human Genome Project was to begin cataloging genetic differences among us, because those would be important for genetic medicine. So researchers began to catalog these SNPs, and the catalog now contains over 10 million different SNPs. It's called dbSNP, for database SNP, and it's publicly available. For each SNP, the catalog records where in the genome it's located and what its alleles look like. We're not going to give all of these names the way we named the ABO locus, so instead they're numbered, and each number begins with rs, which stands for reference SNP. So all 10 million of these have rs followed by a number that uniquely identifies the location in the genome where we can differ in the base of DNA we have. So, the first point is that SNPs are very common and they're cataloged. The second point is this: there are four bases of DNA (G, C, T and A), so in principle a SNP could have any of four bases at its location. In practice, overwhelmingly, SNPs have just two.
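Because almost every SNP has only two alleles, each person's genotype at a SNP (one copy from each parent) can be summarized as a single count: 0, 1 or 2 copies of one chosen allele. The Python sketch below shows that encoding. The labels `ref` and `alt` and the choice of which allele to count are assumptions made for illustration, and the genotypes are hypothetical.

```python
def encode_genotype(genotype: str, ref: str, alt: str) -> int:
    """Count copies of `alt` in a two-letter genotype such as 'GC' (0, 1 or 2)."""
    if len(genotype) != 2 or any(base not in (ref, alt) for base in genotype):
        raise ValueError(f"{genotype!r} is not a genotype at a {ref}/{alt} SNP")
    return genotype.count(alt)


# Hypothetical genotypes at a G/C SNP for five people.
people = {"p1": "GG", "p2": "GC", "p3": "CG", "p4": "CC", "p5": "GG"}
codes = {pid: encode_genotype(g, ref="G", alt="C") for pid, g in people.items()}
print(codes)  # {'p1': 0, 'p2': 1, 'p3': 1, 'p4': 2, 'p5': 0}
```

Since the order of the two letters doesn't matter ("GC" and "CG" both give 1), a biallelic SNP has only three possible genotype codes, which is part of what makes large-scale SNP genotyping so tractable.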
At most SNP locations, people in the population have one of just two nucleotide bases. That makes SNPs very easy to genotype, because they're binary systems, and because of that, geneticists have been able to build array technology to genotype large numbers of SNPs efficiently and cost-effectively. That's something we will definitely come back to. The final thing I want to point out is that SNPs are distributed throughout the genome. There are about 10 million of them, and we have about three gigabases of DNA in our genomes, so there's roughly one SNP every 300 bases of DNA. Sometimes a SNP is going to be in the coding region of a gene. Sometimes it'll be in an intron. Sometimes it'll be in the regulatory region of a gene. And sometimes it'll be in a region of the genome where it might not have any effect. So, even though a SNP differs among us, it might not have any effect, and figuring out whether it does will be an important part of the analysis if we find a SNP to be associated with a behavioral phenotype. Let's go back to the beta-hemoglobin gene, because the mutation that results in the sickle cell anemia phenotype is actually a SNP. It's a change in the DNA base from T to A: people with the sickle cell mutation have an A where everybody else has a T. And that change is in the coding region of the gene, so it changes the amino acid at this particular position from glutamic acid to valine. That's enough to change the beta-hemoglobin molecule and produce the sickle cell phenotype. So that's a SNP. Thanks to the Human Genome Project, we know where this gene is: the beta-hemoglobin gene is located on the short arm, the p arm, of chromosome 11. It's right up here.
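Here is a small Python sketch of how one base change alters one amino acid. It assumes the textbook picture of the sickle cell mutation: on the gene's coding (sense) strand, the codon traditionally numbered 6 changes from GAG (glutamic acid) to GTG (valine). Read on the opposite DNA strand, which is how the change appears in the chromosome 11 reference because beta-hemoglobin is transcribed from the other strand, the same change shows up as T to A, matching the description above. The helper names are hypothetical, and the codon table is deliberately limited to the two codons needed here.

```python
from typing import Tuple

# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the two codons used here; the full genetic code has 64 codons.
CODON_TO_AMINO_ACID = {"GAG": "glutamic acid", "GTG": "valine"}


def reverse_complement(seq: str) -> str:
    """The same stretch of DNA read 5' to 3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def single_base_change(before: str, after: str) -> Tuple[int, str, str]:
    """Locate the one position where two equal-length sequences differ."""
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one differing base")
    return diffs[0]


normal_codon, sickle_codon = "GAG", "GTG"  # coding-strand codons

print(single_base_change(normal_codon, sickle_codon))  # (1, 'A', 'T')
print(single_base_change(reverse_complement(normal_codon),
                         reverse_complement(sickle_codon)))  # (1, 'T', 'A')
print(CODON_TO_AMINO_ACID[normal_codon], "->", CODON_TO_AMINO_ACID[sickle_codon])
```

Whether the change is written as A to T or T to A depends only on which strand you read; it is the same single base pair either way.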
We also know exactly where the sickle cell SNP is located in the genome. There are about 135 million bases, or 135 megabases, of DNA on chromosome 11, and we count positions on chromosome 11 from the top here, the end of the short arm, to the bottom. It turns out that the SNP associated with sickle cell anemia is a little more than five million bases down from the top. That's its precise location. It's given the reference SNP number rs334. These numbers were assigned more or less sequentially as SNPs were discovered, so this was one of the first SNPs discovered; it's a very important SNP, which is why it was discovered so early. SNPs now number over 10 million. So, rs334 is a SNP. It's associated with a coding change in the beta-hemoglobin gene, and it's what underlies sickle cell anemia. We know its location, and we know the location of the gene. That's the first type of genetic variation among us. The second type is that we actually inherit different amounts of DNA. We don't all have exactly the same number of bases of DNA. This is what I'll call structural variation, and there are different types of structural variation. The first type I'm going to talk about is called a variable number of tandem repeats, or VNTR. There are stretches of DNA that exist in all our genomes and that are repeated. Sometimes a stretch is duplicated in different locations of the genome. Other times it's duplicated in tandem, that is, one copy right after the other. A variable number of tandem repeats is a sequence of DNA that's repeated a variable number of times in tandem. Here's an example from a disorder that we'll talk about in the last week of the class, Huntington's disease. Most of you have probably heard of it. It's a devastating neurological disorder.
Here is the coding region of the gene. This is the sense strand of the DNA, read from the five prime, upstream end toward the three prime, downstream end, and I'm directly translating what each amino acid would be. All of a sudden we hit a stretch of bases in the gene that reads CAG CAG CAG CAG CAG CAG. That is a tandem repeat: in this case, three bases of DNA repeated in tandem, one after the other. In other places in our genome, two bases of DNA might be repeated in tandem, or 40 bases might be; these repeat units come in different lengths. This one happens to be three bases long. At this particular location, people differ in how many repeats they have. With a SNP we have, overwhelmingly, two alleles: one base or the other. With VNTRs, people differ in the number of repeats they have. In the case of the Huntington's disease mutation, those of us with 35 or fewer repeats in this region do not develop the disorder. People with 40 or more repeats will develop Huntington's disease, if they live long enough, with essentially 100% certainty. And there's a gray area in between: with 36 to 39 repeats, you may or may not develop the disease. So, a second type of polymorphism in the genome consists of regions where a DNA sequence is repeated over and over. We differ not necessarily in what that sequence is (we all have CAGs) but in how many CAGs we have. Those are called variable number of tandem repeats. I apologize that the terminology here isn't very crisp: this is such a rapidly developing area of human genetics, and I recognize it can get a little confusing.
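The Huntington's disease repeat described above can be illustrated by counting the longest uninterrupted run of CAG units and mapping that count to a risk category. This Python sketch uses the commonly cited cutoffs of 35 or fewer repeats (unaffected), 36 to 39 (gray zone) and 40 or more (affected). The flanking DNA in `hypothetical_allele` is a placeholder and is not meant to reproduce the real gene sequence; all function names are hypothetical.

```python
import re


def count_cag_repeats(seq: str) -> int:
    """Length, in CAG units, of the longest uninterrupted run of CAGs."""
    runs = re.findall(r"(?:CAG)+", seq)  # each match is one tandem run of CAGs
    return max((len(run) // 3 for run in runs), default=0)


def huntington_category(repeats: int) -> str:
    """Map a repeat count onto the ranges described in the lecture."""
    if repeats <= 35:
        return "does not develop Huntington's disease"
    if repeats <= 39:
        return "gray zone: may or may not develop the disease"
    return "develops the disease if they live long enough"


def hypothetical_allele(n_repeats: int) -> str:
    """Placeholder flanking DNA around n tandem CAG copies (not the real gene)."""
    return "GCCGCC" + "CAG" * n_repeats + "CCGCCG"


for n in (20, 37, 45):
    repeats = count_cag_repeats(hypothetical_allele(n))
    print(repeats, "->", huntington_category(repeats))
```

Unlike a SNP, where a genotype has only two possible bases per copy, here the allele is a number that can take many values, which is why VNTRs can have many alleles at a single locus.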
The terminology hasn't quite caught up because these variants are being discovered almost every year. The second type of structural variation I'm going to talk about is having extra or missing DNA. When I took biology in high school, and maybe whenever you last took biology, we learned that we have two copies of the DNA in each part of our genome: one copy we inherit from our father and one copy we inherit from our mother. In fact, that's not always true. There are regions of our genome where we sometimes have one copy, sometimes three copies, and sometimes even four copies or zero copies. What do those regions look like, and what do we call them? If the number of bases that are inserted or deleted is small, the differences are called insertions or deletions. Here's a little example. One person might have this string of DNA, with a TG here, but when the next person's DNA is aligned to it, they're missing these two bases. That's called an insertion/deletion. These involve a small number of DNA bases: people usually think of them as fewer than a hundred bases, though they can be longer, and they can be as small as a single base of DNA that some people have and other people don't. On a larger scale, actually quite a bit larger, are what are called copy number variants. This is truly remarkable. If we go through your genome and my genome, all of a sudden we'll hit a stretch of DNA, which could last for 100,000 bases or even a million bases, where I might be missing one whole copy: either the copy that came from my father or the copy that came from my mother. And you might have three copies of the DNA at that location. Copy number variants, or CNVs, are large segments of DNA.
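The two structural differences described here can be sketched in a few lines of Python. `find_indels` scans two aligned sequences, with `-` marking a base one person lacks, and reports each gap run. Whether that gap is called an insertion or a deletion depends on which sequence you treat as the reference, so the sketch only reports which sequence lacks the bases. `copy_number_state` and `is_cnv` compare a region's copy number with the usual two copies and apply a minimum length of 1,000 bases, the size convention for CNVs used in this course. All names and example sequences are hypothetical.

```python
from typing import List, Tuple

GAP = "-"
MIN_CNV_LENGTH = 1_000  # bases; size convention for CNVs used in this course
TYPICAL_COPIES = 2      # one copy from each parent


def find_indels(first: str, second: str) -> List[Tuple[int, int, str]]:
    """Return (start, length, which_sequence_lacks_the_bases) for each gap run."""
    if len(first) != len(second):
        raise ValueError("aligned sequences must have the same length")
    events = []
    i = 0
    while i < len(first):
        if first[i] == GAP or second[i] == GAP:
            lacking = "first" if first[i] == GAP else "second"
            gapped = first if lacking == "first" else second
            start = i
            while i < len(gapped) and gapped[i] == GAP:
                i += 1  # extend over the whole run of missing bases
            events.append((start, i - start, lacking))
        else:
            i += 1
    return events


def copy_number_state(copies: int) -> str:
    """Describe a region's copy number relative to the usual two copies."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == TYPICAL_COPIES:
        return "typical"
    return "deletion" if copies < TYPICAL_COPIES else "duplication"


def is_cnv(segment_length: int, copies: int) -> bool:
    """True if copy number is atypical over a segment long enough to count as a CNV."""
    return copies != TYPICAL_COPIES and segment_length >= MIN_CNV_LENGTH


# Person 1 has "TG" at positions 4-5; person 2 lacks those two bases.
print(find_indels("CCATGCAT", "CCA--CAT"))         # [(3, 2, 'second')]
print(copy_number_state(1), is_cnv(100_000, 1))  # deletion True
print(copy_number_state(3), is_cnv(500, 3))      # duplication False
```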
By current convention, a CNV is at least 1,000 bases long, but CNVs go up to megabases, millions of bases of DNA, where we're missing part of the DNA or have duplicated DNA in that region. Later, when we talk about schizophrenia, we're going to talk about a disorder associated with schizophrenia in which people are missing about three million bases of DNA on chromosome 22. They're missing this segment here, and if you're missing that segment, you have a greatly increased risk of developing schizophrenia. That's a CNV: some people are missing those bases, and they have an increased risk of developing schizophrenia. In other regions of your genome, remarkably, you can be missing a million bases and, for all intents and purposes, it doesn't seem to have any effect on your phenotype. Those are copy number variants. Finally, if we scale this up to the largest level, aneuploidy is where you have an extra or missing chromosome. We're really not going to talk about aneuploidy in this course, although it's certainly important. The classic example of aneuploidy is having an extra chromosome 21, which causes Down syndrome; here we see three copies of chromosome 21. There are not that many examples of aneuploidy in humans; having three copies of chromosome 21 is the classic one, and you can also have aneuploidies of the sex chromosomes. But in general these are fairly rare, and they're not closely related to the phenotypes we're going to talk about. Still, aneuploidy fits the notion that our genomes can differ in how much DNA we have; it's that difference at the largest level. The last way we can differ, and I'm going to say the least about this one, is in how our DNA is organized. The reason is that there aren't many behavioral examples of organizational effects at this point, so we won't be talking much about this in the course.
Here are a couple of pictures of organizational variations. Sometimes our DNA can be inverted: one person might have this orientation of a stretch of DNA, but in another person it could be flipped around. That's called an inversion. Often that has no consequences; you have the same DNA, it's just been rearranged. In some cases, though, it might have an impact on your phenotype, especially if the flip happens right in the middle of a gene and breaks it up. Another example of an organizational difference is that genetic material can be translocated from one chromosome to another, here between chromosome A and chromosome B. There's been an exchange of material between A and B: part of A is now on B, and part of B is on A. You have the same package of genetic material; it's just been rearranged. This may or may not have an effect; it really depends on whether you've broken up something when the material was rearranged. Again, we won't really have an opportunity to talk about these types of variation; there aren't many examples of them in the behavioral domain anyway. Next time, we're going to talk about a disorder associated with a deletion of genetic material, a disorder called Williams syndrome. Before I talk about it, I'd like you to watch a five-minute video about Williams syndrome. It's an extremely interesting syndrome, a behavioral syndrome. I've given you two links, and these are on the course web page as well. They're the same video: one is on the ABC News page, and the other is on YouTube. Please watch it before we talk about Williams syndrome next time.
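The inversions and translocations described above rearrange DNA without changing how much of it there is, and that can be shown with string operations. In this Python sketch, `invert` flips a segment end to end. Because the two DNA strands run in opposite directions, a flipped segment reads as the reverse complement of the original when you read the same strand. `reciprocal_translocation` swaps the ends of two chromosomes at chosen cut points, which is a simplified model of exchange between chromosomes A and B. `breaks_gene` checks whether a cut falls inside a gene's span. All names, coordinates and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(seq: str, start: int, end: int) -> str:
    """Flip seq[start:end]; on the same strand it now reads as its reverse complement."""
    segment = seq[start:end]
    return seq[:start] + segment.translate(COMPLEMENT)[::-1] + seq[end:]


def reciprocal_translocation(chrom_a: str, chrom_b: str, cut_a: int, cut_b: int):
    """Swap everything after cut_a on A with everything after cut_b on B."""
    new_a = chrom_a[:cut_a] + chrom_b[cut_b:]
    new_b = chrom_b[:cut_b] + chrom_a[cut_a:]
    return new_a, new_b


def breaks_gene(breakpoint: int, gene_start: int, gene_end: int) -> bool:
    """True if a cut just before position `breakpoint` splits a gene at [gene_start, gene_end)."""
    return gene_start < breakpoint < gene_end


print(invert("AAAACCGTTTT", 4, 7))  # AAAACGGTTTT
a, b = reciprocal_translocation("AAAAAA", "CCCCCC", 4, 2)
print(a, b)  # AAAACCCC CCAA
print(breaks_gene(5, 2, 9), breaks_gene(10, 2, 9))  # True False
```

In both cases the total genetic material is unchanged; as the lecture notes, what matters for the phenotype is whether a breakpoint lands inside a gene.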