Okay, welcome back. This is Module F of our unit on schizophrenia. In this module, I'm going to introduce an alternative way of identifying genetic variants for behavioral phenotypes and, as you'll see, for non-behavioral phenotypes as well. Recall where we left off last time, with the Sanders et al. study. The Sanders et al. study is an example of what geneticists call a candidate-gene study. That is where they had some hypothesis about the genes they thought were relevant for schizophrenia, and then they looked at genetic variants in those genes. They tried to associate those variants with schizophrenia in a case-control study, a comparison of individuals with schizophrenia and controls, to see whether or not those genetic variants differed in frequency across the two groups. What you saw last time is that the study really came up with no positive findings. In fact, the candidate-gene approach was pursued throughout human genetics for about 10 or 15 years, and over those 10 or 15 years, not a lot came out of it. This is something that will come back a couple of times as we go through the remainder of the course, and I'll talk a little bit about why that may have been the case. But one last anecdote before we get into the thick of genome-wide association studies. The Sanders paper was published in 2008. I teach a course in behavioral genetics here at the University of Minnesota. When I taught the course in 2008, or 2009, or maybe even in 2010, and I talked about schizophrenia in that course, I had to end on the Sanders study. Which is really a very disappointing result, right? They didn't find anything. But nonetheless, that was the state of the field. Fortunately, after the Sanders et al. study was published, another approach came along.
In my opinion, this approach has made remarkable progress in helping us understand the genetics of schizophrenia. That approach is called a Genome-Wide Association Study. In a candidate-gene study, we're targeting a certain region of the genome. In a Genome-Wide Association Study, we're studying the whole genome. We're not placing our bets ahead of time, saying this is the gene that we think is important. We're going to look everywhere. Before talking about schizophrenia, let me try to introduce the notion of a Genome-Wide Association Study with another phenotype, height. We know a lot about the genetics of height. Twin studies, and we've actually looked at some of this data earlier in this course, show that height is highly heritable, and nobody is going to debate those twin studies. Height appears to be about 80 to 90% heritable from those twin studies. We know some of the genes that affect how tall you are, but very few. These are the genes that affect you being either very short or very tall: things associated with achondroplasia, which is a form of dwarfism, or with being very, very tall, something like Marfan syndrome, which at least some geneticists think Abraham Lincoln had. So those have been identified, but those are pretty rare. And those don't really help us understand the genetic factors that contribute to individual differences in height throughout the normal range of height. Why are you a little bit taller or a little bit shorter than I am? What are those genetic factors? Here's a quote from a very prominent geneticist, actually a Finn, Markus Perola, from 2007. It turns out geneticists have expended a lot of effort trying to find the genetic factors that contribute to individual differences in height throughout the normal range of height, not just those extreme versions of height.
I won't read the quote; quantitative trait loci are just genetic factors. But what he's saying in this quote is: we've looked really hard, we can't find anything, and I'm disappointed in that. Here's a quote from another review article on the genetics of height one year later. In the past 18 months, we have now begun to find the genetic factors contributing to individual differences in height. What happened between 2007 and 2008, between these two quotes? What happened was the introduction of this new approach, the Genome-Wide Association Study, or GWAS. And GWAS was really enabled by the development of a technology in genetics, a technology that, I confess, I don't fully understand because I'm not a chemist. But it turns out that scientists have been able to develop arrays that allow them to rapidly and very cheaply genotype a large number of genetic loci, genetic factors. The genetic factors are what we call SNPs, or single nucleotide polymorphisms. This particular array here is manufactured by Illumina. There are other companies that manufacture them, but this particular array has the capability of genotyping 1 million SNPs. The way they're being genotyped here is this. We call them SNPs because they are differences in the DNA sequence at the level of a single base. And although there are four bases, a SNP is predominantly one base or another, so SNPs are biallelic. So what's happening here is they're genotyping. Green is the homozygote for one allele. Red is the homozygote for the other allele. Yellow is the combination of green and red, and that's the heterozygote. So what they do is put DNA on this array and then scan the colors, and they can quickly determine 1 million SNPs throughout the genome. Now, we've talked about SNPs before, and there are more than a million SNPs in the genome.
In building these arrays, what they tried to do is space the SNPs roughly evenly across the 3 billion bases of DNA in the human genome. If you have 1 million SNPs and you're spacing them about equally (you can't do that exactly, of course), then you're placing a SNP about every 3,000 bases of DNA. There are going to be other SNPs within those 3,000 bases, but they don't have to worry about those, for a reason that we talked about last time. Because SNPs that are close to one another on a chromosome tend to be highly correlated, you don't need to genotype every SNP. If you genotype a million, you can actually predict the ones that you didn't genotype. That's something we talked about before; it's called tagging or imputation. What a GWAS involves is something like genotyping a million SNPs, taking cases and controls, which we will see here in a little bit, and just determining whether or not the frequency of each SNP differs in the case group versus the controls. And if you genotyped a million of these, you do it 1 million times. What has this done for the genetics of height? Here's a summary table of GWAS studies of height. These are large-scale studies; look at the sample sizes there. The initial studies only genotyped 300,000 SNPs, but now they're up in the multiple millions. Over time, 2008, 2010, 2014, more and more SNP associations have been identified. The last time that I know of them doing this, they actually identified over 700 individual SNPs associated with height in a sample of 250,000. So yes, they're beginning to make a lot of progress in understanding the genetic factors that contribute to individual differences in height within the normal range. And we've learned some lessons from these early GWAS studies.
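The spacing arithmetic mentioned above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the round numbers from the lecture; the function name `average_spacing` is made up for illustration, and real arrays do not space SNPs perfectly evenly.

```python
# Back-of-the-envelope SNP spacing, using the round numbers from the lecture.
GENOME_BASES = 3_000_000_000  # approximate length of the human genome in bases


def average_spacing(genome_bases: int, n_snps: int) -> float:
    """Average number of bases per SNP if the SNPs were spread evenly."""
    return genome_bases / n_snps


# 1,000,000 SNPs: the array described above.
# 300,000 SNPs: the density mentioned for the initial height studies.
for n_snps in (1_000_000, 300_000):
    spacing = average_spacing(GENOME_BASES, n_snps)
    print(f"{n_snps:,} SNPs -> about one SNP every {spacing:,.0f} bases")
```

With a million SNPs the average gap is about 3,000 bases; with the 300,000 SNPs of the early arrays it is about 10,000 bases, which is why tagging matters for the SNPs in between.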
The first thing is that, in the last entry in that table, they identified 700 variants. But if you look at each one of those variants, it has a very, very small effect on your height. The effect of each is much less than 0.3% of the variance in height. That means you need those very large samples to detect the variants. That's why they needed 250,000. It also means there must be many, many variants to account for individual differences in height: hundreds, in fact probably multiple thousands of variants, to account for the heritability of height. The second thing we've learned is this. In that last sample that I gave you, there were 250,000 individuals participating in the study. They identified about 700 variants. How much of the variance in height could be accounted for by those 700 SNPs? It turns out that 16% of the variance in height could be accounted for, but height is 80 or 90% heritable. We studied 250,000 people. We identified 700 SNPs that are associated with height. We account for 16% of the variance, but it's 80% heritable. Much of the heritability remains missing. So geneticists have done these studies of height, and we see, one, that they're successful. They can identify things. But what they're identifying has a small effect on the phenotype, and even in aggregate, much of the heritability remains missing. So now let's go back to schizophrenia, and this is the most recent GWAS of schizophrenia. Look how much bigger this sample is than the Sanders sample. Now there are 38,000 cases, people with schizophrenia, and over 100,000 controls. No single investigator can put together a sample that large. They had to combine 52 different studies, so this is a meta-analysis. The sample was overwhelmingly European. If you recall, the Sanders study was all Europeans, and I'll come back to that and talk about it a little bit later. I'll just make mention of it now.
They had several million SNPs on each individual in this study, and the analysis is very simple. If you've ever taken a statistics class, you've done these types of analyses. You just haven't done them on the scale of a GWAS. You take the frequency of the variant in the cases versus the frequency of the variant in the controls. You test whether or not that frequency is significantly different, which is a chi-square test, and you do that however many millions of times, once for each genetic variant, each SNP. So you've probably done something like that if you've taken statistics. What's new here is the scale of the problem. You're doing it many millions of times, and that introduces some issues that we need to consider before we interpret the results from this GWAS of schizophrenia. There's what's called a multiple testing problem, and we've already run into this when we talked about the Sanders study. If you recall from your statistics, when you test a hypothesis, you test it at a p-value of 0.05, which means you have just a 5% chance of saying that there's a meaningful difference when there's no difference. But that's true when you do one test. In the Sanders study, they did 600 tests. So if they test each one at 5%, then by chance they would expect 5% of 600, about 30 tests, to be significant, even when the results are pure chance and there's nothing there. And we saw that that's pretty much what they found. They were observing effects at the chance level. If you do things a million times at 5%, then by chance you're going to get 50,000 significant results. So the first thing geneticists had to do is not test at a p-value of 0.05. They had to test at a much smaller p-value. And the p-value that's used in GWAS, and this is true across the board in various GWAS, is a p-value of 5 times 10 to the minus 8. That's seven zeros and then a five after the decimal point: 0.00000005. That's the control for what's called the multiple testing problem.
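To make the single-SNP test and the multiple testing arithmetic above concrete, here is a minimal Python sketch. The allele counts are invented purely for illustration and do not come from any study; the function names are hypothetical. The test shown is a plain Pearson chi-square on a 2x2 table of allele counts, which is one common way to compare frequencies and not necessarily the exact model a given GWAS used. The threshold of 5 times 10 to the minus 8 is the one quoted in the lecture; it also happens to equal 0.05 divided by one million tests.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi_square_statistic, p_value).
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass by chance alone when nothing is there."""
    return n_tests * alpha


GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the lecture

# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p = allele_chi_square(case_alt=460, case_ref=1540,
                            control_alt=400, control_ref=1600)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
print("significant at 0.05?      ", p < 0.05)
print("genome-wide significant?  ", p < GENOME_WIDE_ALPHA)

print(f"600 tests at 0.05:       {expected_false_positives(600, 0.05):,.0f} by chance")
print(f"1,000,000 tests at 0.05: {expected_false_positives(1_000_000, 0.05):,.0f} by chance")
```

In this made-up example the SNP would pass a conventional 0.05 test but falls far short of the genome-wide threshold, which is the situation the stricter p-value is meant to guard against.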
The second thing is, well, we're going to do this, let's say, at least a million times, maybe 2 million times. It's kind of hard to look at the results. So how are we going to display the results? Well, there are two things that geneticists do when they display the results. First of all, it's hard to count those zeros, right? Are there seven zeros there or eight zeros? So rather than looking at a whole bunch of zeros, it's easier to transform onto a logarithmic scale. Rather than looking at p-values, what geneticists look at is the logarithm to the base 10, and then they take the minus of that and look at those numbers. Because they take the minus of the log, the higher the number, the lower the p-value. So the log to the base 10 of 0.001 is minus 3, and if you take the minus of that, it's 3. The reason is that 0.001 is 10 to the minus third power. If 5 times 10 to the minus 8 is the critical value, then on this minus log scale the critical value is minus log of 5 times 10 to the minus 8, which is about 7.3. So rather than look at p-values, they look at this minus log scale. It's a little easier to look at. The second thing is, you still have to look at all these p-values, but they do it graphically, in what's called a Manhattan plot. And this is actually the first Manhattan plot for a large-scale GWAS of schizophrenia. This is not the study that I just introduced. This is one with 3,000 individuals with schizophrenia and about 3,500 controls. What's plotted here are about a million p-values on this minus log p scale. In order to be significant on this scale, a value has to be greater than 7.3. Are any greater than 7.3? No. In fact, this large-scale study was kind of like the Sanders study. They didn't find anything significant at all, and they actually came under a lot of criticism. You did all this work, you genotyped a million SNPs, you had 3,000 people with schizophrenia, and you didn't find anything.
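The minus log transform described above is easy to verify. The short Python sketch below uses only the two values from the lecture (0.001 and 5 times 10 to the minus 8); the helper name `neg_log10` is just an illustrative choice.

```python
import math


def neg_log10(p_value: float) -> float:
    """Minus log base 10 of a p-value: the y-axis of a Manhattan plot."""
    return -math.log10(p_value)


GENOME_WIDE_P = 5e-8  # critical p-value quoted in the lecture

# 0.001 is 10 to the minus 3, so it maps to about 3 on this scale.
print(round(neg_log10(0.001), 3))

# The genome-wide threshold maps to about 7.3.
print(round(neg_log10(GENOME_WIDE_P), 1))

# Smaller p-values give larger numbers, so "above the line" means significant.
print(neg_log10(1e-12) > neg_log10(GENOME_WIDE_P))
```

On this scale, checking whether a point is above 7.3 is the same as checking whether its p-value is below 5 times 10 to the minus 8.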
Why not give up? Fortunately, they didn't give up. We're still not at the most recent study yet. This is the second study. Now they're up to 21,000 people with schizophrenia and 38,000 controls. Again, they're plotting minus log p, and here's the critical value, 7.3. Now they're finding things. In fact, 22 things were above the significance level. And this is a nice illustration of why you call this a Manhattan plot. This plot doesn't quite look like the skyline of Manhattan. It looks more like the skyline of the city I'm living in, Minneapolis, with no really big skyscrapers. But look, we're beginning to see skyscrapers here. It's beginning to look like Manhattan. That's what you want to see. You want to see something that looks like Manhattan. And again, you've got over a million points being plotted here, so they look like a blur. But anytime they go above the line, that's significant. So on chromosome 6 here, you can see there's a highly significant result. It's well above the 7.3 line. You can actually see a lot of different p-values above that 7.3 line, but there's really only one real effect here. The other ones are being pulled along by this one because of tagging. If they're close to that one physically on the chromosome, they will be correlated with it. So their p-values will also look significant, but there's actually only one effect there. Now let's look at the most recent GWAS of schizophrenia. Now there are 38,000 individuals with schizophrenia and over 100,000 controls. Here's the Manhattan plot. We've gone from 22 significant results to 108 significant results. Now we're getting a lot of results. It's the same thing we saw with height. And the skyline maybe even looks better than the one from Manhattan. Maybe it looks a little bit like, what is it, Dubai or something like that. This is a really big skyscraper. That, again, is that chromosome 6 result there.
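The point above about chromosome 6, that a cluster of significant p-values can reflect a single underlying effect, can be illustrated with a very simplified Python sketch. Everything here is an assumption made for illustration: the positions and p-values are invented, the 500,000-base window is arbitrary, and grouping by physical distance alone is only a crude stand-in for what real analyses do, which is to use the measured correlation between SNPs.

```python
def group_into_loci(hits, window=500_000):
    """Group significant SNPs into loci by physical proximity.

    hits: list of (chromosome, position, p_value) tuples.
    Two neighbouring hits on the same chromosome fall into the same locus
    when they are no more than `window` bases apart (an arbitrary choice).
    """
    loci = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        previous = loci[-1][-1] if loci else None
        if previous and previous[0] == hit[0] and hit[1] - previous[1] <= window:
            loci[-1].append(hit)
        else:
            loci.append([hit])
    return loci


def lead_snps(loci):
    """Pick the SNP with the smallest p-value in each locus."""
    return [min(locus, key=lambda h: h[2]) for locus in loci]


# Invented example: three nearby hits on chromosome 6 and one on chromosome 1.
significant_hits = [
    (6, 28_350_000, 3e-10),
    (1, 98_500_000, 4e-8),
    (6, 28_300_000, 1e-12),
    (6, 28_420_000, 2e-9),
]

loci = group_into_loci(significant_hits)
print(f"{len(significant_hits)} significant SNPs, {len(loci)} loci")
for chromosome, position, p_value in lead_snps(loci):
    print(f"chromosome {chromosome}, position {position:,}, p = {p_value:.0e}")
```

Four SNPs cross the line in this toy example, but they collapse into two loci, which is closer to what is meant by a count of distinct results on a Manhattan plot.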
But you see, we're really finding a lot of things. And again, a familiar pattern emerges. This is pretty much true of all these GWAS of various disorders, psychiatric or otherwise. First of all, the variants that are being identified tend to be relatively common. They tend to have frequencies of greater than 20% even in the control group, which means that many of us are carrying genetic risk factors for schizophrenia. We just don't carry enough of them to develop the disorder, if we don't have the disorder. They have very small effects, just like those variants for height. If they have small effects, some have argued, why bother looking for them? Well, one of the arguments for looking for them is that they begin to identify the likely biological pathways. In the case of this most recent GWAS of schizophrenia, things involved in immune dysregulation (that's the chromosome 6 result; chromosome 6 is where the major histocompatibility loci reside, the things that are involved in immune response) and synaptic plasticity are all being implicated by this particular pattern of results. Secondly, when taken in aggregate, just like when we take the height results in aggregate, most of the heritability of schizophrenia remains unexplained. It's missing. In aggregate, all of these 108 hits in this last GWAS, which is massive in scale, could only account for 5 to 7% of the variance in schizophrenia. If schizophrenia is 50, 60, 70% heritable, that means most of the heritability is missing. Missing heritability is not a problem unique to psychiatric disorders or unique to schizophrenia. Here I've listed non-psychiatric traits, body mass index or obesity, height, and some other physical traits, from large-scale GWAS. This is how much of the variance in the trait the variants identified by those GWAS can account for in aggregate. This is how much heritability is indicated by twin studies.
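The missing heritability comparison above is simple subtraction, sketched below in Python. The height figures (16% explained, about 80% heritable) and the schizophrenia figures (5 to 7% explained, roughly 50 to 70% heritable) are the ones quoted in the lecture; using 60% and 7% as single values for schizophrenia is a simplifying assumption, and the function names are made up for illustration.

```python
def missing_heritability(twin_heritability, variance_explained):
    """Twin-study heritability minus the variance explained by GWAS hits."""
    return twin_heritability - variance_explained


def share_explained(twin_heritability, variance_explained):
    """Fraction of the twin-study heritability that the GWAS hits account for."""
    return variance_explained / twin_heritability


# (twin-study heritability, variance explained by GWAS hits in aggregate)
traits = {
    "height": (0.80, 0.16),
    "schizophrenia": (0.60, 0.07),  # assumed single values within the quoted ranges
}

for name, (h2, explained) in traits.items():
    missing = missing_heritability(h2, explained)
    share = share_explained(h2, explained)
    print(f"{name}: {explained:.0%} explained of {h2:.0%} heritable, "
          f"{missing:.0%} missing ({share:.0%} of the heritability accounted for)")
```

For both traits, the identified variants account for only a small fraction of what twin studies say is heritable.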
And as you can see, very consistently, just like schizophrenia, just like really any phenotype at this point, we do these large-scale studies and we identify variants. They have small effects, and in aggregate they account for some of the heritability, but most of the heritability remains missing. That is the missing heritability problem. Where is the missing heritability? This is the last slide here. Well, it could be that we need even larger and larger samples to detect smaller and smaller SNP effects, right? We can see the Manhattan plots growing over time. So if we had 500,000 people in the study of height, or 100,000 people with schizophrenia, we're going to find many more variants. And you can ask, and I think it's reasonable to ask, and we'll explore it later in this course: is it worth doing that? I'm of a mind that it is, but there are very credible scientists who think it's not worth doing. The second possibility is that maybe GWAS is really missing something. One thing about GWAS is that it's only looking at certain types of genetic variants, SNPs. And remember, if you go back to, I think it's unit three or unit four in this course, we talked about different types of genetic variants. SNPs are just one class of genetic differences among us. There are other types of genetic factors that we differ on. Maybe if we're only looking at SNPs, we're missing other ones that are important. I'm going to talk about that in the next module in this unit. And maybe there are other reasons. Maybe gene-environment interaction. Maybe the heritability estimates are wrong. We don't know. This is a really major question now in human genetics. Where is that missing heritability? Next time we'll talk about at least one hypothesis about where some of that missing heritability might reside. Thank you.