Similar problems called the coupon collector problem. So, this is the same setup and the question is how many people, this is in a much bigger group. How many people would you have to ask before you found every day of the year? Well, you have to ask at least 365 and if you are lucky and everybody had a different birthday you'd be fine. But generally you'd have to go much more. So that again corresponds to throwing balls into urns. And again randomly. And then the question is how many do you have to throw in before every urn has at least one ball in it. So this one here. Nine finally finds it's way in. So generally, it'd be much more than the number of urns. The question is, how long til each urn has at least one ball in it. This comes from, I don't know what the rage is lately, say baseball cards or magic cards or Charlie and the Chocolate Factory in the cards. How many of the different ones you have to collect before you get every possible coupon. That's the same problem, in different types and when you get one it's random how many you have to get before you get them all. And there's actually plenty of practical applications where we want to know the answer to this problem as well. I have a 20-sided die, this is, again, just another illustration. You keep rolling the die, and every time you get a new value, you check off the value. So how many times, you're going to have to roll the die, so because of the birthday problem, after not too many rolls, you get a repeat. So not new, and then after a while you start to get into plenty of repeats. But the question is how many times do you have to roll it untill you get all possible values? So in this case now we're still looking for just two more possibilities. Finally we get the 4, and then after 37 rolls we have all possibilities For a 20 sided die. So in general, if given M, how many times you have to throw to get all different M values. That's the coupon collector problem. Now, with classical probability, this problem is actually not too difficult. And so I'll give a classical analysis. Actually, the analytic combinatorics is more complicated than this. But it's still worth doing because the classical analysis just gives the average. And if you want to study variants of a problem, or other properties of the distribution, you're going to need the structure that we get from analytic combinatorics. But anyway, here's the classical analysis. So, first thing is the probability that you need more than j rolls to get the k+1st coupon is k/M to the j. That's the probability that the first j rolls are all from the same k values. Again, adding those up in the same way, the expected number of rolls to get the (k+1)st coupon is just solving that, which is 1/(1-k/M) which is the same as M/(M-k). So that's for the (k + 1)st coupon, that's the expected number of rolls to get the (k + 1)st coupon, and so the expected number of rolls to get all the coupons is just summing that because that's an expectation, and we can sum expectations. So sum of that, of M/M- K, K from 0 to M, that's just M times the harmonic numbers. Change M to M- k and you get the harmonic numbers, which is asymptotic M ln M. So the number, if you have M different values, then there are rolls you have to do to get all of them, is on average M ln M. So And again, there's lots of motivation for studying this in more detail. Although, because of those many extensions that we want to study. So I'll show you how this happens with analytic components. Although, really, the motivation for this is much more interesting when we look at variations later on. So start out the same way as we did for the birthday problem. Coupon collector sequence is an N word that no empty set. Or it's a string that uses all the letters in the alphabet. So how many coupon collector sequences are there? So, for example, for m equals 26, that's one. Uses all the letters of the alphabet. Well known one. Then, after m equals five. There's an example. So, no empty sets So, again, we set it up the same way, as we did before. With using exponential generating functions treated as a label set. So what is a coupon collector sequence? It's a, we have M different letters, so it's a sequence of M different letters And now our sets have to be non-empty, so there just has to be a set size greater than zero. And that immediately gives us a EGF equation that, either it's the C-1 to the m. So, extracting coefficients, getting the coefficient of n factorial comes the coefficient of z to the n and we get this complicated equation, for the counting sequence, for the number of coupon collector sequences. Out this is an so helpful but if you go ahead and extract the coefficient to Z to the end getting alternating some and so not necessarily so helpful. I mean it’s asymptotic to be It's like m to the n, and then a smaller number to the n so subatomically it's m to the n. But all that means is if you have a long string almost all, if you have a random string of n objects where n's bigger than m. Almost all of them are going to use all the letters if N is significantly bigger than M. So that's not helpful. What we need to do is exactly evaluate this. So the probability, the random M-word length N is a coupon collector sequence is that, what I just showed. And this is a little bit complicated to evaluate because of the alternating sum feature of it. So And again, the probability that the last one is greater than n is 1 minus that. So the average is the sum of those probabilities, and so that's not too difficult to simplify a little bit. It's still got this alternating characteristic. And while if you have studied Kanuth eventually you can get the solution MHM, but it really is still kind of a magic solution. So this is possible to get to the end, but this is not a recommended way to get to the end. For this problem. When we look later on at generalizations and asymptotic methods in the second part of the course, we'll kind of see why this happens. But for the elementary derivation this one doesn't work. Now there is a way to do it with OGFs that pretty much mirrors the classical derivation. So if we define WMk, it'd be the class of M words that have k different letters. With the last letter appearing only once. So in this case, there's three different letters, and the last one appears only once. So then you have an OGF for those, and then we can actually make a probability generating function just by evaluating that at Z over N. And so then if you differentiate that probability generated function evaluated C equals 1 you get the mean weight time. For k coupons. So that's pretty simple generating function on manipulations. And so, what we'll do on the next slide is use a combinatorial construction to get an equation for the generating function and then perform this operation to give us a solution. So here's the construction. So if we have m words with k different letters. The last letter appearing only once. So either it's a last letter is one of the letters that's already used or It's a new letter, and that's what leads to that construction. And then that construction immediately translates to this generating function equation, which I won't read out. And then evaluate at z over M to get the probability generating function. Differentiate and evaluate at 1, then we get a recurrence relation for the thing that we're looking at, the wait time for k coupons, and that leads us to the same sum that we got in the elementary derivation, which is going to lead to, again, the end result. Definitely the classical derivations this way to get the average, but the structure here can be used to solve more complicated problems. And our answer in practice is, if you know how many different baseball cards there are, multiplied by the log of that. Now that's how many you're going to have to look for to get every possible coupon. So for 20 sided die it's about 60 or on application Is maybe testing pages in memory, you do it by having a program randomly access the pages, maybe that you want to be sure that the test hits every page. Then you can figure out, you can take, if M is a million, then it's going to take you about a million times natural logarithm, which is about 14.5 million Steps to test every page. That's an example of an application in the real world. It's a well known phenomenon that has lots of applications. And that's the combinatorics of the coupon collector problem. There is a combinatorial concept called a surjection that does really need analytic combinatorics to study. So what we call a coupon collector sequence is, it's an M-word with no empty set. So that's called an M-surjection. But there's a more general thing which is just a word that is there is an M-surjection for some M and that appears in lots of applications. So that is either it uses just one letter or uses two letters, but without leaving any out. Where it uses three letters without leaving any out. And this is a combinatory logic that's a well defined and easily handled with the symbolic method. So this is what we did for M-surjections. It's a sequence of non-empty sets, e to the Z minus 1 to the M. And we hit for that. For surjections, it's not a sequence of size M, it's a sequence of any length. So that is 1 over 1- e to the z- 1 so set of, our empty set is this z-1. Sequence is 1 over -1 so it's 1 over 2 e to the z. And so how many subjection are there linked in. We'll see that this is best handled with complex asymptotics but it's N. Over 2(log 2) to the N+1. That's a classical example in combinatorics. Related to the coupon collector problem. And we'll see this coming up in more detailed studies in part two. So that's the coupon collector problem. And next we'll go onto applications in computer science.