So now is the first quiz. So what were the three measures that we did before the coffee break? Coefficient was a word somewhere. >> [LAUGH] >> Average distance. >> Average path length. Yeah. >> Diameter? >> Diameter, yeah. So diameter and average path length, that's about a spread of the network. The degree distribution and the clustering coefficient. So these are normally three metrics which can actually characterize the type of network that you've got quite well. Now, we'll talk about something called centrality measures. And I'll go through three of those with you. There is another one which is called eigenvector, Eigenvalue system centrality, which is related to Google's page rank. So does anyone know how Google ranks pages? >> [INAUDIBLE] >> Yeah, it's not just the links, that will be degree, it's something else. >> [INAUDIBLE] >> Yeah, somebody- >> [INAUDIBLE] >> So it's almost like that, but they have a ranking system. So it's weighted, the ranking of your Google page is increased if you have more connections coming in to you. To your page, but it also affected by the type of page that is coming in. So it's a reputation. So it's a kind of a circular reference that I am highly ranked if I'm connected to from other highly ranked web pages. This is basically eigenvalue centrality. So this is what they made millions doing. We won't cover that because the algorithm to calculate is pretty complex. But it's the same kind of thing, a node in the network when we're talking about centrality, by the way, we're always talking about a node, how central is that node to the network. And you can take the average of that as well, but how many? What is the centrality of the node, so identify which node essential which node is important you can do in different ways. One is you can say, the higher the degree, the more important it is. So degree centrality is just related to the degree. There are other ones which somebody mentioned before, which is between us, which I'll explain. There's another one called closeness and there's a another one called a the eigenvalue centrality. So the eigenvalue sensuality nodes which have eigen high eigenvalue centrality are also linked to from other nodes that have high eigen values and charity. So it's this. It's a circular reference. So okay. So these are the things that will we'll cover. The degree centrality is pretty straightforward. There's always a normalized version and a non normalized version. So the node, node I's degree, which is denoted by this is just the degree. That's it. The average degree of the network is just the average degree of all the nodes. In the network, so it's just the sum for all i from one to n, okay i this should be divided by n bar it should be total degree. Okay, ignore this one for a minute. I'll fix it. By the way, somebody asked me whether I can share the slides. I will do that. That's no problem. You don't have to take pictures. >> [LAUGH]. >> So now you filled all your phone memory up then I'll share the slides. Normally I think at the end of the winter school, you will actually get a demo if this is the case. So I have to be careful. But it used to be, that you got a folder either on dropbox, or on a pendrive. With the contents of the all the lectures, not just mine, but everyone's but I will give you mine for certain so you don't have to take pictures and I will fix some of the problems. And then you can normalise this by just the number of nodes, right? So this is in this case, the degree of this node is 4, 1 and so on. You've all seen this before, I guess you will all know it now, right? Yep, and you normalize that you just divided by A minus one, okay? So, if you're high if you have a high degree centrality, it means you have a lot of friends or you're highly connected in the network. And you can be fairly important for different things. If you're have a high degree centrality you have a high connectivity, then in any disease spread, you're going to be a super spreader. As soon as you get the disease, that's it, you're really in trouble. So, typically people like this in contact networks are, who would you think? If you take a contact network of people in the world and you think about how diseases get spread. Who are the people who have lots of contacts? So you have to think about two things. So one is the probability of having connected to someone who's got a disease. The other is just your number of friends, right? So people who work in shops or work in train stations, they are going to come into contact with a lot of people every day, especially if you do this in Singapore or another big city. There are other people who are maybe not as highly connected but the probability of them connecting someone with a high where the disease is higher. So nurses is a good one because there's a high chance that you'd connect with someone in the hospital. Also, anyone else? >> Taxi drivers. >> Taxi drivers in Singapore, yeah, this is- >> Teachers. >> Teachers and kids, yeah. And the kids are particularly good because they just like to say hello with their tongues. And so they can have a higher probability of transmission. So that I can tell you something about the network about the degree centrality. So somebody mentioned already between a centrality. So between us a heart a node has high between us if it's between a lot of other nodes. So what you do is you calculate the shortest path between every pair of nodes in the entire network, right? And then a node with high between us is going to be on a lot of those shortest paths. So that means if I'm sending a communication through the network, which I will do on the shortest path, the likelihood of it going past me is very, very high. So you can think of a network actually what I can do here let me just show you. [COUGH] So, these two things don't always correlate. So you can have a low degree but high between us. And this is very common in friendship networks. So- If you've managed to download your Facebook data which Facebook we used to do Facebook experiment, Facebook stops anyone downloading their Facebook network because they're evil. So that happened last year when I was trying to do it. I found out the day before we were starting the lectures. So in this case this node has quite low degree. But it's very, very high between us. Because if you calculate the shortest path between any node on this side and any node and that side, this will lie in between. So it's like a sniffer, right? It's somebody who can sniff a lot of a lot of communication across the network. So typically, if you look at your favorite friendship network, certainly for mine If you move around the world You got your high school friends, maybe you've got your friends from one city you got your friends to another city. And then occasionally you find one friend who has a connection between the two groups. So somebody that randomly met one of your other friends you find that they have very high between us. So these are the people that you immediately break friends with because rumours. So you can actually calculate this for the friends and I'll show you my Facebook network later. And unfortunately for me it's always either my mum, which is pretty bad, or even worse, it's my my wife. So, I know. So, the more formal definition. For this, sigma here, is the total number of shortest paths between nodes h and j, right? So- So if I calculate the shortest path from node A to now B then there are two options. I can go via this way or I can go via this way. So they shared between us equally, right? This is on one shortest path between A and B. This is on par for shortest path between A and B because there's another one there as well. So what you do is you calculate the number of shortest paths that I lies on between h and j. So let's make this h and j. All right, so if we're looking at, let's say node A and node B, the shortest number shortest path as a liaison between H and j is one. The total number of paths between shortest paths between H and j is 1, 2 right? So the between us count that this node gets for the shortest path is a half. Yeah? And then I do that not just for H and J, but for every set of every pair of nodes in the entire network. So if I have another node here, I call this K. I calculate all the shortest paths between K and J. How many does A lie on? How many does B lie on? And so on, and I calculate that for every node. It's very expensive. Because you have to do with every pairwise, right? So for every pair of nodes you have to calculate this. [COUGH] Okay, so let's look at an example. So here we have a pretty simple network A, B, C, D, E. You first have to calculate all the shortest paths. So for every pair of nodes, so if I go from A to B, the shortest path is A, B, A to C, it's ABC, A to D, it's ABCD and E is ABCDE. And so on and so on and so on. Know that I don't have to do a to b again. So here, I don't do B to A why? It's an undirected network. If it was a direct network, this will be different because then there's, in fact, if I had just an arrow going from A to B there's no path from BTA, I'd have to make a circle background somehow. So, I only have to calculate this one way in this case. Okay so, then I calculate I had to sum up for every node how many shortest paths it lies right? So this is saying between all nodes h and j how many does I lie on. So we try this a, we try this for B we try this for C who tries to D Okay, so how many shortest paths does a lie on? You've also noticed that it says h not equal to i not equal to j. So you ignore the endpoints, you're only looking at the ones in between. So how many shortest paths does a lie on? I've given you a clue somehow. None, zero, acre land a shortest path because it's an endpoint of the network. So it will never be in between anything. Right, what about B? 3432 lies between A and C, A and D and A and E. So sorry I missed it right? It was just a test To see my arm, 4, yeah? >> [INAUDIBLE] >> Yes it should see to what now there is a problem. The problem is the following. I'll show you it's correct, but this should be like that. Yep. I don't know why every time I come back it goes yellow. But that is the rule just said that this one to be consistent that should have started D on this line, right? So D is another start point. So this is the first line is starting from A, the second line starting from B, third line starting C, fourth line starting from D. I never have to start from E because I don't have to go in reverse. So all the ones in red are the ones that lie in between two endpoints. So you just count the number of red A's and the red B's number of the red C's, and you come up with the following. [COUGH] So that means the most central node in terms of betweenness is node C, which was the middle one. Which makes sense, right? It's in the middle of the network or the paths. It lies on all the shortest paths basically, between any nodes. So any kind of communication, except between A and B and C and D, it will lie on. That's the non normalized version, you normalize this by the following. So you take the non normalized count should CBI u divided by N minus 1 times N minus 2 divided by 2. What does that mean? You should recognize this by now. It's the same as N times N minus 1 divided by 2, except you remove yourself from the network. So it's the total number of possible links when I take myself out the network So that's between us. And the final one is called closeness. Okay, so let me just check something. If we have hired between us, I think I've asked this already what it means Is that you will be likely to intercept any message that goes on the network. Any message that's being communicated you have a high chance of seeing. Closeness. So, you have a question? You can explain it's a good very good point. Very good observation. So if a node has high between us, what it means is if you remove that node, you're very likely to separate the network in pieces, right? So this node has the highest between us and here. If I remove this I separate the network in two. So if I do want to break the network apart, what I can do is target the nodes with high between us. And that will be one strategy to break apart. It also depends slightly on how close to the network is. So if you find these communities a lot in the network, you're likely to find these high between these nodes between the communities and then you can attack them. I think Erica, actually on Thursday, you will mention a bit more about this, right? So close and sensuality what does it mean? As you guess, without looking at my intuition should have removed that. Was basically saying how far is that node from the rest of the network. So I calculate from my starting position, I calculate the path length to all the other nodes and I average that. If that's small, I'm close to the rest of the network. So this is just saying the ij is the distance from me, which is I to all of the nodes j, I sum all that up, and I do that one over that because I have a higher closeness if my distance is smaller, so it's just the universe. [COUGH] So another example. So you have to remember what A, B and C and D are. So it's in clockwise so A,B,C,D,E,F. So those are the distance from A to B to C to D to E. So the distance from A to B is one, difference from A to C is two, remember A,B,C,D,E,F. Then the distance from A to D is also 2. The distance from A to E is also 2 .The distance from A to F is two. Yeah, sorry 3, 123 yep. Okay, it's always the shortest path that you have to calculate. You can do this with the same thing. There's always the inverse of that, right? So these are the actual numbers that you see. So the high node with the highest closeness is node B, which also makes sense, right? It's close [INAUDIBLE]. Okay, and then there's a normalized version where you take the original one divide that by n-1.