In this video, we introduce the anomaly detection problem. We are collecting lots and lots of data, so essentially we are drowning in a deluge of data while at the same time starving for knowledge: we are not able to extract knowledge from it. Anomalous events are those that occur relatively infrequently; even with lots of data, there will be very few anomalous events. However, their consequences can be quite dramatic, and usually negative. Finding an anomalous event is like finding a needle in a haystack: so much hay, so little time.

What are anomalies, or outliers? A set of data points that are considerably different from the remainder of the data. For example, in the figure on the right, we can see four groups, or clusters, of data. However, there are a few data points that do not belong to any of them and are far away from all of them. These are outliers, also called exceptions, peculiarities, or surprises. Real-life examples include cyber intrusions and credit card fraud.

What are the natural implications of anomalies? By definition, anomalies are relatively rare, although even a one-in-a-thousand event occurs often when you have lots of data. Context is also very important: consider freezing temperatures in July. Freezing temperatures close to 0 degrees Celsius are not uncommon, but having them in July is. Anomalies can be important, or they can be a nuisance. For example, a 10-foot-tall two-year-old is most likely a nuisance, suggesting an error in the data, whereas an unusually high blood pressure reading could be important.

Let's look at an example. In the figure shown on the right, N_1 and N_2 are regions of normal behavior, each consisting of many data points. Points O_1 and O_2, which occur separately and are far from the clusters N_1 and N_2, are anomalies. The region O_3, although it contains a few points together, is still far from N_1 and N_2 and has relatively fewer points than they do, so it can also be considered anomalous. Anomaly detection is related to problems such as rare class mining, chance discovery, novelty detection, exception mining, and black swan events.

To show the importance of the anomaly detection problem: in 1985, researchers were puzzled by data gathered by the British Antarctic Survey, which showed that ozone levels had dropped by 10 percent. This was unexpected because the Nimbus 7 satellite, which carried instruments to record ozone levels, had not reported anything of the sort. Why did it not record the low ozone concentrations? Because the readings were so low that the computer program treated them as outliers and discarded them. As seen in the figure, there is a sharp decrease in the ozone concentration around 1985.

Now let's look at the causes of anomalies. One reason could be data from different classes: for example, we are measuring the weights of oranges, but by mistake a few grapes get mixed in. There can be natural variation, such as unusually tall or short people. Or there can be data errors: for example, the weight of a two-year-old baby is recorded as 200 pounds by mistake. A small sketch of flagging such points follows.
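To make "points far from the rest of the data" concrete, here is a minimal sketch in the spirit of the oranges-and-grapes example. The weights and the two-standard-deviation cutoff are illustrative assumptions, not taken from the video:

```python
import numpy as np

# Hypothetical weights in grams: mostly oranges, with two grapes mixed in.
weights = np.array([128.5, 131.0, 133.2, 135.9, 136.4, 137.8, 138.7,
                    140.2, 141.0, 142.3, 129.9, 134.6, 5.1, 6.4])

# Score each point by how many standard deviations it lies from the mean.
scores = np.abs(weights - weights.mean()) / weights.std()

# Points far from the bulk of the data are flagged as outliers
# (the 2.0 cutoff is an arbitrary illustrative choice).
print(weights[scores > 2.0])  # -> the two grape weights
```

Note that the mean and standard deviation are themselves pulled toward the outliers, which already hints at why defining the normal region is harder than it looks.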
What are the key challenges in finding anomalies? The first is that defining a representative normal region is challenging: the boundary between normal and outlying behavior is often not precise, and the exact notion of an outlier differs from application to application, so we need to be very careful. The availability of labeled data for training or validation is another challenge. We may also face malicious adversaries who deliberately add misleading data. The data may contain noise. And normal behavior keeps evolving, so the definition of an anomaly keeps changing.

Now let's look at some differences between anomalies and noise. Noise consists of erroneous, perhaps random, values or contaminating objects. For example, a weight may be recorded incorrectly, or grapefruits may be mixed in with the oranges being weighed. Noise does not necessarily produce unusual values or objects, because noisy data is usually only slightly offset from the true value. Moreover, noise is not interesting, whereas anomalies may be interesting, since they are generated by a related but distinct concept.

The detection of anomalies depends on various aspects of the data. What is the nature of the input data: binary, categorical, continuous, or hybrid? What is the relationship among data instances: are they sequential or time-related, spatial, spatiotemporal, or graph data? The availability of supervision is another aspect. In the supervised setting, labels are available for both normal data and anomalies. In the semi-supervised setting, labels are available only for normal data, and any data point that does not fit this normal region is labeled anomalous. In the unsupervised setting, no labeled data is available.

Several types of anomalies can be found in data. The first is the point anomaly, in which an individual data instance is anomalous with respect to the rest of the data. We can also have collective anomalies, in which a collection of related data instances is anomalous. In the figure on the right, the value one is not abnormal by itself and occurs quite frequently, but in most cases the series goes from 0 to 1 and then comes back to 0; only in the middle part is there a continuous run of ones, and that is why it is an anomaly. Contextual anomalies are anomalies within a context, and they require a notion of context to define them; the context can come from relationships among instances, such as sequential (one happening after another), spatial, or graph relationships. Individual instances within a collective anomaly are not anomalous by themselves. For example, in this figure there is a notch in the middle part; the value at the notch is not anomalous in itself, since several other points have that value, but having that value at that location is anomalous.

Now let's look at the output of an anomaly detection algorithm. Some anomaly detection algorithms generate a binary output, declaring points anomalous versus non-anomalous; given a data set D, the problem is to find all data points that are anomalous. Several other algorithms give an anomaly score. In that case, we can define the problem as: given a data set D, find all data points with anomaly score greater than some threshold; or find the n data points with the largest anomaly scores; or, given a training data set D consisting mostly of normal points, find the anomaly score of a test point with respect to D. The score-based formulations are sketched below.
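As a rough illustration of the score-based outputs, here is a small sketch; the scores and the threshold are made-up numbers, and a real detector would of course compute the scores from the data:

```python
import numpy as np

# Hypothetical anomaly scores for the points of a data set D
# (higher score = more anomalous), as produced by some detector.
scores = np.array([0.12, 0.08, 0.95, 0.33, 0.71, 0.05, 0.88, 0.21])

# Formulation 1: report every point whose score exceeds a threshold t.
t = 0.7
print("indices above threshold:", np.where(scores > t)[0])

# Formulation 2: report the n points with the largest anomaly scores.
n = 3
print("top-n indices:", np.argsort(scores)[::-1][:n])
```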
To illustrate with an example, we find anomalies in time-series data. The data set we use is sunspots.txt, which consists of an index of the activity of the entire visible disc of the sun, with spot counts averaged over different time intervals. First, we view the data as a table, shown here, with two columns: months and sunspots.

To detect anomalies in this time series, we use a moving average. For this, we first design a moving-average function that takes a window-size parameter; the average at each point is computed over the previous instances in the window, for example the nine or ten previous instances, depending on the window size. We then design a function for flagging anomalous values: a data value is considered anomalous if it differs from the moving average by more than sigma times the standard deviation, on either side. That is, any point that lies within the range from the moving average minus sigma times the standard deviation to the moving average plus sigma times the standard deviation is considered normal; any point outside this range is considered anomalous.

Different values of sigma give different results. If we set sigma equal to one, the green curve is the time-series data, and the anomalies found relative to the moving average are shown in red; since the sigma value is very small, the range of normal behavior is restricted, so a lot of points are declared anomalies. If we change sigma to two, only the points that are far away from the trend remain anomalous. If we increase it further, to three or four, only the very few points that truly depart from the trend are declared anomalies. A compact sketch of this detector is given after the transcript.

In this video, we discussed the anomaly detection problem and looked at its different aspects, and in a Python example we saw how to detect anomalies in time-series data. Thank you.
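As a companion to the walkthrough above, here is a minimal reconstruction of the moving-average detector. It assumes pandas, a whitespace-separated two-column sunspots.txt, and that sigma scales a rolling standard deviation; the column names and file layout are assumptions, not details confirmed by the video:

```python
import pandas as pd

# Load the series; the two-column whitespace-separated layout and the
# column names are assumptions about sunspots.txt.
df = pd.read_csv("sunspots.txt", sep=r"\s+", names=["Months", "Sunspots"])

def moving_average_anomalies(y, window=10, sigma=2.0):
    """Flag points outside moving average +/- sigma * rolling std."""
    avg = y.rolling(window).mean()
    std = y.rolling(window).std()
    # The first window-1 points have no moving average and are never flagged.
    return (y < avg - sigma * std) | (y > avg + sigma * std)

# A smaller sigma narrows the band of normal behavior, so more points are
# declared anomalous; a larger sigma keeps only the extreme departures.
for sigma in (1, 2, 3):
    mask = moving_average_anomalies(df["Sunspots"], window=10, sigma=sigma)
    print(f"sigma={sigma}: {int(mask.sum())} anomalies")
```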