Explain different types of Outliers with its examples.
Types of Outliers
A very important aspect of an outlier detection technique is the nature of the desired outlier. Outlier Classification is done on the basis of their occurrence; generally, there are three kinds of outliers which are enumerated as follows:
1. Global outlier (or point anomaly):
- Observation is a global outlier if it significantly deviates from the rest of the observation in the given data set. For example, Intrusion detection in computer networks.
- In a given data set, a data object is a global outlier if it deviates significantly from the rest of the data set. Global outliers are sometimes called point anomalies and are the simplest type of outliers. Most outlier detection methods are aimed at finding global outliers.
- Global outlier detection is important in many applications. Consider intrusion detection in computer networks, for example. If the communication behavior of a computer is very different from the normal patterns (e.g., a large number of packages is broadcast in a short time), this behavior may be considered as a global outlier and the corresponding computer is a suspected victim of hacking. As another example, in trading transaction auditing systems, transactions that do not follow the regulations are considered as global outliers and should be held for further examination.
2. Contextual outlier (or conditional outlier):
- Observation is a contextual outlier if it deviates significantly based on a selected context. For example, a measure of temperature 35°F in Kathmandu may or may not be an outlier depending on the summer or winter season.
- In a given data set, a data object is a contextual outlier if it deviates significantly with respect to a specific context of the object. Contextual outliers are also known as conditional outliers because they are conditional on the selected context. Therefore, in contextual outlier detection, the context has to be specified as part of the problem definition. Generally, in contextual outlier detection, the attributes of the data objects in question are divided into two groups:
a) Contextual attributes: The contextual attributes of a data object define the object's context. In the temperature example, the contextual attributes maybe date and location.
b) Behavioral attributes: These define the object's characteristics, and are used to evaluate whether the object is an outlier in the context to which it belongs. In the temperature example, the behavioral attributes may be temperature, humidity, and pressure.
3. Collective Outliers:
A subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects may not be outliers. Collective outliers are commonly found in intrusion detection such as when a number of computers keep sending denial-of-service packages to each other.
Given a data set, a subset of data objects forms a collective outlier if the objects as a whole deviate significantly from the entire data set. Importantly, the individual data objects may not be outliers.
Example 12.4 Collective outliers. In Figure 12.2, the black objects as a whole form a collective outlier ho because the density of those objects is much higher than the rest in the data set. However, every black object individually is not an outlier with respect to the whole data set.
Figure 12.2 The black objects form a collective outlier.
OR,
An outlier is a data object that deviates significantly from the rest of the data objects and behaves in a different manner. An outlier is an object that deviates significantly from the rest of the objects. They can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining.
An outlier cannot be termed as noise or error. Instead, they are suspected of not being generated by the same method as the rest of the data objects.
- Global (or Point) Outliers
- Collective Outliers
- Contextual (or Conditional) Outliers
1. Global Outliers
They are also known as Point Outliers. These are the simplest form of outliers. If, in a given dataset, a data point strongly deviates from all the rest of the data points, it is known as a global outlier. Mostly, all of the outlier detection methods are aimed at finding global outliers.
For example, In Intrusion Detection System, if a large number of packages are broadcast in a very short span of time, then this may be considered as a global outlier and we can say that that particular system has been potentially hacked.
2. Collective Outliers
As the name suggests, if in a given dataset, some of the data points, as a whole, deviate significantly from the rest of the dataset, they may be termed as collective outliers. Here, the individual data objects may not be outliers, but when seen as a whole, they may behave as outliers. To detect these types of outliers, we might need background information about the relationship between those data objects showing the behavior of outliers.
For example: In an Intrusion Detection System, a DOS (denial-of-service) package from one computer to another may be considered as normal behavior. However, if this happens with several computers at the same time, then this may be considered as abnormal behavior and as a whole they can be termed as collective outliers.
3. Contextual Outliers
They are also known as Conditional Outliers. Here, if in a given dataset, a data object deviates significantly from the other data points based on a specific context or condition only. A data point may be an outlier due to a certain condition and may show normal behavior under another condition. Therefore, a context has to be specified as part of the problem statement in order to identify contextual outliers. Contextual outlier analysis provides flexibility for users where one can examine outliers in different contexts, which can be highly desirable in many applications. The attributes of the data point are decided on the basis of both contextual and behavioral attributes.
For example A temperature reading of 40°C may behave as an outlier in the context of a “winter season” but will behave like a normal data point in the context of a “summer season”.
Comments
Post a Comment