Anomaly detection using isolation

2017-01-31T05:06:06Z (GMT) by Liu, Fei Tony
Anomaly detection is the process of discovering unusual data patterns that differ from the majority of the data. It has been used for fraud detection in the credit card and insurance industries, as well as in other applications such as intrusion detection, industrial damage detection, and medical and public health anomaly detection. Alongside predictive modelling, link analysis and cluster analysis, anomaly detection forms one of the four pillars of data mining research and applications. Anomalies are data points that are intrinsically few in number and different from normal data. Because of these two intrinsic properties, anomalies are highly susceptible to isolation. This thesis proposes the first isolation-based anomaly detectors, which detect anomalies purely on the basis of isolation. The proposed method is fundamentally different from all existing methods, which determine anomalies using distance-based or density-based approaches. Isolation-based anomaly detectors estimate each data point's susceptibility to isolation without employing any computationally expensive distance or density measures. This fundamental difference allows a significantly lower processing time, higher detection accuracy and the ability to detect a wider range of anomalies, such as clustered anomalies. This thesis explains how isolation-based anomaly detectors separate anomalies from the majority of the data, even when the volume of data is high. In addition, it provides an extensive empirical evaluation and an investigation of high-dimensional data. Finally, we discuss possible extensions of this novel method, such as handling categorical data and data streams.
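The idea that anomalies are "susceptible to isolation" can be illustrated with a small sketch. The code below is not the thesis's algorithm. It is a simplified illustration under stated assumptions: points are isolated by repeated random axis-parallel splits, and the number of splits needed, averaged over many trials, serves as a rough measure of susceptibility to isolation. The function names (`isolation_depth`, `mean_isolation_depth`), the synthetic data and all parameter values are hypothetical and chosen only for this example. No distance or density is computed at any step.

```python
import random


def isolation_depth(point, data, rng, max_depth=64):
    """Count the random axis-parallel splits needed to leave `point` alone.

    `point` is assumed to be an element of `data`. Illustrative only.
    """
    subset = list(data)
    for depth in range(max_depth):
        if len(subset) <= 1:
            return depth
        dim = rng.randrange(len(point))          # pick a random attribute
        lo = min(row[dim] for row in subset)
        hi = max(row[dim] for row in subset)
        if lo == hi:
            continue                             # nothing to split on this attribute
        split = rng.uniform(lo, hi)              # pick a random split value
        # Keep only the side of the split that contains `point`.
        if point[dim] < split:
            subset = [row for row in subset if row[dim] < split]
        else:
            subset = [row for row in subset if row[dim] >= split]
    return max_depth


def mean_isolation_depth(point, data, trials=200, seed=0):
    """Average the isolation depth of `point` over several random trials."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(trials))
    return total / trials


# Synthetic data (an assumption for this sketch): one dense cluster plus one distant point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = normal + [outlier]

# A point near the centre of the cluster, used as a "normal" reference.
inlier = min(normal, key=lambda p: p[0] ** 2 + p[1] ** 2)

print("outlier mean depth:", mean_isolation_depth(outlier, data))
print("inlier mean depth: ", mean_isolation_depth(inlier, data))
```

In this sketch the distant point should need noticeably fewer splits on average than the point inside the cluster, which is the sense in which few and different points are easier to isolate.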