http://swrc.ontoware.org/ontology#Article
Example-Based Outlier Detection for High Dimensional Datasets
en
Research Paper
Graduate School of Systems and Information Engineering, University of Tsukuba
Graduate School of Systems and Information Engineering, University of Tsukuba; Center for Computational Sciences, University of Tsukuba
School of Computer Science, Carnegie Mellon University
Cui Zhu
Hiroyuki Kitagawa
Christos Faloutsos
Detecting outliers is an important problem in applications such as fraud detection, financial analysis, and health monitoring. Such applications typically involve high-dimensional datasets. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based or density-based). Most of these concepts are proximity-based: they define an outlier by its relationship to the rest of the data. However, in high-dimensional space the data becomes sparse, which implies that every object can be regarded as an outlier from the point of view of similarity. Furthermore, a fundamental issue is that the notion of which objects are outliers typically varies between users, problem domains, or even datasets. In this paper, we present a novel solution to this problem: detecting outliers in high-dimensional datasets based on user-provided examples. By studying how a few such outlier examples behave under projections of the dataset, the proposed method discovers the hidden view of outliers and picks out further objects that are outstanding in the projection where the examples stand out most. Our experiments on both real and synthetic datasets demonstrate the ability of the proposed method to detect outliers that match users' intentions.
AA11464847
IPSJ Transactions on Databases (TOD)
46
SIG8(TOD26)
120-129
2005-06-15
1882-7799