Experiment of Document Clustering by Triple-pass Leader-follower Algorithm without Any Information on Threshold of Similarity
https://ipsj.ixsq.nii.ac.jp/records/94318
Name / File | License | Action |
---|---|---|
Copyright (c) 2013 by the Information Processing Society of Japan |
Open access |
Item type | SIG Technical Reports(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2013-07-15 | |||||||
Title | ||||||||
Title | Experiment of Document Clustering by Triple-pass Leader-follower Algorithm without Any Information on Threshold of Similarity | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Keywords | ||||||||
Subject scheme | Other | |||||||
Subject | Document classification and behavior pattern extraction | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_18gh | |||||||
Resource type | technical report | |||||||
Author affiliation | ||||||||
School of Library and Information Science, Keio University | ||||||||
Author name | Kazuaki, Kishida | |||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | The number of clusters has to be defined a priori in most clustering algorithms, but it is usually unknown in the situations to which document clustering is applied. It would therefore be convenient if a clustering algorithm could be executed without any information on the number of clusters. This article attempts to develop such an algorithm by extending the leader-follower clustering algorithm, which is appropriate for clustering large-scale datasets. Specifically, the threshold value required by the leader-follower clustering algorithm is estimated automatically from some pairs of documents by scanning the document file once before executing the standard leader-follower algorithm. In particular, a triple-pass algorithm is proposed in which cluster vectors are generated in the second scan and each document is allocated to the most similar cluster in the third scan. The experimental result suggests that the triple-pass leader-follower clustering algorithm is sufficiently effective and comparable with the hierarchical Dirichlet process (HDP) mixture model and with the spherical k-means algorithm in which the number of clusters is estimated automatically from the cover coefficient. The algorithm requires fewer computational iterations than the other two methods, and is thus cost-effective. | |||||||
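Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AN10112482 | |||||||
Bibliographic information | 研究報告データベースシステム(DBS) (SIG Technical Reports: Database Systems), Vol. 2013-DBS-157, No. 23, pp. 1-6, issued 2013-07-15 | |||||||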
Notice | ||||||||
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. | ||||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | Information Processing Society of Japan (情報処理学会) |
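
The three scans described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The abstract only says that the threshold is estimated "from some pairs of documents", so the rule used here (the mean cosine similarity of adjacent documents in file order) is an assumption made for illustration, as are the raw term-count weighting and all function names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_threshold(docs):
    """Pass 1: estimate the similarity threshold from document pairs.

    ASSUMPTION: the mean similarity of adjacent pairs stands in for the
    report's estimator, which the abstract does not specify.
    """
    sims = [cosine(docs[i], docs[i + 1]) for i in range(len(docs) - 1)]
    return sum(sims) / len(sims) if sims else 0.0


def build_cluster_vectors(docs, threshold):
    """Pass 2: leader-follower scan that only builds cluster vectors."""
    clusters = []
    for doc in docs:
        sims = [cosine(doc, c) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            # Follower: merge the document into the most similar cluster.
            for term, weight in doc.items():
                clusters[best][term] = clusters[best].get(term, 0.0) + weight
        else:
            # Leader: the document starts a new cluster.
            clusters.append(dict(doc))
    return clusters


def assign(docs, clusters):
    """Pass 3: allocate each document to its most similar cluster."""
    return [
        max(range(len(clusters)), key=lambda k: cosine(doc, clusters[k]))
        for doc in docs
    ]


def triple_pass(texts):
    """Run the three scans on whitespace-tokenized texts."""
    docs = [dict(Counter(t.lower().split())) for t in texts]
    threshold = estimate_threshold(docs)
    clusters = build_cluster_vectors(docs, threshold)
    return threshold, clusters, assign(docs, clusters)


if __name__ == "__main__":
    sample = [
        "apple banana apple",
        "banana apple fruit",
        "car engine wheel",
        "engine car road",
    ]
    threshold, clusters, labels = triple_pass(sample)
    print(len(clusters), labels)
```

Neither the number of clusters nor the threshold is supplied by the caller: both follow from the data, which is the property the abstract emphasizes. Each scan touches every document once, so the sketch makes exactly three passes over the collection.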