Automatic Classification of Multi-Topic Documents Based on the Maximum Margin Principle (最大マージン原理にもとづく多重トピック文書の自動分類)

Hideto Kazawa; Tomonori Izumitani; Hirotoshi Taira; Eisaku Maeda (賀沢 秀人; 泉谷 知範; 平 博順; 前田 英作)

https://ipsj.ixsq.nii.ac.jp/records/48117

Name / File	License
IPSJ-NL04163008.pdf (241.3 kB)	Copyright (c) 2004 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2004-09-16

Title

Automatic Classification of Multi-Topic Documents Based on the Maximum Margin Principle (最大マージン原理にもとづく多重トピック文書の自動分類)

Title (English)

Maximum Margin Labeling for Multi-Topic Text Categorization

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation (all four authors)

NTT Communication Science Laboratories (NTTコミュニケーション科学基礎研究所)

Authors

賀沢 秀人; 泉谷 知範; 平 博順; 前田 英作

Authors (English)

Hideto Kazawa; Tomonori Izumitani; Hirotoshi Taira; Eisaku Maeda

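Abstract

Description type

Other

Description

This paper proposes a new learning method, called maximal margin labeling, for the automatic classification of multi-topic documents, a task in which all topics relevant to a document are selected from a given topic set. In multi-topic document labeling, treating every combination of topics (a label) as an independent class and training a multi-class classifier is expected to yield more accurate labeling. In real multi-labeling problems such as document classification, however, the number of samples per label shrinks and over-fitting becomes a problem, so this approach has not been tried in practice. The proposed method embeds each label in a high-dimensional space and then maximizes the margin in that space, which suppresses over-fitting and achieves accurate multi-labeling. In comparative experiments on multi-labeling of Web documents against several kinds of conventional methods (Parametric Mixture Model, BoosTexter, SVM, and the nearest-neighbor method), we demonstrate that the proposed method achieves more accurate labeling with less training data.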

Abstract (English)

Description type

Other

Description

In this paper, we address the problem of learning in multi-category document labeling. The goal of multi-category document labeling is to assign to a document all the relevant categories from a given category set. The proposed learning method, Maximal Margin Labeling (MML), treats multi-category labels, as well as single-category labels, as independent classes and learns a kind of multi-class classifier on the resulting multi-class problem. Since the number of multi-category labels is quite large in general, data sparseness becomes a serious challenge to MML. Thus we utilize a maximal margin principle in a high-dimensional space, into which all possible labels are embedded, to avoid over-fitting. MML is compared with other labeling methods, namely Parametric Mixture Model, BoosTexter, Support Vector Machines, and k nearest neighbors, using a collection of multi-category labeled Web pages. The results show that MML outperforms the other methods and that its high performance is apparent even with a small number of training samples.
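
The data-sparseness point in the abstract can be made concrete with a small sketch. The code below is not the MML algorithm; it only illustrates the first step the abstract describes, treating each combination of categories as one independent class, and counts how many training samples each such class receives. The function names, the toy documents, and the choice to count only non-empty combinations are all assumptions made for illustration.

```python
from collections import Counter


def to_label(topics):
    """Turn a set of topics into one hashable multi-class label."""
    return frozenset(topics)


def count_possible_labels(num_topics):
    """Number of non-empty topic combinations for num_topics topics.

    Assumption: a document always has at least one topic, so the
    empty combination is excluded.
    """
    return 2 ** num_topics - 1


def samples_per_label(topic_sets):
    """Count training samples for each combined label."""
    return Counter(to_label(t) for t in topic_sets)


# Hypothetical toy training set: each entry is the topic set of one document.
docs = [
    {"sports"},
    {"sports", "business"},
    {"business"},
    {"sports", "business"},
    {"science", "business"},
]

counts = samples_per_label(docs)
for label, n in counts.items():
    print(sorted(label), n)
print("possible labels for 3 topics:", count_possible_labels(3))
```

With only three topics there are already seven possible labels, and the count doubles with every added topic, so most labels see few or no training samples. This is the sparseness that the abstract says the maximal margin principle is used to counter.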

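Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10115061

Bibliographic information

IPSJ SIG Technical Reports: Natural Language Processing (NL) (情報処理学会研究報告自然言語処理（NL）)

Vol. 2004, No. 93 (2004-NL-163), pp. 53-60, issued 2004-09-16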

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-22 08:39:35.208322
