Robust Semi-supervised Learning for Labeled Data Selection Bias

藤野, 昭典; 上田, 修功; 永田, 昌明 (Akinori Fujino; Naonori Ueda; Masaaki Nagata)

https://ipsj.ixsq.nii.ac.jp/records/73755

Name / File	License
IPSJ-TOM0402005.pdf (518.6 kB)	Copyright (c) 2011 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2011-03-28

Title

ラベルありデータの選択バイアスに頑健な半教師あり学習

Title (English)

Robust Semi-supervised Learning for Labeled Data Selection Bias

Language

jpn

Keywords

Subject scheme

Other

Subject

Original paper

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation (all three authors)

NTT Communication Science Laboratories, NTT Corporation (日本電信電話株式会社NTTコミュニケーション科学基礎研究所)

Author names

藤野, 昭典; 上田, 修功; 永田, 昌明

Author names (English)

Akinori Fujino; Naonori Ueda; Masaaki Nagata

Abstract

Description type

Other

Description

This paper presents a robust semi-supervised learning method for designing classifiers with high generalization ability from a labeled dataset whose distribution differs substantially from that of the target test dataset. JESS-CM, a semi-supervised learning framework, has achieved the best published results on several natural language processing tasks, but it risks overfitting the labeled data in the task setting considered here. The proposed method is expected to solve this overfitting problem by using an unlabeled dataset, whose distribution is similar to that of the target test dataset, together with the labeled dataset to train both the discriminative and the generative models that compose the classifier. We formulate the training algorithm and the conditional probability model by defining a single objective function. Text classification experiments on three typical test collections confirmed that the proposed method achieved better classification performance than JESS-CM in most cases of this task setting. We also confirmed experimentally the effect of combining the proposed method with an approach based on unlabeled data selection.

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464803

Bibliographic information

情報処理学会論文誌数理モデル化と応用（TOM） (IPSJ Transactions on Mathematical Modeling and Its Applications)

Vol. 4, No. 2, pp. 31-42, issued 2011-03-28

ISSN

Source identifier type

ISSN

Source identifier

1882-7780

Publisher

Information Processing Society of Japan (情報処理学会)
