A Method for Constructing Domain-Specific Word Dictionaries for Text Mining

Takashi Suenaga (末永 高志); Tsutomu Matsunaga (松永 務); Jun Sekine (関根 純); Masaaki Muramatsu (村松 正明)

https://ipsj.ixsq.nii.ac.jp/records/67009

Name / File	License	Action
IPSJ-BIO09019023.pdf (197.0 kB)	Copyright (c) 2009 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2009-12-10

Title

A Method for Constructing Domain-Specific Word Dictionaries for Text Mining (テキストマイニングのためのドメイン別単語辞書の構築方法)

Title (English)

A Term Selection Method for Domain-oriented Thesaurus in Text Mining

Language

jpn

Resource type

technical report

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Author affiliations

R&D Headquarters, NTT DATA CORPORATION
R&D Headquarters, NTT DATA CORPORATION
R&D Headquarters, NTT DATA CORPORATION
Medical Research Institute, Tokyo Medical and Dental University / Research Institute, HuBit Genomix Inc.

Authors

Takashi Suenaga (末永 高志)
Tsutomu Matsunaga (松永 務)
Jun Sekine (関根 純)
Masaaki Muramatsu (村松 正明)

Abstract

Description type

Other

Description

As the volume of document data accumulated within companies grows, so does the need to consolidate knowledge based on an understanding of document content. Text mining for knowledge consolidation, such as tallying customer opinions or collecting relevant documents, generally operates on the terms that appear in documents. When building the term dictionary used for this tallying and collection, it is therefore important to select terms that are representative of the opinions or the field (domain). This paper proposes a term ranking method for selecting domain-representative terms that takes two factors into account: the number of terms with which a term co-occurs, and the relationship between co-occurring term pairs and the domain. Specifically, a term that is elaborated on by many other terms is regarded as representative, and the ranking criterion sums, over a term's co-occurring terms, an evaluation of each term combination based on the statistical interaction effect attributable to the domain. In an evaluation experiment on real medical documents, designed to simulate building a new term dictionary, the top 10% of the proposed ranking contained 57% more representative terms than random selection. We also examine the requirements for building a term dictionary and show that the proposed method satisfies them comprehensively.
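The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the interaction statistic, so everything below is an assumption made for illustration rather than the authors' formulation: the pairwise score is taken to be the three-way interaction of a 2x2x2 log-linear model (the log odds ratio of a term pair inside the target domain minus the same log odds ratio outside it), counts are smoothed by 0.5, only positive interactions are summed, and the names `rank_terms`, `log_odds_ratio`, and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def log_odds_ratio(n11, n10, n01, n00, smooth=0.5):
    """Smoothed log odds ratio of a 2x2 co-occurrence table."""
    return log(((n11 + smooth) * (n00 + smooth)) /
               ((n10 + smooth) * (n01 + smooth)))


def rank_terms(docs, target_domain, smooth=0.5):
    """Rank terms of `target_domain` by summed pairwise interaction.

    docs: list of (domain_label, iterable_of_terms).
    Returns a list of (term, score), highest score first.
    """
    # Document counts, keyed by "is this document in the target domain?"
    n_docs = {True: 0, False: 0}
    term_df = {True: defaultdict(int), False: defaultdict(int)}
    pair_df = {True: defaultdict(int), False: defaultdict(int)}
    for domain, terms in docs:
        in_dom = domain == target_domain
        terms = sorted(set(terms))  # canonical order so pairs match
        n_docs[in_dom] += 1
        for t in terms:
            term_df[in_dom][t] += 1
        for pair in combinations(terms, 2):
            pair_df[in_dom][pair] += 1

    def stratum_lor(pair, in_dom):
        a, b = pair
        n11 = pair_df[in_dom].get(pair, 0)
        n10 = term_df[in_dom].get(a, 0) - n11
        n01 = term_df[in_dom].get(b, 0) - n11
        n00 = n_docs[in_dom] - n11 - n10 - n01
        return log_odds_ratio(n11, n10, n01, n00, smooth)

    scores = defaultdict(float)
    for pair in pair_df[True]:  # pairs that co-occur in the target domain
        interaction = stratum_lor(pair, True) - stratum_lor(pair, False)
        if interaction > 0:  # assumption: ignore negative interactions
            for t in pair:
                scores[t] += interaction
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: "gene" is elaborated on by two other terms.
docs = [
    ("bio", ["gene", "mutation"]),
    ("bio", ["gene", "protein"]),
    ("bio", ["sample"]),
    ("bio", ["sample"]),
    ("other", ["gene"]),
    ("other", ["mutation"]),
    ("other", ["protein"]),
    ("other", ["sample"]),
]
ranked = rank_terms(docs, "bio")
print([term for term, _ in ranked])  # ['gene', 'mutation', 'protein']
```

Because the score is a sum over co-occurring partners, a term that pairs with many domain-specific terms accumulates a higher score than its partners, which mirrors the "described by many other terms" criterion. A term that never co-occurs with anything in the target domain (here `sample`) receives no score at all.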

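Bibliographic record ID

Source identifier type

NCID

Source identifier

AA12055912

Bibliographic information

SIG Technical Reports: Bioinformatics (BIO) (研究報告バイオ情報学（BIO）)

Vol. 2009-BIO-19, No. 23, pp. 1-6, issue date 2009-12-10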

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
