Automatic disabbreviation by using context information

寺田, 昭; 徳永, 健伸; Terada, Akira; Tokunaga, Takenobu

https://ipsj.ixsq.nii.ac.jp/records/48493

Name / File	License	Action
IPSJ-NL01144006.pdf (1.2 MB)	Copyright (c) 2001 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2001-07-16

Title

文脈情報を使用した略語の自動復元

Title (English)

Automatic disabbreviation by using context information

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

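Author affiliation

Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Author affiliation

Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Author affiliation (English)

Department of Computer Science, Tokyo Institute of Technology

Author affiliation (English)

Department of Computer Science, Tokyo Institute of Technology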

Author name

寺田, 昭

Author name (English)

Terada, Akira

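Abstract

Description type

Other

Description

In text processing, handling unknown words such as proper nouns, abbreviations, and acronyms is a difficult problem. Unknown words adversely affect application systems such as information retrieval and text data mining, as well as human comprehension. Abbreviations are used especially heavily in texts about a specific domain. This paper describes the automatic expansion of abbreviations. Previous work used a dictionary to select candidate expansions for an abbreviation; in this paper, we instead use as the knowledge source text from the same domain that contains as few abbreviations as possible. To find the correct expansion among the candidates, we use the similarity between the context of the target abbreviation and the context of each candidate expansion in the knowledge source. Here, context means the words that appear before and after a word. Using the vector space model, we compute the similarity between the abbreviation and each candidate from their contexts and select the correct expansion from among the candidates. In experiments on 10,000 aviation-related documents, precision improved by about 10% over the conventional method.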

Abstract (English)

Description type

Other

Description

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. In particular, abbreviations are often used in specific domains. In this paper, we propose an automatic disabbreviation method using context information. In past research, a dictionary has conventionally been used to find expansion candidates for an abbreviation. We instead use abbreviation-poor text from the same domain. We calculate the plausibility of each expansion candidate based on the similarity between the context of the target abbreviation and that of the candidate. The similarity is calculated using the vector space model, in which the vector elements correspond to surrounding words. Experiments using about 10,000 documents in the aviation domain showed that the proposed method outperforms past methods by about 10% in precision.
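The abstract describes ranking expansion candidates by the similarity between context vectors built from surrounding words. The following is a minimal sketch of that general idea, not the authors' implementation. The window size, the use of raw co-occurrence counts, cosine similarity as the measure, the function names, and the toy token lists are all assumptions made for illustration; the record does not specify them. Candidate generation is also assumed to have been done already.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_expansions(abbrev, abbrev_tokens, candidates, source_tokens, window=2):
    """Return (score, candidate) pairs sorted from most to least similar context."""
    target_vec = context_vector(abbrev_tokens, abbrev, window)
    scored = [
        (cosine(target_vec, context_vector(source_tokens, cand, window)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)


# Toy data, invented for illustration only (not taken from the paper).
abbrev_tokens = "check alt before descent".split()
source_tokens = (
    "check altitude before descent use alternate airport if weather is poor"
).split()

ranked = rank_expansions("alt", abbrev_tokens, ["altitude", "alternate"], source_tokens)
for score, cand in ranked:
    print(f"{cand}: {score:.3f}")
```

In this toy input, "altitude" shares all of its neighbouring words with "alt", so it ranks above "alternate". A real system would likely weight terms and use far larger corpora than this sketch does.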

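Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10115061

Bibliographic information

情報処理学会研究報告自然言語処理（NL） (IPSJ SIG Technical Reports, Natural Language Processing (NL))

Vol. 2001, No. 69 (2001-NL-144), pp. 39-45, issued 2001-07-16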

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
