Unknown Word Extraction from Corpora Using n-gram Statistics (nグラム統計によるコーパスからの未知語抽出)

Shinsuke Mori (森 信介); Makoto Nagao (長尾 眞)

https://ipsj.ixsq.nii.ac.jp/records/13000

Name / File	License	Action
IPSJ-JNL3907007.pdf (937.7 kB)	Copyright (c) 1998 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

1998-07-15

Title

nグラム統計によるコーパスからの未知語抽出

Title (English)

Unknown Word Extraction from Corpora Using n-gram Statistics

Language

jpn

Keyword

Subject scheme

Other

Subject

Paper

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Other title

Alternative title

Natural Language Processing

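Author affiliation

IBM Japan, Ltd., Tokyo Research Laboratory

Author affiliation

Kyoto University

Author affiliation (English)

Tokyo Research Laboratory, IBM Research

Author affiliation (English)

Kyoto University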

Author name

森, 信介

Author name (English)

Mori, Shinsuke

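Abstract

Description type

Other

Description

In natural language processing, dictionaries are indispensable as a source of information about the grammatical functions and meanings of words, and efforts are under way to expand their vocabulary in order to reduce the number of unregistered words. Because new words and technical terms keep increasing, building a dictionary requires a great deal of labor, and encounters with unknown words at each stage of analysis remain unavoidable, which is a major problem. To address this problem, this paper proposes a method that uses n-gram statistics to extract words from a corpus and, at the same time, estimate the part of speech to which each word belongs. The method is based on the assumption that the distributions of the strings that precede and follow words of the same part of speech are similar. Experiments confirmed that the method is effective for estimating the parts of speech of unknown words and for building dictionaries.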

Abstract (English)

Description type

Other

Description

Dictionaries are indispensable for NLP as a source of information on the grammatical functions and meanings of words, and much effort is being made to expand their vocabulary. Given the continuous increase in new words and technical terms, building a dictionary takes vast effort, and unknown words are inevitable at every step of analysis; this is a major problem. To solve this problem, we propose a method that uses n-gram statistics to extract words from a corpus and simultaneously estimate the parts of speech (POSs) to which they belong, based on the supposition that the distributions of strings preceding or following words belonging to the same POS are similar. Experiments have shown that this method is effective for inferring the POS of unknown words and for building a dictionary.
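The supposition stated in the abstract (words of the same POS have similar distributions of preceding and following strings) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure or the n-gram lengths used, so the sketch assumes single-character contexts and cosine similarity, and it covers only the POS-assignment step for a given candidate string, not the extraction of candidates. The corpus, the lexicon, and the function names `context_counts`, `pos_profile`, and `guess_pos` are all invented for illustration.

```python
from collections import Counter
from math import sqrt


def context_counts(corpus: str, word: str) -> Counter:
    """Count the characters immediately before and after each occurrence of word."""
    counts = Counter()
    start = corpus.find(word)
    while start != -1:
        end = start + len(word)
        left = corpus[start - 1] if start > 0 else "^"      # "^" marks corpus start
        right = corpus[end] if end < len(corpus) else "$"   # "$" marks corpus end
        counts[("L", left)] += 1
        counts[("R", right)] += 1
        start = corpus.find(word, start + 1)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pos_profile(corpus: str, known_words: list[str]) -> Counter:
    """Pool the context counts of all known words that share one POS."""
    total = Counter()
    for w in known_words:
        total.update(context_counts(corpus, w))
    return total


def guess_pos(corpus: str, candidate: str, lexicon: dict[str, list[str]]):
    """Return the POS whose pooled context profile is most similar to the candidate's."""
    cand = context_counts(corpus, candidate)
    scores = {pos: cosine(cand, pos_profile(corpus, words))
              for pos, words in lexicon.items()}
    return max(scores, key=scores.get), scores


# Invented toy data: an unsegmented corpus and a tiny dictionary of known words.
CORPUS = "猫が魚を食べる。犬が肉を食べる。鳥が虫を食べる。猫が水を飲む。"
LEXICON = {"noun": ["猫", "犬", "魚", "肉"], "verb": ["食べる", "飲む"]}

if __name__ == "__main__":
    pos, scores = guess_pos(CORPUS, "鳥", LEXICON)
    print(pos, scores)
```

In this toy corpus the unknown string 鳥 appears after a sentence boundary and before が, as the known nouns 猫 and 犬 do, so its context profile overlaps with the noun profile and not with the verb profile. A realistic setting would need longer contexts, smoothing, and a way to decide which substrings are word candidates in the first place.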

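Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 39, No. 7, pp. 2093-2100, issued 1998-07-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764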
