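Title: A method of representing an unknown concept of a word in the "Gainen-Base" from a corpus
Original title: コーパスを用いた概念ベース拡張方式 (A Concept Base Extension Method Using a Corpus)
Language: jpn
Identifier: http://id.nii.ac.jp/1001/00048659/
Type: Technical Report
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=48659&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2000 by the Information Processing Society of Japan
Affiliations (in record order): NTTコミュニケーション科学基礎研究所; NTTコミュニケーション科学基礎研究所; NTTシステムインテグレーション基盤研究所
Authors (in record order): 稲子, 希望; 笠原, 要; 松澤, 和光

Abstract (translated from the Japanese):
With the aim of judging similarity between words, we have been studying the "Gainen-Base" (concept base), a knowledge base of the concepts that represent word meanings. We have already built a concept base by automatically extracting the concepts of 40,000 everyday words from Japanese dictionaries, and have applied it to similarity judgment, information retrieval, and other tasks. To widen its range of application, the number of concepts it can handle must be extended. In this study we propose a method for estimating the concepts of words that are not included in the concept base (uncovered words). Specifically, we use word co-occurrence information from a text corpus that contains the uncovered word to select words similar to it from the vocabulary shared by the concept base and the corpus, and then estimate its concept from the concepts of those similar words in the concept base.

Abstract (English, as supplied with the record):
We have studied how to build the "Gainen-Base," a knowledge base of word concepts, automatically from dictionaries. It can simulate judgments of semantic similarity between everyday words, and it can be applied to word-related information processing such as text retrieval. To support a wider variety of applications, it is important to represent the concepts of words even when the dictionaries contain no definitions for them. In this paper, we propose a method of estimating the concept of a word that is not included in the Gainen-Base. Our method consists of two steps. First, words similar to the target word are retrieved using knowledge bases of word co-occurrence, which can be extracted from text corpora; one of these knowledge bases is newly proposed. Next, the unknown concept is estimated from the concepts of the retrieved words as represented in the Gainen-Base.

NCID: AN10115061
Journal: 情報処理学会研究報告自然言語処理(NL) (IPSJ SIG Technical Reports, Natural Language Processing (NL))
Volume: 2000
Issue: 29 (1999-NL-136)
Pages: 41-48
Date: 2000-03-21
Additional date in record: 2009-06-30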
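
The two-step procedure described in the abstracts can be illustrated with a minimal sketch. The record does not give the actual similarity measure, co-occurrence knowledge bases, or concept representation, so everything concrete here is an assumption made for illustration: concepts are modeled as attribute-to-weight dictionaries, co-occurrence is counted in a fixed token window, similarity is cosine similarity, and the estimate is a similarity-weighted average. The names `cooccurrence_vectors`, `cosine`, `estimate_concept`, `top_k`, and `window` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_concept(unknown, sentences, concept_base, top_k=2, window=2):
    """Estimate a concept for `unknown` from similar words in the concept base."""
    vectors = cooccurrence_vectors(sentences, window)
    if unknown not in vectors:
        return {}
    # Step 1: rank words shared by the corpus and the concept base
    # by co-occurrence similarity to the unknown word.
    candidates = [
        (cosine(vectors[unknown], vectors[w]), w)
        for w in concept_base
        if w in vectors and w != unknown
    ]
    ranked = sorted(candidates, reverse=True)[:top_k]
    neighbours = [(s, w) for s, w in ranked if s > 0]
    # Step 2: combine the neighbours' concepts, weighted by similarity
    # (the weighted average is an assumption, not the paper's formula).
    total = sum(s for s, _ in neighbours)
    estimate = {}
    for s, w in neighbours:
        for attribute, weight in concept_base[w].items():
            estimate[attribute] = estimate.get(attribute, 0.0) + (s / total) * weight
    return estimate


# Toy data, invented purely for illustration.
sentences = [
    ["the", "puppy", "barked", "loudly"],
    ["the", "dog", "barked", "loudly"],
    ["the", "car", "drove", "fast"],
]
concept_base = {
    "dog": {"animal": 1.0, "pet": 0.5},
    "car": {"vehicle": 1.0},
}
puppy_concept = estimate_concept("puppy", sentences, concept_base, top_k=1)
```

In this toy corpus "puppy" shares all of its contexts with "dog", so with `top_k=1` the estimate is built only from the attributes stored for "dog". A real system would need a much larger corpus and the co-occurrence knowledge bases the paper actually proposes.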