Extraction of Statistical Latent Semantic Space (統計的潜在的意味空間の抽出)

川前, 徳章; 青木, 輝勝; 安田, 浩; Noriaki, Kawamae; Terumasa, Aoki; Hiroshi, Yasuda

https://ipsj.ixsq.nii.ac.jp/records/48430

Name / File	License	Action
IPSJ-NL01148004.pdf (1.1 MB)	Copyright (c) 2002 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2002-03-04

Title

統計的潜在的意味空間の抽出

Title (English)

Extraction of Statistical Latent Semantic Space

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

東京大学先端科学技術研究センター

Author affiliation (English)

Research Center for Advanced Science and Technology, The University of Tokyo

Author names

川前, 徳章; 青木, 輝勝; 安田, 浩

Author names (English)

Noriaki, Kawamae; Terumasa, Aoki; Hiroshi, Yasuda

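Abstract (Japanese, translated)

Description type

Other

Description

This study proposes Statistical Latent Semantic Indexing (SLSI), a new mathematical approach. The proposed method places not only documents but also the terms that appear in them into a latent semantic space at the same time, and performs indexing in that space. The motivation is that latent semantics index documents better than the terms that appear in them. Unlike LSI, which is based on singular value decomposition, and its extension PLSI, SLSI has a more meaningful and solid statistical model grounded in factor analysis and information theory. SLSI therefore resolves several problems that remained open in LSI and PLSI. In experiments on a test collection, SLSI achieved better accuracy than LSI and PLSI. In addition, we propose an entropy-based term weighting scheme. With it, important terms can be identified in advance, so that only the minimum necessary terms are selected from all terms that appear in the documents. This reduces the computational cost.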
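For context, the LSI baseline that the abstract compares against is usually written as a truncated singular value decomposition of the term-by-document matrix. The notation below (an m-by-n matrix X with m terms and n documents, rank k) is assumed here for illustration and is not taken from the paper. It describes standard LSI only, not the SLSI model.

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top},
\qquad
\hat{t}_i = \left( U_k \Sigma_k \right)_{i,:},
\qquad
\hat{d}_j = \left( V_k \Sigma_k \right)_{j,:}
```

Here the rows of $U_k \Sigma_k$ give k-dimensional coordinates for the terms and the rows of $V_k \Sigma_k$ give coordinates for the documents, which is how terms and documents end up in one shared latent space.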

Abstract (English)

Description type

Other

Description

The main goal of this paper is to propose Statistical Latent Semantic Indexing (SLSI), a novel statistical approach that simultaneously maps documents and terms into a latent semantic space. The motivation is that the latent semantics of documents are better suited to categorizing them than the index terms they contain. In contrast to Latent Semantic Indexing (LSI), which is based on Singular Value Decomposition (SVD), and Probabilistic Latent Semantic Indexing (PLSI), SLSI has a more meaningful and solid statistical model based on factor analysis and information theory. This model can therefore solve the critical problems that remain in LSI and PLSI. Experimental results on a test collection show that SLSI performs much better than LSI and PLSI in retrieval. Moreover, we propose a new term weighting method based on entropy. With this method we can judge which terms are important and extract only the minimum essential terms. As a result, the method reduces computational cost.
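The abstract does not give the formula for its entropy-based term weighting, so the following is only a minimal sketch of the general idea, assuming the log-entropy global weight that is common in the LSI literature. The function names `entropy_weights` and `select_terms`, the toy documents, and the top-k selection rule are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter


def entropy_weights(docs):
    """Return a global log-entropy weight in [0, 1] for every term.

    docs is a list of token lists. A weight near 1 means the term is
    concentrated in few documents; a weight near 0 means it is spread
    evenly over all documents and carries little information.
    (Assumed scheme: standard log-entropy weighting, not the paper's.)
    """
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("need at least two documents")
    counts = [Counter(doc) for doc in docs]  # per-document term frequencies
    total = Counter()                        # collection-wide frequencies
    for c in counts:
        total.update(c)
    weights = {}
    for term, gf in total.items():
        # Sum p * log(p) over the documents containing the term, p = tf / gf.
        s = sum(
            (c[term] / gf) * math.log(c[term] / gf)
            for c in counts
            if c[term] > 0
        )
        weights[term] = 1.0 + s / math.log(n_docs)
    return weights


def select_terms(weights, k):
    """Keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy collection (hypothetical data for illustration only).
docs = [
    ["latent", "semantic", "indexing", "the"],
    ["entropy", "term", "weighting", "the"],
    ["factor", "analysis", "model", "the"],
]
weights = entropy_weights(docs)
vocabulary = select_terms(weights, 3)
```

In this toy collection a term that occurs in every document, such as "the", gets a weight of about zero and is dropped first. Shrinking the vocabulary this way reduces the size of the term-by-document matrix, which is the kind of cost reduction the abstract describes.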

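Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10115061

Bibliographic information

情報処理学会研究報告自然言語処理（NL） (IPSJ SIG Technical Reports, Natural Language Processing (NL))

Vol. 2002, No. 20 (2001-NL-148), pp. 25-30, issued 2002-03-04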

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)

Versions

Ver.1

2025-01-22 08:30:16.808634
