Improving Web Search Performance with Hyperlink Information

Tomonari Masada; Atsuhiro Takasu; Jun Adachi


https://ipsj.ixsq.nii.ac.jp/records/17511

File: IPSJ-TOD4608007 (573.3 kB)
License: Copyright (c) 2005 by the Information Processing Society of Japan
Access: open access

Item type: Trans(1)

Publication date: 2005-06-15

Title: リンク情報の利用によるWeb 検索性能の改善

Title (English): Improving Web Search Performance with Hyperlink Information

Language: jpn

Keywords (subject scheme: Other): 研究論文（論文賞受賞） (research paper, Paper Award recipient)

Resource type: journal article (identifier: http://purl.org/coar/resource_type/c_6501)

Author affiliation (all three authors): National Institute of Informatics (国立情報学研究所)

Authors: 正田備也 (Tomonari Masada); 高須淳宏 (Atsuhiro Takasu); 安達淳 (Jun Adachi)

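Abstract (Japanese original, translated; description type: Other)

This study concerns an effective method for improving Web search performance by using link information. First, we propose a new clustering algorithm. The algorithm uses only the hyperlinks that connect Web pages belonging to the same site, and it clusters pages so that Web pages with large out-degree are distributed across different clusters. The aim is to keep the textual content within each cluster reasonably homogeneous, because the more high-out-degree Web pages a path passes through, the more the textual content of the pages along it is thought to diverge. We test this hypothesis by checking whether the proposed clustering algorithm contributes to better Web search performance. To this end, we use the clusters obtained by the proposed algorithm to modify the entries of document vectors computed from the text of each Web page. The document vectors are computed with TF-IDF, a representative term-weighting scheme, and the entries are modified according to the RS model proposed by Kanazawa et al. To evaluate retrieval performance objectively, our experiments use the document data and queries prepared for the NTCIR-3 Web retrieval task. The results show that, under the one-click-distance document model, average precision, a key measure of retrieval performance, rose by more than 10% compared with the case where clustering results are not used.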
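The abstract states that document vectors are TF-IDF weighted and then modified using cluster membership, but it gives neither the exact TF-IDF variant nor the RS model's update rule. The Python sketch below shows one common TF-IDF form (raw term frequency times log(N/df)) and, as a clearly hypothetical stand-in for the RS model, mixes each vector with the centroid of its cluster using a weight `alpha`. The function names, `alpha`, and the centroid mixing are assumptions for illustration only, not the authors' method.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """TF-IDF with raw term frequency and idf = log(N / df); docs are token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(tokens).items()}
        for tokens in docs
    ]

def mix_with_cluster_centroid(vectors, cluster_of, alpha=0.5):
    """HYPOTHETICAL stand-in for the RS model (not the paper's rule):
    v' = (1 - alpha) * v + alpha * centroid(cluster of v).

    cluster_of[i] is the cluster label of document i, e.g. from a link-based clustering.
    """
    members = defaultdict(list)
    for i, label in enumerate(cluster_of):
        members[label].append(i)
    # Average the vectors of each cluster's members.
    centroids = {}
    for label, idx in members.items():
        total = defaultdict(float)
        for i in idx:
            for term, w in vectors[i].items():
                total[term] += w
        centroids[label] = {term: w / len(idx) for term, w in total.items()}
    # Blend each vector with its own cluster's centroid.
    mixed = []
    for vec, label in zip(vectors, cluster_of):
        cen = centroids[label]
        mixed.append({
            term: (1 - alpha) * vec.get(term, 0.0) + alpha * cen.get(term, 0.0)
            for term in set(vec) | set(cen)
        })
    return mixed

# Toy pages (hypothetical): the first two share a cluster, the third is alone.
docs = [["web", "search", "link"], ["web", "cluster"], ["music", "audio"]]
vecs = tfidf_vectors(docs)
mixed = mix_with_cluster_centroid(vecs, cluster_of=[0, 0, 1])
```

In this toy run the first page gains a nonzero weight for `cluster`, a term that occurs only on its cluster neighbour, while the singleton third page is unchanged. How the actual RS model decides which entries to change, and by how much, is not described in this record.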

Abstract (English; description type: Other)

This paper concerns an effective method for improving Web search performance with hyperlink information. We provide a new Web page clustering algorithm. Our algorithm uses only intra-site hyperlinks and constructs clusters so that Web pages of large out-degree belong to different clusters. We expect our algorithm to yield clusters in which the Web pages are similar to each other in their textual content. The algorithm is based on the hypothesis that the textual content of Web pages tends to drift further after passing through more Web pages of large out-degree. In this paper, we test this hypothesis by checking whether our clustering algorithm can improve the performance of Web search. We use the clustering results given by our algorithm to modify the entries of document vectors. Document vectors are computed with a well-known term weighting scheme, TF-IDF. The vector entry modification is based on the RS (relevance superimposition) model proposed by Kanazawa et al. We conducted evaluation experiments using the document sets and query sets prepared for the NTCIR-3 Web retrieval task, which allowed an objective evaluation. The results show that, under the one-click-distance document model, average precision, an important measure of Web search performance, improves by more than 10% compared with the case where no clustering results are used.
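Average precision, the measure behind the reported gain, can be computed per query as in the sketch below. It uses the non-interpolated form: the sum of precision at each rank where a relevant page appears, divided by the total number of relevant pages. The abstract does not say whether "more than 10%" is a relative or an absolute change; `relative_improvement` assumes a relative one. All function names and the document IDs in the example are illustrative, not NTCIR-3 data or results from the paper.

```python
from statistics import mean

def average_precision(ranking, relevant):
    """Non-interpolated average precision for one query.

    ranking:  document IDs in ranked order.
    relevant: set of IDs judged relevant for the query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    # Relevant documents that were never retrieved add nothing to the sum.
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """Mean of per-query AP; runs is a list of (ranking, relevant) pairs."""
    return mean(average_precision(ranking, rel) for ranking, rel in runs)

def relative_improvement(baseline, improved):
    """Relative change: 0.10 means a 10% improvement over the baseline."""
    return (improved - baseline) / baseline

# Toy query: relevant pages appear at ranks 2 and 4; "d5" is never retrieved.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2", "d5"})
print(round(ap, 4))  # 0.3333
```

Here the precision values at the two hits are 1/2 and 2/4, so AP = (0.5 + 0.5) / 3; the unretrieved relevant page pulls the score down, which is why AP rewards both ranking quality and recall.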

Bibliographic record ID (NCID): AA11464847

Source: 情報処理学会論文誌データベース（TOD） (IPSJ Transactions on Databases), Vol. 46, No. SIG8 (TOD26), pp. 48-59, issued 2005-06-15

ISSN: 1882-7799

Publisher: 情報処理学会 (Information Processing Society of Japan)


Record version: Ver.1 (2025-01-20 06:29:14.114842)
