Construction of a Large Scale Test Collection: Analysis of the Training Topics of the NTCIR-1 (大規模テストコレクション構築について：NTCIR-1の訓練用検索課題の分析)

Kuriyama, Kazuko (栗山, 和子); Kando, Noriko (神門, 典子)

https://ipsj.ixsq.nii.ac.jp/records/43220

Name / File	License
IPSJ-DD99019006.pdf (1.1 MB)	Copyright (c) 1999 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

1999-07-16

Title

大規模テストコレクション構築について：NTCIR-1の訓練用検索課題の分析

Title (English)

Construction of a Large Scale Test Collection: Analysis of the Training Topics of the NTCIR-1

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

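Author affiliation

Research and Development Department, National Center for Science Information Systems (学術情報センター 研究開発部)

Author affiliation

Research and Development Department, National Center for Science Information Systems (学術情報センター 研究開発部)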

Author affiliation (English)

R & D Dept., National Center for Science Information Systems (NACSIS)

Author affiliation (English)

R & D Dept., National Center for Science Information Systems (NACSIS)

Author name

栗山, 和子

Author name (English)

Kuriyama, Kazuko

Abstract (Japanese, translated)

Description type

Other

Description

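This paper examines the properties of search topics in test collections, which serve as evaluation tools. Two desirable properties of search topics are "naturalness" and "balanced difficulty". Naturalness means that the content of a search topic must be as natural as the search requests given to a system in a real search process. Balanced difficulty means that a collection whose topics are all too easy or all too difficult becomes skewed as a whole, so the difficulty of the topics should be balanced. In NTCIR-1, search topics were collected from researchers in the relevant fields in order to make them natural. To study topic difficulty, we analyzed the NTCIR-1 training topics themselves and examined how their features relate to the evaluation results of the pretest. We found no clear relationship between the median of the per-topic average precision and any of the following: the number of characters in the topic's search request statement, the number of relevant documents in which words from the search request statement occur, and the functional category. Nevertheless, grouping topics by functional category is of some use in predicting topic difficulty. In addition, the frequency distribution of the average precision of the submitted results showed that topics are not homogeneous in nature even within the group of easy topics or the group of difficult topics.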

Abstract (English)

Description type

Other

Description

This paper discusses the quality of search topics in test collections, which are used in laboratory-type testing of information retrieval systems. As part of an evaluation tool, the search topics in a test collection should be as "natural" as search requests submitted by actual users, and should be balanced in "difficulty". In the NACSIS Test Collection for Information Retrieval Systems 1 (NTCIR-1), search topics were collected from researchers in the subject domains, i.e., actual users of systems providing access to scientific documents like NTCIR-1, in order to make the requests as "natural" as possible. To estimate the "difficulty" of topics, we analysed the results of the pretest, which used the training topics of NTCIR-1. The results were as follows: (1) the average precision of each topic had no explicit relation with the number of characters/words/phrases, the number of relevant documents containing the words in the description, or the number of relevant documents of each topic; (2) the search function needed to conduct the search for a topic was sometimes found to be effective for estimating the "difficulty" of the topic; (3) the distribution of average precision over systems revealed that the nature of the topics was heterogeneous within a group of "easy topics" or "difficult topics".
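
The kind of analysis summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it takes the median of the average precision over submitted runs for each topic, then measures how that median relates to one topic feature (here, the length of the request statement). The use of Spearman rank correlation is an assumption, since the record does not say which statistic the report used, and all function names, topic IDs, and numbers in the usage note are hypothetical.

```python
from statistics import median


def median_ap_per_topic(ap_by_topic):
    """Map each topic ID to the median average precision over all runs."""
    return {topic: median(scores) for topic, scores in ap_by_topic.items()}


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With hypothetical inputs such as `ap_by_topic = {"T01": [0.10, 0.30, 0.20], "T02": [0.50, 0.70, 0.60]}` and a matching dictionary of request lengths, one would call `median_ap_per_topic(ap_by_topic)` and pass the medians and lengths, in the same topic order, to `spearman`. A coefficient near zero would correspond to the "no explicit relation" outcome reported in the abstract.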

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10539261

Bibliographic information

IPSJ SIG Technical Reports, Digital Documents (DD) (情報処理学会研究報告デジタルドキュメント)

Vol. 1999, No. 57 (1999-DD-019), pp. 41-48, issued 1999-07-16

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-22 11:04:11.561735