How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures
https://ipsj.ixsq.nii.ac.jp/records/94307
License | Copyright (c) 2013 by the Information Processing Society of Japan |
---|---|
Access | Open access |
Item type | SIG Technical Reports(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2013-07-15 | |||||||
Title | ||||||||
Title | How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Keyword | ||||||||
Subject scheme | Other | |||||||
Subject | Information retrieval | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_18gh | |||||||
Resource type | technical report | |||||||
Author affiliation | ||||||||
Microsoft Research Asia, China | ||||||||
Author | Tetsuya, Sakai | |||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | For the past few decades, ranked retrieval (e.g. web search) has been evaluated using rank-based evaluation metrics such as Average Precision and normalised Discounted Cumulative Gain (nDCG). These metrics discount the value of each retrieved relevant document based on its rank. The situation is similar with diversified search which has gained popularity recently: diversity metrics such as α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D♯-nDCG are also rank-based. These widely-used evaluation metrics just regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. The recently-proposed U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with state-of-the-art diversity metrics in terms of how “intuitive” they are: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list by means of the concordance test. Our results show that while D♯-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. These results suggest that D-U and U-IA are not only more realistic than rank-based metrics but also intuitive, i.e., that they measure what we want to measure. | |||||||
Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AN10112482 | |||||||
Bibliographic information | 研究報告データベースシステム(DBS) [SIG Technical Reports: Database System (DBS)], Vol. 2013-DBS-157, No. 12, pp. 1-6, issued 2013-07-15 | |||||||
Notice | ||||||||
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. | ||||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | Information Processing Society of Japan (情報処理学会) |