Extracting Lists of Procedural Expressions from Web Pages (WWWページからの手順に関する箇条書きの抽出)

武智, 峰樹; 徳永, 健伸; 松本, 裕治; 田中, 穂積; Mineki, Takechi; Takenobu, Tokunaga; Yuji, Matsumoto; Hozumi, Tanaka

https://ipsj.ixsq.nii.ac.jp/records/17586

Name / File	License	Access
IPSJ-TOD4412007.pdf (283.7 kB)	Copyright (c) 2003 by the Information Processing Society of Japan	Open access

Item type

Trans(1)

Publication date

2003-09-15

Title

WWWページからの手順に関する箇条書きの抽出

Title (English)

Extracting Lists of Procedural Expressions from Web Pages

Language

jpn

Keywords

Subject scheme

Other

Subject

Research paper

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliations

FUJITSU LIMITED / Graduate School of Information Science, Nara Institute of Science and Technology

Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Graduate School of Information Science, Nara Institute of Science and Technology

Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Author name

武智, 峰樹

Author name (English)

Mineki, Takechi

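Abstract

Description type

Other

Description

Document classification, as an underlying technology, is a key component of question answering and Web navigation. In particular, question answering that relies mainly on surface-level textual features needs a classification engine that can extract appropriate answer candidates according to the type of the given question. Web navigation must also handle questions that conventional question answering has not addressed, and it requires classification techniques that can select appropriate answer candidates for such questions as well. Among the questions handled in Web navigation, this study focuses on questions about procedures and aims to identify the features that are effective for classifying their answer candidates. As a first step, we take texts marked up with HTML list tags in Web pages as a document collection and consider the task of classifying them into texts that describe procedures and all other texts. We collected itemized lists with a search engine, classified them with Support Vector Machines, a machine learning method, and, based on an analysis of the results, examined which features are effective for extracting lists that describe procedures. Using methods based on N-grams and word frequency information, we obtained a combination of features that classifies texts in the computer domain with an accuracy above 90%.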
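The abstract takes the text inside HTML list tags as the unit to be classified. The record does not describe how the authors extracted those lists, so the following is only a minimal sketch, assuming Python's standard-library `html.parser`, of one way to collect `<li>` items grouped by their enclosing `<ul>`/`<ol>`. The names `ListItemExtractor` and `extract_lists` are hypothetical.

```python
from html.parser import HTMLParser


class ListItemExtractor(HTMLParser):
    """Hypothetical helper: collect <li> texts grouped by enclosing <ul>/<ol>.

    A nested list is emitted as its own list; the text of the outer item
    excludes the nested list's text.
    """

    def __init__(self):
        super().__init__()
        self.lists = []   # completed lists, each a list of item strings
        self._open = []   # open lists: [items, fragments of current <li> or None]

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._open.append([[], None])
        elif tag == "li" and self._open:
            self._flush()              # an unclosed <li> ends when the next starts
            self._open[-1][1] = []

    def handle_endtag(self, tag):
        if tag == "li" and self._open:
            self._flush()
        elif tag in ("ul", "ol") and self._open:
            self._flush()
            items, _ = self._open.pop()
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        # Only text inside an open <li> of the innermost list is kept.
        if self._open and self._open[-1][1] is not None:
            self._open[-1][1].append(data)

    def _flush(self):
        """Finish the current <li> of the innermost open list, if any."""
        items, frags = self._open[-1]
        if frags is not None:
            text = " ".join("".join(frags).split())  # normalize whitespace
            if text:
                items.append(text)
            self._open[-1][1] = None


def extract_lists(html):
    parser = ListItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists


page = ("<h2>Install</h2><ol><li>Download the installer.</li>"
        "<li>Run <b>setup.exe</b>.</li></ol><ul><li>Home</li><li>About</li></ul>")
print(extract_lists(page))
```

Navigation menus come out as lists too (the `Home`/`About` list above), which is why a classifier is still needed to separate procedural lists from the rest.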

Abstract (English)

Description type

Other

Description

Text categorization is an essential component of efficient navigation techniques and of retrieving query-relevant information on the Web. Especially in the context of question answering, it requires the right features to categorize documents and to allow for efficient knowledge acquisition according to the types of queries. Among the queries addressed in such navigation, we focus on those asking for procedural knowledge and aim at clarifying the specification of the answers. To solve this problem, we exploit procedural descriptions in the form of itemized expressions tagged with HTML list tags. Applying Support Vector Machines to a set of list expressions gathered from the WWW by a search engine, we examine the obtained model in order to find the features relevant for extracting answers that explain procedures. By exploiting features based on word frequencies, such as N-grams and word sequences, we obtained a feature set for the computer domain that achieves more than 90% recall and precision.
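To make the pipeline summarized in the abstract concrete (word N-gram frequency features fed to a Support Vector Machine, evaluated by precision and recall), here is a minimal sketch. It is not the authors' implementation, and several choices are assumptions: lower-cased whitespace tokenization stands in for Japanese morphological analysis, the linear SVM is trained with the Pegasos stochastic subgradient method instead of a dedicated SVM package, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def ngram_features(text, n_max=2):
    """Count word n-grams (n = 1..n_max). Assumption: whitespace tokenization."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def score(w, x):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM trained with Pegasos (no bias term, for brevity).

    samples: sparse feature dicts; labels: +1 (procedural) or -1 (other).
    """
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            violated = y * score(w, x) < 1  # hinge loss is active
            # Regularization step: shrink all weights by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Loss step: only margin violators move w toward y * x.
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if score(w, x) >= 0 else -1


def precision_recall(gold, pred, positive=1):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: +1 = list item describing a procedure, -1 = other.
TOY_DATA = [
    ("click the start button then select settings", 1),
    ("open the file menu and click save", 1),
    ("select the drive then click format", 1),
    ("company history and mission statement", -1),
    ("contact us about our products", -1),
    ("news archive for the year", -1),
]
samples = [ngram_features(text) for text, _ in TOY_DATA]
labels = [y for _, y in TOY_DATA]
w = train_linear_svm(samples, labels)
preds = [predict(w, x) for x in samples]
print(precision_recall(labels, preds))
```

Scoring the training items is only a smoke test; figures comparable to those in the abstract would require a held-out evaluation set, and the feature combinations the paper studies go beyond plain unigrams and bigrams.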

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464847

Bibliographic information

IPSJ Transactions on Databases (TOD) (情報処理学会論文誌データベース（TOD）)

Vol. 44, No. SIG12 (TOD19), pp. 51-63, published 2003-09-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7799

Publisher

情報処理学会 (Information Processing Society of Japan)
