Named Entity Categorization Using Conditional Random Fields on HTML Tree Structure: Semi-Automatic Thesaurus Construction from Wikipedia

Yotaro Watanabe (渡邉 陽太郎); Masayuki Asahara (浅原 正幸); Yuji Matsumoto (松本 裕治)

https://ipsj.ixsq.nii.ac.jp/records/47835

Name / File	License	Action
IPSJ-NL07179013.pdf (602.8 kB)	Copyright (c) 2007 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2007-05-25

Title

HTMLの木構造を利用した条件付確率場による固有表現分類: Wikipedia からのシソーラス半自動構築

Title (English)

Named Entity Categorization Using Conditional Random Fields on HTML Tree Structure: Semi-Automatic Thesaurus Construction from Wikipedia

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Graduate School of Information Science, Nara Institute of Science and Technology (listed for each of the three authors)

Authors

Yotaro Watanabe; Masayuki Asahara; Yuji Matsumoto

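Abstract (Japanese, translated)

Description type

Other

Description

This paper proposes a method for acquiring named entities that appear in Wikipedia and classifying them accurately. Words and phrases that appear as anchor text in Wikipedia articles are glossed in the linked articles. Exploiting this property of Wikipedia, we formulate named entity classification as a labeling problem over anchor texts that denote named entities. First, we construct a graph whose nodes are anchor texts. Next, to incorporate HTML structure into the graph, we introduce three types of edges based on the HTML DOM structure. The nodes of the resulting graph are then labeled with Conditional Random Fields (CRFs), a supervised learner. Because the constructed graph contains cycles, however, exact computation for the CRFs is computationally expensive and hard to carry out. We therefore introduce approximate computation using Tree-based Reparameterization (TRP). In our evaluation experiments, the proposed method classified named entities with higher accuracy than a method that sequentially applies Support Vector Machines to pairs.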
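The graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: this record does not define the three DOM-based edge types, so the single `same_list` relation used here (linking consecutive anchors inside the same `<ul>`/`<ol>`) is a hypothetical stand-in, and the names `AnchorCollector`, `build_anchor_graph`, and `SAMPLE_HTML` are assumptions introduced for the example.

```python
from html.parser import HTMLParser

# Hypothetical input; not taken from the paper.
SAMPLE_HTML = """
<p><a href="/wiki/Japan">Japan</a></p>
<ul>
<li><a href="/wiki/Nara">Nara</a></li>
<li><a href="/wiki/Kyoto">Kyoto</a></li>
<li><a href="/wiki/Osaka">Osaka</a></li>
</ul>
"""


class AnchorCollector(HTMLParser):
    """Collect each anchor text with the id of its innermost enclosing list."""

    def __init__(self):
        super().__init__()
        self.list_stack = []   # ids of the <ul>/<ol> elements currently open
        self.next_list_id = 0
        self.in_anchor = False
        self.anchors = []      # [anchor_text, enclosing_list_id or None]

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_stack.append(self.next_list_id)
            self.next_list_id += 1
        elif tag == "a":
            self.in_anchor = True
            enclosing = self.list_stack[-1] if self.list_stack else None
            self.anchors.append(["", enclosing])

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()
        elif tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.anchors[-1][0] += data


def build_anchor_graph(html):
    """Return (nodes, edges): anchor texts and assumed DOM-based edges."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    nodes = [text.strip() for text, _ in parser.anchors]
    edges = []
    last_in_list = {}  # list id -> index of the most recent anchor in it
    for i, (_, list_id) in enumerate(parser.anchors):
        if list_id is None:
            continue  # anchors outside any list get no edge in this sketch
        if list_id in last_in_list:
            edges.append((last_in_list[list_id], i, "same_list"))
        last_in_list[list_id] = i
    return nodes, edges


nodes, edges = build_anchor_graph(SAMPLE_HTML)
```

In a labeling model such as the one the abstract describes, each node would receive a category label and each typed edge would couple the labels of the two anchors it joins; adding further edge types between the same nodes is what can introduce cycles.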

Abstract (English)

Description type

Other

Description

This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as the task of categorizing anchor texts whose linked HTML texts gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. To incorporate HTML structure into the graph, three types of cliques are defined based on the HTML DOM structure. We propose a method that uses Conditional Random Fields (CRFs) to categorize the nodes of the graph. Since the defined graph includes cycles, exact inference for the CRFs is computationally expensive. We introduce an approximate inference method using Tree-based Reparameterization (TRP) to reduce the computational cost. Experimental results show that the proposed method outperforms a baseline method that uses Support Vector Machines.
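To make the cost argument concrete, the sketch below checks whether a graph contains a cycle and computes the exact log partition function of a small pairwise model by naive enumeration. It is an illustration under stated assumptions, not the paper's method: it does not implement TRP, and the function names and the callables `node_score` and `edge_score` are hypothetical. Naive enumeration visits `num_labels ** num_nodes` labelings; on tree-structured graphs dynamic programming avoids this, whereas on cyclic graphs exact methods generally scale exponentially with the graph's treewidth, which is the motivation for approximate inference.

```python
import itertools
import math


def has_cycle(num_nodes, edges):
    """Union-find test: an edge joining two already-connected nodes closes a cycle."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return True
        parent[root_i] = root_j
    return False


def log_partition_bruteforce(num_nodes, edges, num_labels, node_score, edge_score):
    """Exact log Z by summing over all num_labels ** num_nodes labelings."""
    total = 0.0
    for labels in itertools.product(range(num_labels), repeat=num_nodes):
        score = sum(node_score(i, labels[i]) for i in range(num_nodes))
        score += sum(edge_score(labels[i], labels[j]) for i, j in edges)
        total += math.exp(score)
    return math.log(total)


# Three mutually linked nodes form the smallest cyclic graph.
triangle = [(0, 1), (1, 2), (0, 2)]
chain = [(0, 1), (1, 2)]
```

With all scores set to zero, every labeling has weight 1, so the result equals `num_nodes * log(num_labels)`; this gives a quick sanity check on the enumeration.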

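Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10115061

Bibliographic information

IPSJ SIG Technical Reports: Natural Language Processing (NL)

Vol. 2007, No. 47 (2007-NL-179), pp. 73-78, issued 2007-05-25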

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan

Versions

Ver.1

2025-01-22 08:48:34.739809
