Johogaku Hiroba: IPSJ Electronic Library (Information Processing Society of Japan)


An Efficient Extraction Method for Compound Keyword (複合語キーワードの効率的抽出法)

https://ipsj.ixsq.nii.ac.jp/records/49233

Name / File	License	Action
IPSJ-NL94104009.pdf (1.3 MB)	Copyright (c) 1994 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

1994-11-17

Title

Title

複合語キーワードの効率的抽出法

Title

Language

en

Title

An Efficient Extraction Method for Compound Keyword

Language

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

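Author affiliation

Faculty of Engineering, The University of Tokushima

Author affiliation

Faculty of Engineering, The University of Tokushima

Author affiliation

Faculty of Engineering, The University of Tokushima

Author affiliation

Faculty of Engineering, The University of Tokushima

Author affiliation

Faculty of Engineering, The University of Tokushima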

Author affiliation (English)

en

The Faculty of Engineering, The University of Tokushima

Author affiliation (English)

en

The Faculty of Engineering, The University of Tokushima

Author affiliation (English)

en

The Faculty of Engineering, The University of Tokushima

Author affiliation (English)

en

The Faculty of Engineering, The University of Tokushima

Author affiliation (English)

en

The Faculty of Engineering, The University of Tokushima

Author name

Author name (English)

Yoshitaka, Hayashi

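Abstract

Description type

Other

Description

In document retrieval systems and similar applications, extracting keywords efficiently and accurately is an important problem. This paper focuses on the fact that compound words, which often serve as keywords in Japanese documents, require a large amount of matching during keyword extraction, and it proposes an efficient extraction method for compound keywords that applies the technique of a string pattern matching machine for multiple keywords. The method consists of a morphological analysis component, a compound keyword extraction machine AC, and a compound keyword candidate machine AC. Fourteen compound word grammatical structures and ten keyword evaluation rules were defined, and the method was evaluated experimentally on 26 documents. Excluding the morphological analysis component, the average extraction time was 16.58 ms, or 6.18 ms per KB of document, which confirms the effectiveness of the method. In addition, the extraction of overlapping words, which is needed when selecting among extracted keywords, can be done efficiently by the candidate machine AC, so users can obtain a wide variety of keywords by specifying the extraction rules for this machine AC.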

Abstract (English)

Description type

Other

Description

Extracting keywords efficiently is an important task in text retrieval systems. Japanese text contains many compound words made up of several kinds of characters (Katakana, Kanji, etc.), and it has no delimiters between words. Extracting keywords from such text therefore takes a long time. This paper presents a technique for detecting compound keywords by introducing a set of rules that specify the conditions for keyword construction. A string pattern matching machine for a finite number of patterns is applied to match the rules and to store keyword candidates. Simulation results for 26 Japanese text files show that the presented algorithm runs at 6.2 ms/KB.
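
The "string pattern matching machine for a finite number of patterns" (called "machine AC" in the Japanese abstract) appears to refer to an Aho-Corasick style automaton. The following is a minimal sketch of that general technique only, not the authors' implementation: it matches raw characters, whereas the paper works on the output of morphological analysis and applies its own grammatical structures and evaluation rules. The function names `build_machine` and `find_all` and the keyword list are hypothetical, chosen for illustration.

```python
from collections import deque


def build_machine(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]      # goto[state][symbol] -> next state (state 0 is the root)
    output = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for sym in pat:
            if sym not in goto[state]:
                goto.append({})
                output.append([])
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        output[state].append(pat)

    # Breadth-first pass: failure links of shallower states are final
    # before deeper states use them.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for sym, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(sym, 0)
            # Inherit patterns that end at the failure state (overlapping words).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, machine):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, output = machine
    state = 0
    hits = []
    for i, sym in enumerate(text):
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


# Hypothetical keyword set with overlapping compound words.
machine = build_machine(["情報", "情報検索", "検索", "検索システム"])
for start, word in find_all("情報検索システム", machine):
    print(start, word)
```

Because the output of each state inherits the patterns of its failure state, overlapping words such as 検索 inside 情報検索 are reported in the same single scan, which is the property the abstract relies on when it says overlapping words can be extracted efficiently.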

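Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10115061

Bibliographic information

IPSJ SIG Technical Reports: Natural Language Processing (NL)

Vol. 1994, No. 104 (1994-NL-104), pp. 63-70, issued 1994-11-17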

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Language

ja

Publisher

Information Processing Society of Japan
