隣接文節間の係り受け情報に着目した話し言葉のチャンキングの評価

西光, 雅弘; 河原, 達也; 高梨, 克也; Masahiro, Saikou; Tatsuya, Kawahara; Katsuya, Takanashi

https://ipsj.ixsq.nii.ac.jp/records/56902

Name / File	License
IPSJ-SLP06061004.pdf (517.2 kB)	Copyright (c) 2006 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2006-05-11

Title

隣接文節間の係り受け情報に着目した話し言葉のチャンキングの評価

Title (English)

An Evaluation of a Cascaded Chunking Method of Spontaneous Japanese using Local Bunsetsu Dependency

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

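Author affiliation

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Author affiliation

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Author affiliation

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University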

Author affiliation (English)

School of Informatics, Kyoto University

Author affiliation (English)

School of Informatics, Kyoto University

Author affiliation (English)

School of Informatics, Kyoto University

Author names

西光, 雅弘
河原, 達也
高梨, 克也

Author names (English)

Masahiro, Saikou
Tatsuya, Kawahara
Katsuya, Takanashi

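Abstract

Description type

Other

Description

We consider segmenting spoken language into "appropriate" units, with spoken language processing applications such as meeting-record creation support and captioning in mind. Conventionally, utterance units in spontaneous speech are often defined on the basis of pauses, but pauses frequently do not coincide with sentence or clause boundaries, so the resulting units are not linguistically homogeneous. Methods that detect clause and sentence boundaries in spoken language by machine learning have also been studied, but their F-measure on speech recognition output is only in the 70% range, and it is difficult to find a meaningful interpretation of the erroneously detected positions. In contrast, this study performs chunking by focusing on local features that are expected to be robust to the ill-formedness of spoken language and to speech recognition errors, specifically the dependency between adjacent bunsetsu. By combining predicate detection with dependency-type classification, we extract "constituents" that roughly correspond to the topic, predicates, and case elements of a sentence. Analysis and evaluation on the Corpus of Spontaneous Japanese (CSJ) showed that restricting the analysis to adjacent bunsetsu allows dependency analysis with high accuracy, and that clause boundaries can be detected more robustly, even in speech recognition output, on the basis of constituents.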

Abstract (English)

Description type

Other

Description

The paper addresses chunking of spontaneous Japanese, aimed at speech summarization and closed-caption generation. Conventionally, the inter-pause unit (IPU) has been the basic unit in annotation and processing of spontaneous speech, but pauses do not necessarily coincide with sentence or clause boundaries. On the other hand, automatic detection of sentence or clause boundaries based on machine learning techniques is not very reliable on speech recognition results. For chunking that is robust against ill-formedness in spontaneous speech and against speech recognition errors, we focus on local bunsetsu dependencies. Combining these with detection of predicates and classification of dependencies, we define a unit named "constituent", which roughly corresponds to subjects, predicates and case frames. Analysis and evaluation using the Corpus of Spontaneous Japanese (CSJ) show that more reliable dependency structure analysis is possible by focusing on adjacent bunsetsu, and that clause boundaries are detected more robustly by extracting constituents.
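
As a rough illustration of the idea summarized in the abstract, and not the authors' implementation, the grouping step can be sketched as follows. The sketch assumes that each bunsetsu already carries a boolean flag saying whether it depends on the immediately following bunsetsu; in the report that decision would come from a trained classifier, whereas here the flags are assigned by hand. The function name `chunk_by_adjacent_dependency` and the romanized example are hypothetical.

```python
from typing import List, Tuple


def chunk_by_adjacent_dependency(
    bunsetsus: List[Tuple[str, bool]]
) -> List[List[str]]:
    """Group bunsetsu into chunks using only adjacent-dependency flags.

    Each input item is (text, depends_on_next). A chunk is closed whenever
    a bunsetsu does NOT depend on the one immediately after it.
    The flags are an assumed input, not something this sketch computes.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for text, depends_on_next in bunsetsus:
        current.append(text)
        if not depends_on_next:
            chunks.append(current)
            current = []
    if current:  # trailing bunsetsu whose head was never seen
        chunks.append(current)
    return chunks


# Hand-labeled toy input: "kyou wa" modifies the final predicate (not adjacent),
# while "ii" modifies the adjacent "tenki desu".
example = [("kyou wa", False), ("ii", True), ("tenki desu", False)]
print(chunk_by_adjacent_dependency(example))
```

Because only the link to the next bunsetsu is consulted, an error in one flag affects at most the two neighbouring chunks, which is the kind of locality the abstract argues should help with recognition errors.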

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Reports, Spoken Language Processing (SLP)

Vol. 2006, No. 40 (2006-SLP-061), pp. 19-24, issued 2006-05-11

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)
