Sentence Boundary Estimation of Ancient Japanese Using Bi-LSTM (Bi-LSTMを用いた中古日本語の文境界推定)

Lisa Suzuki; Rei Kawakami; Tarin Clanuwat; Asanobu Kitamoto; Toshiaki Nakazawa; Takeshi Naemura

https://ipsj.ixsq.nii.ac.jp/records/208672

Name / File	License
IPSJ-CH2020003.pdf (841.7 kB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2020-12-05

Title

Sentence Boundary Estimation of Ancient Japanese Using Bi-LSTM

Title (Japanese)

Bi-LSTMを用いた中古日本語の文境界推定

Language

jpn

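Keywords

Subject scheme

Other

Subject

ancient Japanese (中古日本語); transcription (翻刻); textual emendation (校訂); sentence boundary estimation (文境界推定); deep learning (深層学習)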

Resource type

conference paper

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Author affiliations

The University of Tokyo
Tokyo Institute of Technology / Denso IT Laboratory
ROIS-DS Center for Open Data in the Humanities / National Institute of Informatics
ROIS-DS Center for Open Data in the Humanities / National Institute of Informatics
The University of Tokyo
The University of Tokyo

Authors

Lisa Suzuki (鈴木 理紗)
Rei Kawakami (川上 玲)
Tarin Clanuwat (カラーヌワット タリン)
Asanobu Kitamoto (北本 朝展)
Toshiaki Nakazawa (中澤 敏明)
Takeshi Naemura (苗村 健)

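Abstract (translated from the Japanese)

Description type

Other

Description

Improving the readability of classical books and old documents would accelerate research in many fields, from literature, history, and culture to disaster records. This has raised expectations for automatic machine transcription. Transcription involves various processes, such as character recognition and kana-kanji conversion; this paper addresses one of them, sentence boundary estimation. We built a model that takes morphemes as input and estimates sentence boundaries in ancient Japanese using a Bi-LSTM, which performs well at sentence boundary estimation for modern language in speech recognition. Applied to a corpus of Heian-period texts, the model achieved an AUC of 0.894 on the PR curve, a high level of accuracy. It was also rated highly in feedback from one expert.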
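The abstract frames the task as labelling each morpheme with the probability that a sentence ends after it. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the class names `LSTMCell` and `BiLSTMBoundaryTagger`, the embedding and hidden sizes, the single sigmoid output layer, and the example segmentation are all assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction (hypothetical helper; weights are untrained)."""

    GATES = "ifgo"  # input, forget, cell candidate, output

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Each gate has one matrix applied to the concatenation [x_t ; h_{t-1}].
        self.W = {gate: rand_matrix(hidden_size, input_size + hidden_size, rng)
                  for gate in self.GATES}
        self.b = {gate: [0.0] * hidden_size for gate in self.GATES}

    def run(self, inputs):
        """Return the hidden state after each input vector, in input order."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in inputs:
            z = x + h  # list concatenation: [x_t ; h_{t-1}]
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            cand = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTMBoundaryTagger:
    """Maps a morpheme sequence to per-morpheme boundary probabilities."""

    def __init__(self, vocab, embed_size=8, hidden_size=6, seed=0):
        rng = random.Random(seed)
        self.index = {m: k for k, m in enumerate(vocab)}
        # The last embedding row is shared by all out-of-vocabulary morphemes.
        self.embed = rand_matrix(len(vocab) + 1, embed_size, rng)
        self.fwd = LSTMCell(embed_size, hidden_size, rng)
        self.bwd = LSTMCell(embed_size, hidden_size, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
        self.b_out = 0.0

    def predict(self, morphemes):
        xs = [self.embed[self.index.get(m, len(self.index))] for m in morphemes]
        left = self.fwd.run(xs)               # context from the start of the text
        right = self.bwd.run(xs[::-1])[::-1]  # context from the end, re-aligned
        probs = []
        for hl, hr in zip(left, right):
            score = sum(w * v for w, v in zip(self.w_out, hl + hr)) + self.b_out
            probs.append(sigmoid(score))
        return probs


# Illustrative segmentation only; a real pipeline would use a morphological analyser.
morphemes = ["いづれ", "の", "御時", "に", "か"]
tagger = BiLSTMBoundaryTagger(vocab=sorted(set(morphemes)))
for morpheme, prob in zip(morphemes, tagger.predict(morphemes)):
    print(morpheme, round(prob, 3))
```

Because the backward pass reads the text from the end, the score at each position depends on the morphemes on both sides of it, which is the property that makes a bidirectional model a natural fit for placing boundaries in unpunctuated text. Training (for example with a binary cross-entropy loss against boundaries taken from a punctuated edition) is omitted here.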

Abstract (English)

Description type

Other

Description

To improve the readability of ancient Japanese books and documents, processes such as recognition of old characters, punctuation, and hiragana-to-kanji conversion are required. Automating these processes will accelerate many research areas, including literature, historical and cultural analysis, and the study of disaster records. In this paper, we focus on sentence boundary estimation. We develop a model for estimating sentence boundaries in ancient Japanese using a Bi-LSTM, which performs well at sentence boundary estimation for modern language in speech recognition. When applied to a corpus of literature from the Heian period, the model achieved an AUC of 0.894 on the PR curve. The model was also rated highly by an expert.
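The reported metric, the area under the precision-recall (PR) curve, can be computed from per-position boundary labels and model scores. The record does not say how the authors integrated the curve, so the sketch below assumes step-wise summation (often called average precision); the function names and the toy labels and scores are illustrative, not data from the paper.

```python
def pr_curve(labels, scores):
    """Return (recall, precision) points, lowering the threshold step by step."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    k = 0
    while k < len(ranked):
        threshold = ranked[k][0]
        # Consume every item tied at this score before emitting a point.
        while k < len(ranked) and ranked[k][0] == threshold:
            if ranked[k][1] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def pr_auc(labels, scores):
    """Area under the PR curve by step-wise summation (an assumed convention)."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in pr_curve(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy example: 1 marks a true sentence boundary after that morpheme.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(pr_auc(labels, scores), 3))  # 0.833
```

PR-based evaluation suits this task because boundaries are rare relative to non-boundary positions, so a metric that ignores true negatives is more informative than plain accuracy.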

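Bibliographic information

Proceedings of Jinmonkon 2020 (じんもんこん2020論文集)

Vol. 2020, pp. 17-22, issued 2020-12-05

Publisher

Information Processing Society of Japan (情報処理学会)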
