Neural Machine Translation of Classical Japanese Texts in the Late 19th Century Using Pretrained Language Models

喜友名, 朝視顕; 平澤, 寅庄; 小町, 守; 小木曽, 智信; Tomoshige, Kiyuna; Tosho, Hirasawa; Mamoru, Komachi; Toshinobu, Ogiso

https://doi.org/10.20729/00216233

Name / File	License	Action
IPSJ-JNL6302003.pdf (909.6 kB)	Copyright (c) 2022 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2022-02-15

Title

事前学習モデルを用いた近代文語文のニューラル機械翻訳

Title (English)

Neural Machine Translation of Classical Japanese Texts in the Late 19th Century Using Pretrained Language Models

Language

jpn

Keywords

Subject scheme

Other

Subject

[Special Issue: Humanities and Computers] modern literary-style Japanese (近代文語文), machine translation, pretrained models

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

ID registration

10.20729/00216233

ID registration type

JaLC

Author affiliations

Tokyo Metropolitan University
Tokyo Metropolitan University
Tokyo Metropolitan University
National Institute for Japanese Language and Linguistics

Author names

喜友名, 朝視顕
平澤, 寅庄
小町, 守
小木曽, 智信

Author names (English)

Tomoshige, Kiyuna
Tosho, Hirasawa
Mamoru, Komachi
Toshinobu, Ogiso

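Abstract (translated from the Japanese)

Description type

Other

Description

Modern literary-style Japanese (近代文語文), which was widely used in the Meiji and Taisho periods, is difficult for present-day Japanese speakers to read without specialized knowledge. One characteristic of this style is that it shares many words with contemporary written Japanese but almost no n-grams with large n. This study focuses on translating 『学問のすゝめ』 (An Encouragement of Learning, 1872-1876) and 『人世三宝説』 (Three Treasures in Human Life, 1875). To address the scarcity of parallel corpora available for training neural translation models, we use pretrained models. In our experiments, using only monolingual corpora, with no parallel corpus, yielded outputs that are highly similar to the source text. We also investigated how well existing automatic evaluation metrics and their variants account for post-editing cost.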

Abstract (English)

Description type

Other

Description

Classical Japanese texts from the late 19th century are difficult for contemporary Japanese speakers to read without specialized knowledge. Specifically, this paper focuses on the translation of “An Encouragement of Learning” (1872-1876) and “Three Treasures in Human Life” (1875), in which the source and reference sentences share many unigrams but show no significant overlap in larger n-grams. We approach this task by using pretrained language models to address the associated data acquisition bottleneck. The results show that an unsupervised method, without fine-tuning on parallel data, produces translation outputs with a high degree of similarity to the source text. In addition, we investigate the extent to which existing automatic evaluation metrics and their variants are able to account for post-editing cost.
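
The observation that source and reference share many unigrams but few larger n-grams can be illustrated with a small overlap measure. The following is only a minimal sketch under stated assumptions, not the authors' procedure: it counts character n-grams rather than words (the abstract does not say how the texts were tokenized), the function names `char_ngrams` and `ngram_overlap` are hypothetical, and the sentence pair is invented for illustration and is not taken from the paper's data.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams in text (assumption: character-level units)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_overlap(source: str, reference: str, n: int) -> float:
    """Fraction of source n-grams that also occur in reference.

    Counts are clipped, so an n-gram repeated in the source is matched
    at most as many times as it appears in the reference.
    """
    src = char_ngrams(source, n)
    ref = char_ngrams(reference, n)
    total = sum(src.values())
    if total == 0:
        return 0.0  # source is shorter than n characters
    matched = sum(min(count, ref[gram]) for gram, count in src.items())
    return matched / total


if __name__ == "__main__":
    # Invented toy pair (hypothetical, not from the paper):
    # a literary-style sentence and a contemporary paraphrase.
    source = "我は書を読まんとす"
    reference = "私は本を読もうとする"
    for n in (1, 2, 3):
        print(n, round(ngram_overlap(source, reference, n), 3))
```

Running the script prints one overlap ratio per n-gram size; for text pairs with the property the abstract describes, the ratio would be expected to drop as n grows. A word-level variant would need a tokenizer suited to each language stage, which this sketch deliberately leaves out.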

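Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 63, No. 2, pp. 269-282, issued 2022-02-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764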

Versions

Ver.1

2025-01-19 15:48:47.653166
