Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model
https://ipsj.ixsq.nii.ac.jp/records/183817
License | Access |
---|---|
Copyright (c) 2017 by the Information Processing Society of Japan | Open access |
Item type | Journal(1) |
---|---|
Publication date | 2017-10-15 |
Title | Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model |
Title language | en |
Language | eng |
Subject scheme | Other |
Keywords | [Regular paper] statistical machine translation, hierarchical phrase-based SMT, phrase alignments, synchronous context free grammar, Hiero grammar, non-parametric Bayesian statistics, unsupervised grammar induction |
Resource type | journal article |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Author affiliation | NTT Communication Science Laboratories |
Author affiliation | Google Inc. |
Author affiliation | Tokyo Institute of Technology |
Author affiliation | Tokyo Institute of Technology |
Author affiliation | National Institute of Information and Communication Technology |
Authors | Hidetaka, Kamigaito; Taro, Watanabe; Hiroya, Takamura; Manabu, Okumura; Eiichiro, Sumita |
Description type | Other |
Abstract | In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted, which may be composed of various incorrect rules. A larger rule table consumes more disk and memory resources, and sometimes results in lower translation quality. To resolve these problems, we propose a hierarchical back-off model for Hiero grammar, an instance of a synchronous context free grammar (SCFG), on the basis of the hierarchical Pitman-Yor process. The model can generate compact rules and phrase pairs without resorting to any heuristics, because longer rules and phrase pairs automatically back off to smaller phrases under SCFG. Inference is efficiently carried out using the two-step synchronous parsing of Xiao et al. combined with slice sampling. In our experiments, the proposed model achieved higher or at least comparable translation quality compared with a previous Bayesian model on various language pairs: German/French/Spanish/Japanese-English. When compared against heuristic models, our model achieved comparable translation quality on the full-size German-English language pair of the Europarl v7 corpus with a significantly smaller grammar size, less than 10% of that of the heuristic models. |
Citation note | This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.25 (2017) (online), DOI http://dx.doi.org/10.2197/ipsjjip.25.912 |
Bibliographic record ID | NCID AN00116647 |
Bibliographic information | 情報処理学会論文誌 (IPSJ Journal), Vol. 58, No. 10, issue date 2017-10-15 |
ISSN | 1882-7764 |