A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus Without any Language-Specific Knowledge
https://ipsj.ixsq.nii.ac.jp/records/87009
License | Access |
---|---|
Copyright (c) 2012 by the Information Processing Society of Japan | Open access |
Item type | SIG Technical Reports(1) |
---|---|
Publication date | 2012-11-15 |
Title | A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus Without any Language-Specific Knowledge |
Title language | en |
Language | eng |
Keyword | Machine translation and generation |
Subject scheme | Other |
Resource type | technical report |
Resource type identifier | http://purl.org/coar/resource_type/c_18gh |
Author affiliation | Nara Institute of Science and Technology |
Author affiliation | Nara Institute of Science and Technology |
Author affiliation | Nara Institute of Science and Technology |
Author | Xiaodong, Liu |
Description type | Other |
Abstract | We propose a flexible and effective framework for extracting bilingual dictionaries from comparable corpora without using any language-specific knowledge such as seeds or additional dictionaries. Our approach is a novel combination of topic modeling and word alignment techniques arranged as a pipeline: first, it converts a comparable document-aligned corpus into a parallel topic-aligned corpus using topic modeling techniques; then it learns translation relationships between words using word alignment models such as IBM Model 1. Compared with previous work, our framework has the advantage that it uses only statistical information and requires no language-specific knowledge for initialization. Furthermore, our framework is capable of handling polysemy: for example, it can extract distinct translations for the word "Apple" as a fruit or as a company. Experiments on a large-scale Wikipedia corpus show that our framework reliably extracts high-precision word pairs under a wide variety of comparable data conditions. |
Bibliographic record ID | |
Source identifier type | NCID |
Source identifier | AN10115061 |
Bibliographic information | 研究報告自然言語処理(NL) [Research Report: Natural Language Processing (NL)], vol. 2012-NL-209, no. 15, pp. 1-6, issued 2012-11-15 |
Notice | SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher | 情報処理学会 (Information Processing Society of Japan) |
Publisher name language | ja |
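
The second stage of the pipeline described in the abstract (learning word translation probabilities with an alignment model such as IBM Model 1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ibm_model1` and `best_translation`, the toy data in `toy_pairs`, and the omission of a NULL source word are all assumptions made for illustration. The sketch also assumes the first stage (topic modeling) has already produced pairs of word lists that share a topic; here that output is replaced by a tiny hand-written example.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM from (source_tokens, target_tokens) pairs.

    Simplified IBM Model 1: uniform initialisation, no NULL source word.
    """
    pairs = [(src, tgt) for src, tgt in pairs if src and tgt]
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform start: no seed dictionary or other language-specific knowledge.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) co-alignments
        total = defaultdict(float)  # expected count of e being aligned at all
        # E-step: distribute each target word over the source words in its pair.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(f | source_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical stand-in for topic-aligned word lists (not data from the report).
toy_pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

table = ibm_model1(toy_pairs)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Because the only input is co-occurrence within paired word lists, the sketch mirrors the abstract's point that initialization needs nothing beyond statistical information. Handling polysemy, as the abstract describes for "Apple", would require keeping topic information attached to each pair, which this sketch does not attempt.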