Learning Transfer Rules from Annotated English - Vietnamese Bilingual Corpus
https://ipsj.ixsq.nii.ac.jp/records/50267
Copyright (c) 2004 by the Information Processing Society of Japan
Open access

Item type | SIG Technical Reports(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2004-12-04 | |||||||
Title | ||||||||
Title | Learning Transfer Rules from Annotated English - Vietnamese Bilingual Corpus | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_18gh | |||||||
Resource type | technical report | |||||||
Author affiliation | ||||||||
Faculty of Information Technology, University of Natural Sciences, VNU - HCMC | ||||||||
Author affiliation | ||||||||
Center of Information Technology Development, Vietnam National University of HCMC | ||||||||
Author name | Dinh, Dien | |||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | Due to differences in language typology, many transfer rules are required in the lexical and structural transfer stage of English-to-Vietnamese machine translation. Recently, many NLP (Natural Language Processing) tasks have shifted from rule-based approaches to corpus-based approaches that rely on large annotated corpora. Corpus-based NLP tasks for widely used languages such as English, French, etc. have been well studied, with satisfactory results. In contrast, corpus-based NLP tasks for Vietnamese are at a deadlock due to the absence of annotated training data. Furthermore, hand-annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved to be labor-intensive and costly. In this paper, we present issues in the collection and annotation (word alignment, Vietnamese word segmentation, and part-of-speech tagging) of an English-Vietnamese parallel corpus named EVC (English-Vietnamese Corpus). From this EVC, transfer rules have been automatically mined for use in training Vietnamese-related NLP tasks and in studying English-Vietnamese comparative linguistics. | |||||||
Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AA11135936 | |||||||
Bibliographic information | IPSJ SIG Technical Reports, Intelligence and Complex Systems (ICS), vol. 2004, no. 125 (2004-ICS-138), pp. 31-36, issued 2004-12-04 | |||||||
Notice | ||||||||
SIG Technical Reports are not refereed and hence may later appear in journals, conferences, symposia, etc. | ||||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | 情報処理学会 (Information Processing Society of Japan) |