Item type |
Symposium(1) |
Publication date |
2019-10-14 |
Title |
|
|
Title |
Integrating Back-Translation into BERT Model for Detecting Machine-Translated Text |
Language |
|
|
Language |
eng |
Keywords |
|
|
Subject scheme |
Other |
|
Subject |
Machine-Translation Detection, Back-Translation, BERT Model, Paraphrasing, Adversarial Text |
Resource type |
|
|
Resource type identifier |
http://purl.org/coar/resource_type/c_5794 |
|
Resource type |
conference paper |
Author affiliation |
|
|
|
Indian Institute of Technology |
Author affiliation |
|
|
|
KDDI Research Inc. |
Author affiliation |
|
|
|
The University of Tokyo |
Author affiliation |
|
|
|
KDDI Research Inc. |
Author affiliation |
|
|
|
KDDI Research Inc. |
Author name |
Gupta, Ishita
Hoang-Quoc, Nguyen-Son
Thao, Tran Phuong
Seira, Hidano
Shinsaku, Kiyomoto
|
Abstract |
|
|
Description type |
Other |
|
Description |
Adversaries use machine-generated text for malicious purposes such as spam emails and fake reviews. Recently, a form of 'back-translation' plagiarism has emerged in which text is paraphrased by translating it into a different language and then back into the original language. Previous methods for detecting such machine-generated text rely only on the intrinsic content of the text. We propose a detector that exploits external information obtained from back-translation and integrates it into the BERT model. An evaluation on 90,000 samples of original English sentences and translated French sentences shows that our detector classifies them with 83.8% accuracy, higher than the best accuracy of previous methods, 79.9%. Moreover, our detector detects back-translated text with 87.1% accuracy when assessed on 20,000 sentences, an improvement over the 82.2% accuracy of the state of the art. We also conducted experiments with a low-resource language and obtained similar results. This demonstrates that our detector performs consistently across tasks in both rich- and low-resource languages. |
Bibliographic record ID |
|
|
|
Identifier type |
NCID |
|
|
Related identifier |
ISSN 1882-0840 |
Bibliographic information |
Proceedings of the Computer Security Symposium 2019 (コンピュータセキュリティシンポジウム2019論文集)
Vol. 2019,
pp. 1349-1355,
Issue date 2019-10-14
|
Publisher |
|
|
Language |
ja |
|
Publisher |
情報処理学会 (Information Processing Society of Japan) |