Extracting common features of fake news by Multi-Head-Attention

石丸, 貴之; 三村, 守; Takayuki, Ishimaru; Mamoru, Mimura


https://ipsj.ixsq.nii.ac.jp/records/223071

Name / File	License	Action
IPSJ-CSS2022016.pdf (2.3 MB)	Copyright (c) 2022 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2022-10-17

Title

Extracting common features of fake news by Multi-Head-Attention (original Japanese title: Multi-Head-Attentionによるフェイクニュースに共通する特徴の抽出)

Language

jpn

Keywords

Subject scheme

Other

Subject

fake news, BERT, natural language processing

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

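Author affiliation

Department of Computer Science, National Defense Academy (防衛大学校情報工学科)

Author affiliation

Department of Computer Science, National Defense Academy (防衛大学校情報工学科)

Author affiliation (English)

National Defense Academy of Japan

Author affiliation (English)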

National Defense Academy of Japan

Author name

石丸, 貴之
三村, 守

Author name (English)

Takayuki, Ishimaru
Mamoru, Mimura

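Abstract

Description type

Other

Description

Machine-learning methods have been proposed for detecting fake news, which has become a social problem. However, these methods are often evaluated on a single dataset, and few proposals offer a general-purpose model that can handle a variety of fields. In this study, we use three datasets with different characteristics to verify the generality of a fake news detector, and we focus on the features common to the datasets. By label, the datasets consist of 27442 real news items and 28359 fake news items. Features were extracted by quantifying weights with the Multi-Head-Attention of BERT (Bidirectional Encoder Representations from Transformers) and focusing on words with large weights. Comparing the top-ranked words of each dataset with one another, only 14 words, 13% of the total, were common. Furthermore, to evaluate generality, a model trained on one dataset was used to classify the other datasets. As a result, accuracy fell from 99% to 50% or below. These results show that fake news has few common features and that the generality of classification models leaves room for improvement.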
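The word-comparison step in the abstract can be illustrated with a minimal sketch. It assumes per-token attention weights have already been obtained from a model; the function names, the toy data, and the choice to report the ratio as common words over the union of all top words are assumptions made for illustration, since the abstract does not state how the 13% figure was computed.

```python
from collections import defaultdict


def top_words(token_weights, k):
    """Sum the attention weight per word and return the k highest-weighted words."""
    totals = defaultdict(float)
    for word, weight in token_weights:
        totals[word] += weight
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def common_words(top_sets):
    """Return the words shared by every set and their share of the union (assumed metric)."""
    common = set.intersection(*top_sets)
    union = set.union(*top_sets)
    return common, len(common) / len(union)


# Hypothetical (word, attention weight) pairs for three toy datasets.
dataset_a = [("shocking", 0.9), ("president", 0.6), ("report", 0.2), ("shocking", 0.4)]
dataset_b = [("shocking", 0.7), ("vaccine", 0.8), ("report", 0.1)]
dataset_c = [("shocking", 0.5), ("election", 0.9), ("president", 0.3)]

tops = [top_words(data, k=2) for data in (dataset_a, dataset_b, dataset_c)]
common, ratio = common_words(tops)
print(common, ratio)
```

In this toy data only one of the four distinct top words appears in all three sets, which mirrors the kind of low overlap the abstract reports.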

Abstract (English)

Description type

Other

Description

Several methods for detecting fake news using machine learning have been proposed. Previous studies have often been evaluated on a single dataset, and few researchers have proposed versatile models that can be applied to various fields. In this study, we focus on features common to multiple datasets. The three datasets together consisted of 27442 real news items and 28359 fake news items. Feature extraction was conducted by focusing on frequent words based on attention weights in a BERT (Bidirectional Encoder Representations from Transformers) model. Comparing the top words of each dataset with one another, only 14 words (13 percent of the total) were common. To evaluate generality, each dataset was classified using models trained on another dataset. As a result, accuracy fell to less than half of its original value. Only a few common features were found across the datasets. Therefore, there is room for improvement regarding the generality of the classification model.
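The cross-dataset evaluation described in the abstract (train on one dataset, classify the others) can be sketched as follows. This is not the authors' implementation: a toy keyword classifier stands in for the fine-tuned BERT model, and all names and example texts are hypothetical.

```python
def accuracy(predict, examples):
    """Fraction of (text, label) pairs that the predict function labels correctly."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def train_keyword_model(examples):
    """Toy stand-in for a trained classifier: flags words seen only in fake items (label 1)."""
    fake_words, real_words = set(), set()
    for text, label in examples:
        (fake_words if label == 1 else real_words).update(text.lower().split())
    markers = fake_words - real_words
    return lambda text: 1 if markers & set(text.lower().split()) else 0


def cross_dataset_accuracy(datasets):
    """Train on each dataset and score the resulting model on every dataset."""
    results = {}
    for train_name, train_set in datasets.items():
        model = train_keyword_model(train_set)
        for test_name, test_set in datasets.items():
            # Note: the same-name entry is scored on the training data itself,
            # which is acceptable for a toy sketch but not for a real evaluation.
            results[(train_name, test_name)] = accuracy(model, test_set)
    return results


# Hypothetical two-item datasets; label 0 = real, 1 = fake.
datasets = {
    "politics": [("senate passes budget", 0), ("secret plot exposed", 1)],
    "health": [("clinic opens downtown", 0), ("miracle cure banned", 1)],
}
scores = cross_dataset_accuracy(datasets)
for pair, score in scores.items():
    print(pair, score)
```

Because the marker words learned from one toy dataset never occur in the other, accuracy drops when the model is applied across datasets, which is the effect the abstract reports at much larger scale.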

Bibliographic information

Proceedings of the Computer Security Symposium 2022

p. 97-104, issued 2022-10-17

Publisher

Information Processing Society of Japan


Versions

Ver.1

2025-01-19 13:31:41.930140
