Automatic Stopword Generation Based on Attention for Document Classification Using Neural Networks

Yuki, Kuwabara; Yu, Suzuki

https://ipsj.ixsq.nii.ac.jp/records/233828

Name / File	License	Action
IPSJ-TOD1702007.pdf (793.0 kB)	Copyright (c) 2024 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2024-04-23

Title

Automatic Stopword Generation Based on Attention for Document Classification Using Neural Networks

Language

eng

Keywords

Subject scheme

Other

Subject

[Research Paper] stopwords, attention, BERT, neural network, text classification, machine learning, natural language processing

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Gifu University
Gifu University

Author affiliation (English)

Gifu University
Gifu University

Author name

Yuki, Kuwabara
Yu, Suzuki

Author name (English)

Yuki, Kuwabara
Yu, Suzuki

Abstract

Description type

Other

Description

Stopwords are generally used to improve the accuracy of document classification and retrieval, and we believe that setting appropriate stopwords improves classification accuracy. However, in our preliminary experiments on document classification tasks using BERT, existing stopword lists were not effective in improving classification accuracy. To solve this problem, we construct a method for generating stopwords using the attention mechanism of the classifier. In this method, words that receive high attention in misclassified input documents and low attention in correctly classified documents are treated as stopwords. The system removes stopwords probabilistically: when it builds the classification model, it automatically sets, for each word in the input documents, the probability that the word is a stopword. We conduct experiments to confirm the effectiveness of our stopword generation method. The experimental results show that stopwords generated by our method improve classification accuracy in some cases. For three of the six classification tasks tested in this study, the improvement in accuracy is significant.
------------------------------
This is a preprint of an article intended for publication in the Journal of
Information Processing (JIP). This preprint should not be cited. This
article should be cited as: Journal of Information Processing Vol.32(2024) (online)
------------------------------
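
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no scoring formula, so the score used here (mean attention in misclassified documents minus mean attention in correctly classified documents, clipped at zero and scaled to [0, 1]) is an assumption, as are the function names mean_attention, stopword_probabilities, and remove_stopwords and the toy attention weights. In practice the weights would come from the classifier's attention layers.

```python
import random
from collections import defaultdict


def mean_attention(docs):
    """Average attention weight per word over (tokens, weights) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, weights in docs:
        for tok, w in zip(tokens, weights):
            totals[tok] += w
            counts[tok] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}


def stopword_probabilities(examples):
    """Map each word to an assumed probability of being a stopword.

    examples: list of (tokens, attention_weights, correctly_classified).
    Assumed score: attention in misclassified documents minus attention
    in correctly classified ones, clipped at 0 and divided by the maximum.
    """
    wrong = mean_attention([(t, a) for t, a, ok in examples if not ok])
    right = mean_attention([(t, a) for t, a, ok in examples if ok])
    scores = {tok: max(0.0, w - right.get(tok, 0.0)) for tok, w in wrong.items()}
    top = max(scores.values(), default=0.0)
    if top == 0.0:
        return {tok: 0.0 for tok in scores}
    return {tok: s / top for tok, s in scores.items()}


def remove_stopwords(tokens, probs, rng):
    """Drop each token independently with its stopword probability."""
    return [tok for tok in tokens if rng.random() >= probs.get(tok, 0.0)]


# Toy data (hypothetical): one correctly classified and one misclassified document.
examples = [
    (["the", "movie", "great"], [0.1, 0.2, 0.7], True),
    (["the", "plot", "movie"], [0.6, 0.3, 0.1], False),
]
probs = stopword_probabilities(examples)
print(probs)
print(remove_stopwords(["the", "plot", "was", "great"], probs, random.Random(0)))
```

Removing words by sampling rather than by a hard threshold keeps words with intermediate scores in some training passes and drops them in others, which matches the probabilistic removal the abstract describes.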

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464847

Bibliographic information

IPSJ Transactions on Databases (TOD)

Vol. 17, No. 2, Issue date 2024-04-23

ISSN

Source identifier type

ISSN

Source identifier

1882-7799

Publisher

Information Processing Society of Japan
