Automatic Stopword Generation Based on Attention for Document Classification Using Neural Networks

https://ipsj.ixsq.nii.ac.jp/records/233828
File IPSJ-TOD1702007.pdf (793.0 kB)
Available for download from April 23, 2026.
Copyright (c) 2024 by the Information Processing Society of Japan
Price non-members: ¥0, IPSJ members: ¥0, DBS members: ¥0, IFAT members: ¥0, DLIB members: ¥0
Item type Trans(1)
Publication date 2024-04-23
Language eng
Keywords [Research paper] stopwords, attention, BERT, neural network, text classification, machine learning, natural language processing
Subject scheme Other
Resource type journal article
Resource type identifier http://purl.org/coar/resource_type/c_6501
Author affiliation Gifu University
Author names Yuki, Kuwabara; Yu, Suzuki
Abstract
Description type Other
Description Stopwords are generally used to improve the accuracy of document classification and retrieval, and we expect that setting appropriate stopwords improves classification accuracy. However, in our preliminary experiments on document classification with BERT, existing stopword lists did not improve classification accuracy. To solve this problem, we construct a method that generates stopwords using the attention mechanism of the classifier. In this method, words that receive high attention in misclassified input documents and low attention in correctly classified documents are treated as stopwords. The system removes stopwords probabilistically: when it builds the classification model, it automatically sets, for each word in the input documents, the probability that the word is a stopword. We conduct experiments to confirm the effectiveness of our stopword generation method. The results show that stopwords generated by our method improve classification accuracy in some cases. Three of the six classification tasks tested in this study show a significant improvement in accuracy.
------------------------------
This is a preprint of an article intended for publication in the Journal of
Information Processing (JIP). This preprint should not be cited. This
article should be cited as: Journal of Information Processing Vol.32 (2024) (online)
------------------------------
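The abstract describes the idea only at a high level, so the following is a minimal Python sketch of one way it could be realized, not the authors' implementation. It assumes that per-word attention weights are already available for each document, and it uses a hypothetical scoring rule (mean attention in misclassified documents minus mean attention in correctly classified documents, clipped at zero and scaled by the largest score) to obtain a removal probability per word. The function names, the scoring rule, and the toy weights are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict


def mean_attention(docs):
    """Average attention weight per word over a list of documents.

    Each document is a list of (word, attention_weight) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        for word, weight in doc:
            totals[word] += weight
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def stopword_probabilities(misclassified, correct):
    """Hypothetical scoring rule (an assumption, not the paper's formula).

    A word scores high when its mean attention in misclassified documents
    exceeds its mean attention in correctly classified documents. Scores are
    scaled by the largest score so they can serve as removal probabilities.
    """
    mis = mean_attention(misclassified)
    cor = mean_attention(correct)
    scores = {w: max(0.0, a - cor.get(w, 0.0)) for w, a in mis.items()}
    top = max(scores.values(), default=0.0)
    if top == 0.0:
        return {w: 0.0 for w in scores}
    return {w: s / top for w, s in scores.items()}


def remove_stopwords(words, probs, rng):
    """Drop each word independently with its stopword probability."""
    return [w for w in words if rng.random() >= probs.get(w, 0.0)]


# Toy attention weights, invented for illustration only.
misclassified = [[("the", 0.1), ("said", 0.7), ("goal", 0.2)]]
correct = [[("the", 0.1), ("said", 0.1), ("goal", 0.8)]]

probs = stopword_probabilities(misclassified, correct)
rng = random.Random(0)  # seeded so the sketch is reproducible
filtered = remove_stopwords(["the", "goal", "said", "match"], probs, rng)
```

In this toy data, "said" draws high attention only in the misclassified document, so it receives probability 1.0 and is always removed, while the other words receive probability 0.0 and are always kept. With real attention weights most probabilities would fall strictly between 0 and 1, and removal would vary from one draw to the next.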
Bibliographic record ID NCID AA11464847
Bibliographic information IPSJ Transactions on Databases (TOD), Vol. 17, No. 2, issue date 2024-04-23
ISSN 1882-7799
Publisher Information Processing Society of Japan