BERT Pre-trained Model with Patent Documents and Application for Patent Survey

秋山, 賢二 (Kenji Akiyama); 斎藤, 隆文 (Takafumi Saito)

https://ipsj.ixsq.nii.ac.jp/records/226928

Name / File	License
IPSJ-TDP0403009.pdf (2.1 MB)	Copyright (c) 2023 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2023-07-15

Title

BERT Pre-trained Model with Patent Documents and Application for Patent Survey

Original title (Japanese)

特許文献によるBERT事前学習モデルと特許調査業務への応用

Language

jpn

Keywords

Subject scheme

Other

Subject

[Regular paper] Patent survey, natural language processing, BERT, document ranking

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Tokyo University of Agriculture and Technology

Author affiliation

Tokyo University of Agriculture and Technology

Author names

秋山, 賢二
斎藤, 隆文

Author names (English)

Kenji Akiyama
Takafumi Saito

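Abstract (Japanese)

Description type

Other

Description

Efficiently sorting out the documents that match a given purpose from a large body of literature is needed in many fields. Some recent patent search databases support this sorting by ranking the retrieved patent documents according to their similarity to a query document. In a patent infringement-avoidance survey, however, patent documents must be sorted by their relationship to the product under investigation. Because most knowledge about the product resides in the developers' heads, this sorting has relied entirely on manual work. In this study, we propose sorting patent documents by their relevance to a product with machine learning, using as training data the past results of SDI surveys, in which documents are periodically collected and checked under pre-specified search conditions. As the language-processing model for machine learning, BERT, released by Google in 2018, is a strong candidate because it has achieved the highest performance on a variety of language-processing tasks. Existing Japanese BERT models are large models pre-trained on Japanese Wikipedia and require substantial computing resources, which makes them difficult to use on the ordinary PCs found in companies. We therefore built a Japanese patent-specific model and confirmed that, on patent-related tasks, a smaller BERT model performs on par with the current large general-purpose model. We report these results.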

Abstract (English)

Description type

Other

Description

Efficient selection of suitable documents from a large number of documents is required in various fields. Some recent patent search databases support this selection work by ranking and displaying the retrieved patent documents according to their similarity to the query text. However, in a patent infringement avoidance survey, patent documents must be selected according to their relationship with the product under survey. Because knowledge of the product resides mostly in the memory of the people in charge of its development, this selection work has relied exclusively on humans. In this study, we propose using the past result data of SDI surveys as training data for machine learning and sorting patent documents according to their relationship with the product. (An SDI survey regularly collects and checks documents under pre-specified search conditions.) BERT, announced by Google in 2018, is a promising language processing model for machine learning because it has achieved the highest performance on various language processing tasks. The current Japanese BERT models are pre-trained on Japanese Wikipedia and have a large number of parameters, so they require too many computing resources to run on a typical company PC. Therefore, we created a Japanese patent-specific BERT model and confirmed that even a BERT model with fewer parameters performs as well as a current Japanese BERT model on the patent ranking task. We report the effectiveness of the proposal and the resulting BERT performance.
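The abstract's core idea, reusing past SDI judgments (relevant or not relevant to a product) as labeled training data and then ranking newly collected patents by predicted relevance, can be illustrated with a minimal Python sketch. To keep it self-contained and runnable, it assumes a whitespace tokenizer and swaps in a multinomial naive Bayes scorer in place of the BERT model described in the abstract; `train_scorer`, `rank`, and the sample data are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Hypothetical whitespace tokenizer; a Japanese BERT pipeline would use
    # its own subword tokenizer instead.
    return text.lower().split()


def train_scorer(history: list[tuple[str, bool]]) -> Callable[[str], float]:
    """Build a relevance scorer from past SDI judgments (text, was_relevant).

    Stand-in for a trained classifier: multinomial naive Bayes with
    Laplace smoothing, returning a log-likelihood ratio per document.
    """
    pos, neg = Counter(), Counter()
    for text, relevant in history:
        (pos if relevant else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    # Per-token log-odds of "relevant" vs "not relevant". The class prior is
    # omitted because it shifts every score equally and so cannot change
    # the ranking.
    weights = {
        tok: math.log((pos[tok] + 1) / pos_total)
        - math.log((neg[tok] + 1) / neg_total)
        for tok in vocab
    }

    def score(text: str) -> float:
        # Tokens never seen in the SDI history contribute nothing.
        return sum(weights.get(tok, 0.0) for tok in tokenize(text))

    return score


def rank(docs: dict[str, str], score: Callable[[str], float]) -> list[str]:
    """Return document IDs ordered from most to least likely relevant."""
    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)


if __name__ == "__main__":
    # Hypothetical past SDI results for a battery product.
    history = [
        ("battery cooling fan housing", True),
        ("battery thermal management plate", True),
        ("printer paper tray latch", False),
        ("ink cartridge nozzle cleaning", False),
    ]
    score = train_scorer(history)
    new_docs = {
        "doc-1": "paper feed roller for printer",
        "doc-2": "cooling structure for battery pack",
    }
    print(rank(new_docs, score))  # ['doc-2', 'doc-1']
```

In this sketch the ranking step only depends on `score`, so replacing it with a classifier's predicted relevance probability (for example, from a fine-tuned BERT model) would leave `rank` unchanged.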

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA12894091

Bibliographic information

IPSJ Transactions on Digital Practices (DP) (情報処理学会論文誌デジタルプラクティス)

Vol. 4, No. 3, pp. 58-68, published 2023-07-15

ISSN

Source identifier type

ISSN

Source identifier

2435-6484

Publisher

Information Processing Society of Japan (情報処理学会)


Versions

Ver.1

2025-01-19 12:20:10.800132
