Jouhougaku Hiroba: Information Processing Society of Japan (IPSJ) Digital Library


Does Retrieval-Augmented Generation (RAG) Mitigate Training Data Leakage Risks from Large Language Models?

https://ipsj.ixsq.nii.ac.jp/records/240788

File: IPSJ-CSS2024042.pdf (935.3 kB), available for download from 2026-10-15
License: Copyright (c) 2024 by the Information Processing Society of Japan
Price: non-members ¥660; IPSJ members ¥330; CSEC, SPT, and DLIB members ¥0

Item type

Symposium(1)

Publication date

2024-10-15

Title (Japanese)

検索拡張生成（RAG）は大規模言語モデルからの学習データ漏洩リスクを軽減するのか？

Title (English)

Does Retrieval-Augmented Generation Mitigate Training Data Leakage Risks from Large Language Models?

Language

jpn

Keywords (subject scheme: Other)

retrieval-augmented generation, membership inference attacks, large language models, fine-tuning, LoRA

Resource type

conference paper (http://purl.org/coar/resource_type/c_5794)

Author affiliation

Mitsubishi Electric Corporation (all three authors)

Authors

中井, 綱人 (Nakai, Tsunato)
東, 拓矢 (Higashi, Takuya)
大西, 健斗 (Oonishi, Kento)

Abstract

Retrieval-augmented generation (RAG) is a technique that improves the learning efficiency, knowledge updating, and reliability of large language models (LLMs) by retrieving relevant knowledge from an external knowledge database (the retrieval database). While the usefulness of RAG is attracting attention, researchers have begun to point out security and privacy risks in RAG systems. Previous studies have revealed a risk of leaking information from the retrieval database, a risk unique to RAG, but have also reported that RAG, conversely, mitigates the risk of leaking information about the LLM's training data. Membership inference attacks (MIAs) have been used to evaluate the risk of information leakage from trained models in a wide range of machine learning systems, yet MIAs against RAG systems have not been sufficiently evaluated. In particular, no study has yet conducted MIAs targeting the leakage of LLM training data in RAG systems, and whether RAG actually mitigates this risk has not been verified. This paper reports the results of using MIAs to evaluate how far RAG mitigates the leakage of LLM training data. Experiments on RAG systems built with three LLMs of about 7 billion parameters each and two datasets show that, contrary to previous reports, RAG does not mitigate the risk of training data leakage from LLMs under MIAs.
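For readers unfamiliar with how MIAs quantify leakage, a minimal sketch of one common baseline, the loss-threshold attack, is shown below. It assumes the attacker can obtain a per-sample loss (for example, average token negative log-likelihood) from the target model for known members (fine-tuning samples) and non-members (held-out samples); the attack is then scored by ROC AUC, where 0.5 means chance. This page does not describe the paper's actual attack, models, or datasets, so the function names and the toy loss values below are hypothetical and are not results from the paper. To ask the paper's question, one would compute the losses of the same LLM with and without retrieved context and compare the two AUCs; that step is omitted here.

```python
from typing import Sequence


def membership_scores(losses: Sequence[float]) -> list[float]:
    """Loss-threshold heuristic: a lower loss suggests the sample was trained on.

    The loss is negated so that a higher score means "more likely a member".
    """
    return [-loss for loss in losses]


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random member outscores a random
    non-member (ties count 1/2), i.e. the normalized Mann-Whitney U statistic."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def predict_members(losses: Sequence[float], threshold: float) -> list[bool]:
    """Predict "member" when the model's loss on a sample is below the threshold."""
    return [loss < threshold for loss in losses]


if __name__ == "__main__":
    # Hypothetical per-sample losses for illustration only (not data from the paper).
    member_losses = [0.8, 1.1, 0.9, 1.4]      # samples assumed to be in the fine-tuning set
    nonmember_losses = [1.6, 1.2, 2.0, 1.9]   # samples assumed to be held out

    score = auc(membership_scores(member_losses), membership_scores(nonmember_losses))
    print(f"attack AUC = {score:.4f}")
    print(predict_members(member_losses + nonmember_losses, threshold=1.3))
```

With these toy numbers the script prints `attack AUC = 0.9375` (15 of the 16 member/non-member pairs are ranked correctly). A fixed threshold is the simplest decision rule; AUC is used for comparison because it does not depend on choosing that threshold.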

Bibliographic information

Proceedings of Computer Security Symposium 2024

pp. 303-310, published 2024-10-15

Publisher

Information Processing Society of Japan


Versions

Ver.1

2025-01-19 07:51:43.513675
