Reinforcement Learning with Context-aware Sentence Representations

Ryotaro Murakami; Manabu Ohta; Fumito Uwano


https://ipsj.ixsq.nii.ac.jp/records/2006867

File: IPSJ-TOD1901010.pdf (5.0 MB), available for download from January 26, 2028
License: Copyright (c) 2026 by the Information Processing Society of Japan
Price: non-members ¥660; IPSJ members ¥330; DBS members ¥0; IFAT members ¥0; DLIB members ¥0

Item type: Trans(1)

Publication date: 2026-01-26

Title: 文脈を考慮した文章ベクトルを用いた強化学習
Title (English): Reinforcement Learning with Context-aware Sentence Representations

Language: jpn

Keywords (subject scheme: Other): [Research Paper (Recommended Paper)] reinforcement learning, natural language processing, embedding, large language models

Resource type: journal article (resource type identifier: http://purl.org/coar/resource_type/c_6501)

Authors: Ryotaro Murakami (村上,遼太郎), Manabu Ohta (太田,学), Fumito Uwano (上野,史)
Affiliation: Okayama University (all three authors)

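Abstract (description type: Other; translated from the Japanese original)

In recent years, research that applies large language models (LLMs) to reinforcement learning has attracted attention. Auxiliary uses have been proposed in which an LLM generates rewards or assists in policy selection. These uses target sparse environments, where rewards or supervisory signals are hard to obtain, and environments that require complex rules. However, it remains difficult to incorporate the linguistic information that LLMs generate into reinforcement learning environments appropriately, and its use is still limited. To make use of linguistic information in reinforcement learning, this study proposes, as a foundational technique, a context-aware embedding model for text data and a reinforcement learning method built on it. Specifically, we use ScriptWorld, a text-based reinforcement learning framework. Sentence vectors generated by the embedding model are fed into the neural network of a deep reinforcement learning agent, which is trained to output the correct choice for each input sentence. In the experiments, one method applied a context-independent vector generation model, in which the sentence vectors produced by the embedding model are fed into the neural network individually. This method improved learning efficiency by approximately 45 times compared with the conventional method. In addition, introducing an LLM-based embedding model made it possible to generate embedding representations of text data suited to state observation in reinforcement learning, and it improved learning efficiency compared with SBERT-derived models.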

Abstract (English; description type: Other)

In recent years, research on applying Large Language Models (LLMs) to reinforcement learning (RL) has attracted significant attention. Auxiliary applications of LLMs have been proposed, such as generating rewards or assisting in policy selection. These target sparse environments, where rewards or supervisory signals are difficult to obtain, and environments requiring complex rules. However, appropriately incorporating linguistic information generated by LLMs into RL environments remains challenging, and its use is still limited. This study aims to explore the use of linguistic information in RL. It proposes a context-aware text embedding model and a novel RL approach built on it. Specifically, we use ScriptWorld, a text-based RL framework, in which sentence vectors generated by embedding models are input into a deep RL network to train the agent to select appropriate actions based on input sentences. Experimental results showed that the RL method employing a context-independent model, in which sentence vectors generated by embedding models are individually input into a neural network, achieved approximately 45 times higher learning efficiency than a conventional method. Furthermore, introducing embedding models based on LLMs enabled the generation of text embedding representations well suited to state observation in RL, improving learning efficiency compared with SBERT-derived models.
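The abstract describes feeding separately generated sentence vectors into a network that learns to pick the correct choice. The sketch below is a rough illustration of that idea only. It embeds the observation and each candidate choice independently, scores every (observation, choice) pair, and acts with ε-greedy exploration. Everything here is an assumption rather than the authors' method. The hashed bag-of-words `embed` is a hypothetical stand-in for SBERT or an LLM embedder. `LinearScorer` replaces the deep network. The one-step reward regression in `train` is a contextual-bandit simplification of RL. The ScriptWorld interface is not modeled at all.

```python
import hashlib
import math
import random

DIM = 16  # hypothetical embedding size; real sentence embedders use hundreds of dimensions


def embed(sentence: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for SBERT or an LLM embedder: an L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LinearScorer:
    """Scores one (observation, choice) pair at a time from separately embedded sentences."""

    def __init__(self, dim: int = DIM, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(2 * dim)]

    def score(self, state_vec: list[float], choice_vec: list[float]) -> float:
        x = state_vec + choice_vec  # concatenate the two sentence vectors
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, state_vec: list[float], choice_vec: list[float],
               target: float, lr: float = 0.1) -> None:
        # One gradient step on the squared error between the predicted value and the reward.
        x = state_vec + choice_vec
        err = target - self.score(state_vec, choice_vec)
        self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]


def choose(scorer: LinearScorer, state: str, choices: list[str]) -> int:
    """Greedy action: index of the highest-scoring choice."""
    s = embed(state)
    scores = [scorer.score(s, embed(c)) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def train(scorer: LinearScorer, state: str, choices: list[str], correct: int,
          episodes: int = 200, eps: float = 0.2, seed: int = 1) -> None:
    """Epsilon-greedy one-step training; reward 1.0 only for the (hypothetical) correct choice."""
    rng = random.Random(seed)
    s = embed(state)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(choices))  # explore
        else:
            a = choose(scorer, state, choices)  # exploit
        reward = 1.0 if a == correct else 0.0
        scorer.update(s, embed(choices[a]), reward)


if __name__ == "__main__":
    state = "you are hungry and standing in the kitchen"
    choices = ["open the fridge", "go to sleep"]
    scorer = LinearScorer()
    train(scorer, state, choices, correct=0)
    print(choices[choose(scorer, state, choices)])
```

The observation and choices in the demo are invented for illustration. Scoring each choice as a separate pair lets the same network handle any number of candidate choices.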

Bibliographic record ID (NCID): AA11464847

Bibliographic information: IPSJ Transactions on Databases (TOD), Vol. 19, No. 1, pp. 93-105, issued 2026-01-26

ISSN: 1882-7799

Publisher: Information Processing Society of Japan


Version: Ver.1 (2026-01-21 05:58:12.978007)
