非定常な多腕バンディット問題における変化検出アプローチの線形モデルへの拡張

三宅, 悠介; 栗林, 健太郎

WEKO3

インデックスツリー

RootNode

アイテム

非定常な多腕バンディット問題における変化検出アプローチの線形モデルへの拡張

https://ipsj.ixsq.nii.ac.jp/records/206129

名前 / ファイル	ライセンス	アクション
IPSJ-IOT20050002.pdf (602.7 kB)	Copyright (c) 2020 by the Information Processing Society of Japan
オープンアクセス

Item type

SIG Technical Reports(1)

公開日

2020-07-03

タイトル

非定常な多腕バンディット問題における変化検出アプローチの線形モデルへの拡張

タイトル

言語

タイトル

Extension of Change Detection Method for Non-Stationary Linear Multi Armed Bandit

言語

jpn

キーワード

主題Scheme

Other

主題

RDM・機械学習

資源タイプ

資源タイプ識別子

http://purl.org/coar/resource_type/c_18gh

資源タイプ

technical report

著者所属

GMOペパボ株式会社ペパボ研究所

著者所属

GMOペパボ株式会社ペパボ研究所

著者所属(英)

Pepabo R&D Institute, GMO Pepabo, Inc.

著者所属(英)

Pepabo R&D Institute, GMO Pepabo, Inc.

著者名

三宅, 悠介
栗林, 健太郎

論文抄録

内容記述タイプ

Other

内容記述

多腕バンディット問題は，腕と呼ばれる複数の候補から得られる報酬を最大化する問題である．同問題の Web サービスにおける広告配信や推薦システムへの応用では，腕となる利用者の嗜好傾向が多様かつ継続的に変化する課題に対処するため，利用者の文脈を考慮した問題設定への拡張と解法が提案されている．時間の経過に従い報酬分布が変化する非定常な問題設定の解法では，変化検出の手法を組み合わせ，報酬の変化を観察することで変化に追従する．しかしながら，腕の文脈を複数の要因のパラメータの組み合わせで表現し，文脈に応じて報酬分布が決定する線形な問題設定にこの解法を適用する場合，要因の数に対して指数的に増加する全ての報酬の変化を観察しなければならない．本報告では，要因の組み合わせ数によらない単一の値の推移のみから報酬分布の変化を検出・追従することで，従来の線形な解法を利用可能でありながら，汎用的でメモリ効率に優れた非定常かつ線形な多腕バンディット問題の解法を提案する．提案手法では，各腕に対する試行回数と報酬から要因のパラメータに対する試行時点での係数を推定し，この値の和の推移から報酬分布の変化を検出する．また，報酬分布の変化に合わせた動的なハイパーパラメータ調整により迅速に変化に追従する．評価では，非定常かつ線形な多腕バンディット問題を設定し，変化検出を行わない場合と比較して性能が上回ることを確認した．

論文抄録(英)

内容記述タイプ

Other

内容記述

A multi-armed bandit problem is a problem that maximizes reward from making choices between candidates called arms. For the application of advertisement or recommendation system of this problem, the contextual extension of problem setting and policy is proposed to deal with variously and continuously changing user preferences. For non-stationary problems, policies of making choices follow changes by observing reward change using change detection methods. On the other hand, for a linear problem that changes reward distribution according to factors, the decision-maker of the policies must observe all reward changes that increase exponentially. In this report, we propose a non-stationary linear multi-armed bandit policy to detect a change of reward distribution from a single value transition that is independent of the number of reward changes. The proposed policy estimates coefficients of the factors from the number of trials and rewards and detects a change of reward distribution from the sum of the estimated values. Also, the policy can follow change rapidly using dynamic hyper-parameter turning. We set up a non-stationary and linear multi-armed bandit problem for the evaluation and confirmed our policy makes the performance increase.

書誌レコードID

収録物識別子タイプ

NCID

収録物識別子

AA12326962

書誌情報

研究報告インターネットと運用技術（IOT）

巻 2020-IOT-50, 号 2, p. 1-8, 発行日 2020-07-03

ISSN

収録物識別子タイプ

ISSN

収録物識別子

2188-8787

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

出版者

言語

出版者

情報処理学会

戻る

views

See details

	Views

Versions

Ver.1

2025-01-19 19:31:35.575156

Show All versions

Cite as

エクスポート

OAI-PMH

JPCOAR
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

非定常な多腕バンディット問題における変化検出アプローチの線形モデルへの拡張

× 三宅, 悠介

× 栗林, 健太郎

Versions

Share

Cite as

エクスポート