The Mechanism by which the Mamba Block Performs the Inductive Head Task

山本, 悠士; 松崎, 拓也

https://ipsj.ixsq.nii.ac.jp/records/235092

File: IPSJ-NL24260001.pdf (1.3 MB), available for download from 2026-06-21
License: Copyright (c) 2024 by the Information Processing Society of Japan
Price: non-members ¥660; IPSJ members ¥330; NL members ¥0; DLIB members ¥0

Item type

SIG Technical Reports

Publication date

2024-06-21

Title

Mambaブロックが帰納ヘッドタスクを実行するメカニズム (Japanese)
The Mechanism by which the Mamba Block Performs the Inductive Head Task (English)

Language

jpn

Keywords

LLM fundamentals (subject scheme: Other)

Resource type

technical report (http://purl.org/coar/resource_type/c_18gh)

Authors and affiliations

山本, 悠士 (Tokyo University of Science)
松崎, 拓也 (Tokyo University of Science)

Abstract

Transformer is currently the most popular architecture in natural language processing, but the computational cost of its main module, Self-Attention, grows quadratically with sequence length, which makes inference on long sequences expensive. Mamba, a language model based on state space models, has been reported to achieve performance comparable to Transformer while keeping computational complexity linear in sequence length, and it has therefore attracted attention as an alternative to Transformer for long-sequence tasks. In this study, we analyzed the internal states of Selective SSM, the main module of Mamba, while it performs the induction head task. The induction head task is a synthetic task that abstracts the so-called few-shot setting, in which information is retrieved from similar contexts in the input sequence when generating a token. We found that Selective SSM stores representations of the bi-grams in the input sequence in its hidden state and, when the first token of a bi-gram appears again, outputs the stored representation of that bi-gram's second token.
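The bi-gram storage and retrieval described in the abstract can be illustrated with a toy model. What follows is a minimal, hypothetical sketch that assumes one-hot token embeddings and a hand-set recurrence. It is not the authors' Selective SSM or their experimental setup. The hypothetical `make_example` builds an induction-head sequence of the form `[noise] A B [noise] A` with target `B`. The hypothetical `bigram_memory_predict` keeps a fixed-size state that accumulates an outer product for each adjacent token pair, acting as an associative memory, with a constant `decay` standing in for a state transition. It then reads out the column of the final token. All function names and the `decay` parameter are assumptions made for this illustration.

```python
import random


def make_example(vocab_size, prefix_len, gap_len, rng):
    """Build one hypothetical induction-head example.

    Layout: [noise] A B [noise] A, with target B. Noise tokens are never A,
    so the final A has exactly one earlier occurrence to copy from.
    """
    a, b = rng.sample(range(vocab_size), 2)
    others = [t for t in range(vocab_size) if t != a]
    prefix = [rng.choice(others) for _ in range(prefix_len)]
    gap = [rng.choice(others) for _ in range(gap_len)]
    return prefix + [a, b] + gap + [a], b


def bigram_memory_predict(tokens, vocab_size, decay=0.95):
    """Toy linear-time recurrence with a fixed-size state (not Mamba itself).

    state[nxt][prev] accumulates the outer product of one-hot vectors for each
    adjacent pair (prev, nxt); a constant decay stands in for a state
    transition. At the last position the column of the current token is read
    out and its argmax is returned as the predicted next token.
    """
    state = [[0.0] * vocab_size for _ in range(vocab_size)]
    prev = None
    for tok in tokens:
        # Decay every entry of the state (constant here, not input-dependent).
        for row in state:
            for i in range(vocab_size):
                row[i] *= decay
        # Write the bi-gram (prev -> tok) into the state.
        if prev is not None:
            state[tok][prev] += 1.0
        prev = tok
    # Query with the current token: which token followed it earlier?
    query = tokens[-1]
    scores = [state[nxt][query] for nxt in range(vocab_size)]
    return max(range(vocab_size), key=lambda j: scores[j])


if __name__ == "__main__":
    rng = random.Random(0)
    examples = [make_example(16, 20, 10, rng) for _ in range(100)]
    correct = sum(bigram_memory_predict(seq, 16) == tgt for seq, tgt in examples)
    print(correct / len(examples))  # 1.0
```

The state has a fixed size (vocab_size × vocab_size here) regardless of sequence length, so one pass over L tokens costs time linear in L. Self-Attention, by contrast, builds an L × L score matrix. In a trained Mamba the write and the decay depend on the input through learned projections, which this sketch does not model.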

Bibliographic record ID

NCID: AN10115061

Bibliographic information

IPSJ SIG Technical Report: Natural Language Processing (NL)

Vol. 2024-NL-260, No. 1, pp. 1-9, issued 2024-06-21

ISSN

2188-8779

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan

Versions

Ver.1 (2025-01-19 09:36:50)
