Cross-Domain Variation of Discrete Tokens: Comparative Analysis of Speech, Music, and Environmental Sounds

Takanori Ashihara; Marc Delcroix; Tsubasa Ochiai; Kohei Matsuura; Shota Horiguchi

https://ipsj.ixsq.nii.ac.jp/records/2007606

Name / File	License	Action
IPSJ-SLP26159037.pdf (1.6 MB), available for download from February 24, 2028	Copyright (c) 2026 by the Information Processing Society of Japan
Non-members: ¥660, IPSJ members: ¥330, SLP members: ¥0, DLIB members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2026-02-24

Title

音トークンのクロスドメイン変動分析：音声・音楽・環境音間の比較

Title (English)

Cross-Domain Variation of Discrete Tokens: Comparative Analysis of Speech, Music, and Environmental Sounds

Language

jpn

Keywords

Subject scheme

Other

Subject

SLP

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

NTT, Inc. (NTT株式会社), listed for each of the five authors

Author names

Takanori Ashihara (芦原,孝典)
Marc Delcroix (デルクロア,マーク)
Tsubasa Ochiai (落合,翼)
Kohei Matsuura (松浦,孝平)
Shota Horiguchi (堀口,翔太)

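Abstract (translated from Japanese)

Description type

Other

Description

Discrete sound representations (sound tokens) based on neural audio codecs or self-supervised learning models have attracted attention in recent years because, in addition to being highly compressible, they allow sound information to be handled in the same discrete space as text tokens. Such sound tokens have so far been studied separately for each domain, such as speech, music, and environmental sound, and their cross-domain properties are not yet well understood. This paper therefore presents a basic cross-domain analysis of the behavior of sound tokens. The analysis shows that the statistical properties and probabilistic predictability estimated from rank-frequency distributions and perplexity were largely consistent across domains. In contrast, the token usage distributions differed from domain to domain. These findings support the effectiveness of spoken language models that can process multiple domains in a unified manner, and suggest that performance could be improved further by capturing domain-specific token usage patterns more appropriately.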

Abstract (English)

Description type

Other

Description

Discrete audio representations, which convert an audio signal into a sequence of audio tokens using neural audio codecs or self-supervised speech models, have gained attention because they make it possible to model audio efficiently with large language models (LMs). Although these audio tokens have been studied in various domains (e.g., speech, music, and general sound), their encoding properties across domains remain unclear. This paper examines several audio token types to analyze cross-domain variation. Our main finding is that audio tokens exhibit consistent statistical structure and probabilistic predictability, as inferred from rank-frequency distributions and perplexity, regardless of domain. However, token usage patterns are somewhat domain-dependent. These results underpin the steady success of versatile audio LMs, while also suggesting that domain-aware LMs could further improve performance by better capturing domain-specific token usage distributions.
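
The three quantities named in the abstract (rank-frequency distribution, perplexity, and token usage distribution) can be illustrated with a small sketch. This is not the authors' analysis pipeline: the record does not say which tokenizers, language models, or divergence measures the report uses. The sketch assumes integer token IDs, substitutes an add-alpha smoothed unigram model for whatever LM the report evaluates, and uses Jensen-Shannon divergence as one possible way to compare usage distributions. All function names, `vocab_size`, `alpha`, and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def unigram_perplexity(train_tokens, test_tokens, vocab_size, alpha=1.0):
    """Perplexity of an add-alpha smoothed unigram model on test_tokens.

    Assumption: a unigram model is only a stand-in for a real LM.
    """
    counts = Counter(train_tokens)
    total = len(train_tokens) + alpha * vocab_size
    log_prob = sum(math.log((counts[t] + alpha) / total) for t in test_tokens)
    return math.exp(-log_prob / len(test_tokens))


def usage_distribution(tokens, vocab_size):
    """Relative frequency of each token ID in 0..vocab_size-1."""
    counts = Counter(tokens)
    return [counts[i] / len(tokens) for i in range(vocab_size)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 means identical usage)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy token sequences (hypothetical data, vocabulary of 4 token IDs).
VOCAB_SIZE = 4
speech = [0, 0, 0, 1, 1, 2]
music = [3, 3, 3, 2, 2, 1]

# Same rank-frequency shape in both domains ...
rf_speech = rank_frequency(speech)
rf_music = rank_frequency(music)

# ... and the same in-domain unigram perplexity ...
ppl_speech = unigram_perplexity(speech, speech, VOCAB_SIZE)
ppl_music = unigram_perplexity(music, music, VOCAB_SIZE)

# ... but different tokens carry the mass, so usage diverges.
jsd = js_divergence(usage_distribution(speech, VOCAB_SIZE),
                    usage_distribution(music, VOCAB_SIZE))

print(rf_speech, rf_music, ppl_speech, ppl_music, jsd)
```

The toy data is built so that both sequences share a rank-frequency profile and in-domain perplexity while assigning the frequent ranks to different token IDs, which mirrors the kind of pattern the abstract describes: matching statistical structure with domain-dependent token usage.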

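Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports, Spoken Language Processing)

Vol. 2026-SLP-159, No. 37, pp. 1-7, issued 2026-02-24

ISSN

Source identifier type

ISSN

Source identifier

2188-8663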

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2026-02-18 10:49:55.600854
