Evaluation of Audio Perceptual Hashing Using Machine Learning on Large-scale Datasets

Kuo Chenghsun; Kaito Hosono; Takamichi Miyata; Sumiko Miyata; Hirotsugu Kinoshita

https://ipsj.ixsq.nii.ac.jp/records/2007421

Name / File	License	Action
IPSJ-IOT26072044.pdf (1.3 MB). Available for download from January 1, 9999.	Copyright (c) 2026 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.
IOT: Members: ¥0, DLIB: Members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2026-02-24

Title (original Japanese)

機械学習を用いた音声知覚ハッシュの大規模データセットによる評価

Title (English)

Evaluation of Audio Perceptual Hashing Using Machine Learning on Large-scale Datasets

Language

jpn

Keywords

Subject scheme

Other

Subject

SITE

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliations

Department of Electrical, Electronic and Information Engineering, Faculty of Engineering, Kanagawa University
Department of Electrical, Electronic and Information Engineering, Faculty of Engineering, Kanagawa University
Department of Advanced Media, Faculty of Advanced Engineering, Chiba Institute of Technology
School of Engineering, Institute of Science Tokyo
Department of Electrical, Electronic and Information Engineering, Faculty of Engineering, Kanagawa University

Author names

Kuo Chenghsun (郭,承迅)
Kaito Hosono (細野,海人)
Takamichi Miyata (宮田,高道)
Sumiko Miyata (宮田,純子)
Hirotsugu Kinoshita (木下,宏揚)

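Abstract

Description type

Other

Description

In recent years, the spread of social networking and video sharing services has rapidly expanded the distribution of audio content. This has brought problems such as secondary use and DeepFake audio to the fore, making copyright management increasingly important. Hash functions are widely used to verify that two pieces of content are identical, but conventional cryptographic hash functions produce very different outputs for even a slight change in the data, so they are ill-suited to judging the identity of content that has undergone legitimate signal processing such as compression or noise addition. By contrast, "perceptual hashing," which is based on human perceptual characteristics and outputs the same value for the same content and different values for different content, has been attracting attention. The authors previously carried out a basic study of audio perceptual hash generation that uses the weight coefficients of CNN and Wav2Vec2 models, working with a small-scale dataset (Mini Speech Commands). That study confirmed the basic effectiveness of both methods, but their scalability and robustness on large-scale datasets representative of more practical settings remained unverified. This study therefore aims to clarify how the scale of the training dataset affects the performance of perceptual hash generation models. Specifically, for the CNN and Wav2Vec2 models, we quantitatively evaluate how identification accuracy and generalization performance change between the small-scale data (Mini) and the full data (Full Speech Commands). The experiments show that the Wav2Vec2 model exhibits extremely high robustness and adaptability as the data scale grows.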
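The contrast the abstract draws between cryptographic and perceptual hashes can be illustrated with a minimal sketch. It is not the authors' CNN or Wav2Vec2 method: the toy perceptual hash below (one bit per frame, set when the frame energy exceeds the mean energy) is a stand-in assumed purely for illustration, and every name and constant in it (`make_signal`, `toy_perceptual_bits`, `RATE`, `N_FRAMES`, `FRAME_LEN`, the noise scale) is hypothetical.

```python
import hashlib
import math
import random
import struct

RATE = 16000    # samples per second (assumed for this toy example)
N_FRAMES = 32   # one hash bit per frame
FRAME_LEN = 64  # samples per frame


def make_signal(is_loud):
    """Build a 440 Hz tone whose frame k is loud when is_loud(k) is true."""
    signal = []
    for k in range(N_FRAMES):
        amp = 1.0 if is_loud(k) else 0.2
        for j in range(FRAME_LEN):
            i = k * FRAME_LEN + j
            signal.append(amp * math.sin(2 * math.pi * 440 * i / RATE))
    return signal


def add_noise(signal, scale=0.01, seed=0):
    """Add small uniform noise, standing in for a benign signal-processing step."""
    rng = random.Random(seed)
    return [s + rng.uniform(-scale, scale) for s in signal]


def sha256_bits(signal):
    """Quantize to 16-bit PCM, hash with SHA-256, and return the 256 digest bits."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in signal]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    digest = hashlib.sha256(pcm).digest()
    return [(byte >> b) & 1 for byte in digest for b in range(8)]


def toy_perceptual_bits(signal):
    """Toy perceptual hash: bit k is 1 when frame k's energy exceeds the mean energy."""
    energies = []
    for k in range(N_FRAMES):
        frame = signal[k * FRAME_LEN:(k + 1) * FRAME_LEN]
        energies.append(sum(x * x for x in frame) / FRAME_LEN)
    threshold = sum(energies) / len(energies)
    return [1 if e > threshold else 0 for e in energies]


def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


original = make_signal(lambda k: k % 3 == 0 or k % 5 == 0)
noisy = add_noise(original)                 # same content, slightly degraded
other = make_signal(lambda k: k % 2 == 0)   # different loudness pattern

crypto_distance_noisy = hamming(sha256_bits(original), sha256_bits(noisy))
perceptual_distance_noisy = hamming(toy_perceptual_bits(original),
                                    toy_perceptual_bits(noisy))
perceptual_distance_other = hamming(toy_perceptual_bits(original),
                                    toy_perceptual_bits(other))

print("SHA-256 bits changed by noise:   ", crypto_distance_noisy)
print("Toy hash bits changed by noise:  ", perceptual_distance_noisy)
print("Toy hash bits vs. other content: ", perceptual_distance_other)
```

In this sketch the noise is far too small to move any frame energy across the threshold, so the toy hash of the noisy copy matches the original, while the SHA-256 digests of the two PCM buffers differ in a large share of their bits. A learned model such as those evaluated in the report would replace the frame-energy feature; the Hamming-distance comparison is only one common way to compare hash values and is assumed here.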

Abstract (English)

Description type

Other

Description

With the rapid spread of SNS and video sharing services, unauthorized secondary use and DeepFake audio have emerged as significant issues for copyright management. While hash functions are commonly used to verify content integrity, conventional cryptographic hash functions are unsuitable for content that has undergone legitimate signal processing. In contrast, “perceptual hashing,” which outputs identical values for perceptually identical content, has attracted attention as a robust alternative. In our previous work, we confirmed the basic effectiveness of audio perceptual hash generation using CNN and Wav2Vec2 model weights on a small-scale dataset. This study aims to clarify the impact of training dataset scale on model performance by evaluating identification accuracy and generalization for both models on small-scale versus full-scale datasets. The results demonstrate that the Wav2Vec2 model exhibits extremely high robustness and adaptability to data scale expansion.

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA12326962

Bibliographic information

SIG Technical Reports: Internet and Operation Technology (IOT)

Vol. 2026-IOT-72, No. 44, pp. 1-8, issue date 2026-02-24

ISSN

Source identifier type

ISSN

Source identifier

2188-8787

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan

Versions

Ver.1

2026-02-16 07:20:07.429545
