A System for Automatic Discrimination between Singing and Speaking Voices on the Basis of Peak Interval of Spectral Change, F0, and MFCC

Title: A System for Automatic Discrimination between Singing and Speaking Voices on the Basis of Peak Interval of Spectral Change, F0, and MFCC
Original title (Japanese): スペクトル変化量のピーク間隔・F0・MFCCを用いた歌声と朗読音声の自動識別システム
Language: Japanese (jpn)
Keyword: Singing information processing (歌声情報処理)
Type: Technical Report
URL: http://id.nii.ac.jp/1001/00080401/
Download: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=80401&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2012 by the Information Processing Society of Japan

Authors and affiliations (listed in the order given in the record):
Shinpei Aso (阿曽, 慎平), Graduate School of Informatics, Kyoto University
Takeshi Saitou (齋藤, 毅), School of Electrical and Computer Engineering, College of Science and Engineering, Kanazawa University
Masataka Goto (後藤, 真孝), National Institute of Advanced Industrial Science and Technology (AIST)
Katsutoshi Itoyama (糸山, 克寿), Graduate School of Informatics, Kyoto University
Toru Takahashi (高橋, 徹), Graduate School of Informatics, Kyoto University
Tetsuya Ogata (尾形, 哲也), Graduate School of Informatics, Kyoto University
Hiroshi Okuno (奥乃, 博), Graduate School of Informatics, Kyoto University

Abstract:
This paper describes a system that discriminates between singing and speaking (read-aloud) voices. Given a clean, noise-free speech signal, it outputs a continuous-valued likelihood for each of the two classes, singing and speaking. Previously reported systems use the temporal variation of the spectral envelope (MFCC) and of the fundamental frequency (F0) as discrimination features. To the classifiers based on these features, our system adds a classifier based on the peak interval of spectral change, a feature related to phoneme duration, and varies the weight given to each classifier according to the length of the input speech. In experiments on one-second speech signals, our system achieved 90.2% accuracy, compared with 86.7% for the previous system. We also describe a demo application in which the system runs in real time.

NCID: AN10438388
Journal: 研究報告音楽情報科学（MUS） (IPSJ SIG Technical Report, Music Information Science (MUS))
Volume: 2012-MUS-94
Issue: 13
Pages: 1-8
Dates: 2012-01-27, 2012-01-30
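
The abstract states that the weight given to each feature-specific classifier changes with the length of the input speech, but the record does not give the weighting rule. The following is only a minimal sketch of that idea, not the authors' method. The weight schedule in `duration_weights`, the `ramp_s` parameter, the `ClassifierScores` container, and the softmax normalization are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ClassifierScores:
    """Log-likelihoods from one feature-specific classifier (hypothetical container)."""
    singing: float
    speaking: float


def duration_weights(duration_s: float, ramp_s: float = 2.0) -> dict:
    """Return per-classifier weights that depend on input length.

    ASSUMPTION: the schedule below (MFCC weighted more for short input,
    F0 and peak interval weighted more as input grows) is invented for
    illustration; the record does not specify the actual rule.
    """
    t = min(max(duration_s / ramp_s, 0.0), 1.0)  # clamp to [0, 1]
    raw = {
        "mfcc": 1.0 - 0.5 * t,
        "f0": 0.5 + 0.5 * t,
        "peak_interval": 0.5 + 0.5 * t,
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def combine(scores: dict, duration_s: float) -> dict:
    """Fuse per-classifier log-likelihoods into normalized class scores."""
    w = duration_weights(duration_s)
    sing = sum(w[name] * s.singing for name, s in scores.items())
    speak = sum(w[name] * s.speaking for name, s in scores.items())
    # Softmax over the two weighted sums, shifted by the max for stability.
    m = max(sing, speak)
    e_sing, e_speak = math.exp(sing - m), math.exp(speak - m)
    return {
        "singing": e_sing / (e_sing + e_speak),
        "speaking": e_speak / (e_sing + e_speak),
    }


if __name__ == "__main__":
    # Made-up log-likelihoods for a one-second input.
    example = {
        "mfcc": ClassifierScores(singing=-10.0, speaking=-12.0),
        "f0": ClassifierScores(singing=-8.0, speaking=-9.5),
        "peak_interval": ClassifierScores(singing=-7.0, speaking=-7.5),
    }
    print(combine(example, duration_s=1.0))
```

The design choice here is a weighted sum of log-likelihoods, which keeps each classifier independent and lets the weights be retuned per input length without retraining; other fusion rules would fit the abstract's description equally well.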