Item type: SIG Technical Reports(1)
Publication date: 2024-07-15
Title: Study on Potential of Speech-pathological Features for Deepfake Speech Detection
Language: en
Keywords
Subject scheme: Other
Subject: ISEC/EMM
Resource type
Resource type identifier: http://purl.org/coar/resource_type/c_18gh
Resource type: technical report
Author affiliation: School of Information Science, Japan Advanced Institute of Science and Technology / Sirindhorn International Institute of Technology, Thammasat University
Author affiliation: National Science and Technology Development Agency
Author affiliation: National Science and Technology Development Agency
Author affiliation: Sirindhorn International Institute of Technology, Thammasat University
Author affiliation: School of Information Science, Japan Advanced Institute of Science and Technology
Author name: Anuwat, Chaiwongyen
Author name: Suradej, Duangpummet
Author name: Jessada, Karnjana
Author name: Waree, Kongprawechnon
Author name: Masashi, Unoki
Abstract
Description type: Other
Description: This paper proposes a method for detecting deepfake speech using speech-pathological features that are commonly used to assess unnaturalness in disordered voices associated with voice-production mechanisms. We investigated the potential of eleven speech-pathological features for distinguishing between genuine and deepfake speech: jitter (three types), shimmer (four types), harmonics-to-noise ratio, cepstral harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio. The paper also introduces a segmental frame-analysis technique that significantly improves the effectiveness of deepfake-speech detection. We evaluated the proposed method on datasets from the Automatic Speaker Verification Spoofing and Countermeasures Challenges (ASVspoof). The results demonstrate that the proposed method outperforms the baselines in terms of recall and F2-score, achieving 99.46% and 98.59%, respectively, on the ASVspoof 2019 dataset.
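The jitter, shimmer, and harmonics-to-noise-ratio features named in the abstract are standard voice-quality measures that can be computed with off-the-shelf tools. The sketch below is illustrative rather than the authors' implementation: it extracts a few of these measures with the parselmouth Python interface to Praat. The file name sample.wav and the pitch-range settings are assumptions, and the remaining features (cepstral HNR, normalized noise energy, glottal-to-noise excitation ratio) and the paper's segmental frame analysis are not shown.

```python
# Illustrative sketch (not the paper's code): extract jitter, shimmer, and HNR
# with parselmouth (Praat). Pitch range 75-500 Hz and "sample.wav" are assumptions.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sample.wav")

# Glottal pulse locations, needed for jitter/shimmer measurements
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

# Local jitter: mean absolute difference between consecutive periods,
# divided by the mean period (Praat's default parameters)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

# Local shimmer: mean absolute difference between amplitudes of
# consecutive periods, divided by the mean amplitude
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

# Harmonics-to-noise ratio (dB), averaged over the whole utterance
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"jitter={jitter_local:.4f}, shimmer={shimmer_local:.4f}, HNR={hnr:.2f} dB")
```

The paper applies such features per analysis segment rather than per utterance, so a detector along these lines would compute them frame by frame and feed the resulting feature vectors to a classifier.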
Bibliographic record ID
Source identifier type: NCID
Source identifier: AA11235941
Bibliographic information
IPSJ SIG Technical Report: Computer Security (CSEC), Vol. 2024-CSEC-106, No. 45, pp. 1-6, issued 2024-07-15
ISSN
Source identifier type: ISSN
Source identifier: 2188-8655
Notice
SIG Technical Reports are nonrefereed and hence may later appear in journals, conferences, symposia, etc.
Publisher
Language: ja
Publisher: 情報処理学会 (Information Processing Society of Japan)