Experiments in Making VOCALOID Synthesis More Human-like Using Deep Learning

Michael Wilson (Yamaha Corporation), Pritish Chandna (Music Technology Group, Universitat Pompeu Fabra), Ryunosuke Daido (Yamaha Corporation), Yuji Hisaminato (Yamaha Corporation)

IPSJ SIG Technical Report: Music and Computer (音楽情報科学, MUS), Vol. 2017-MUS-114, No. 41, 2017-02-20 (dated 2017-02-16). ISSN 2188-8752. NCID AN10438388.
Session: Singing Voice Analysis and Synthesis (歌声の分析と合成)
Permalink: http://id.nii.ac.jp/1001/00177532/
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=177620&item_no=1&attribute_id=1&file_no=1
Copyright (c) 2017 by the Information Processing Society of Japan

Abstract: Deep learning has recently been used to improve the results of many speech-related tasks. We applied deep learning to VOCALOID (TM), a singing voice synthesizer based on concatenative synthesis, with the goal of making the synthesized output sound more human-like. Previous work in this area includes using hidden Markov models (HMMs) to model the prosodic features of a specific singer or style, and using iterative parameter estimation to mimic target human singing. We focused on methods that work directly on audio data and on audio features that can be extracted automatically, with no special markup or target singing required. We report the results of several experiments with various models and parameterizations, and suggest avenues for further research.
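The abstract emphasizes audio features that "can be extracted automatically, with no special markup or target singing required." The report does not specify the extraction method, but a minimal, self-contained sketch of one such feature, fundamental frequency (F0) estimated by autocorrelation, illustrates the idea; the function name and parameters below are illustrative, not from the paper:

```python
# Illustrative sketch (not the authors' code): automatic extraction of one
# audio feature, fundamental frequency (F0), via autocorrelation. This is
# the kind of feature that needs no manual markup of the audio.
import numpy as np

def estimate_f0(signal, sr, fmin=80.0, fmax=1000.0):
    """Estimate F0 of a quasi-periodic frame by picking the strongest
    autocorrelation peak within the plausible vocal pitch range."""
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo = int(sr / fmax)            # smallest lag (highest pitch) considered
    hi = int(sr / fmin)            # largest lag (lowest pitch) considered
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag                # lag of the peak -> frequency in Hz

sr = 16000
t = np.arange(sr) / sr                     # one second of samples
frame = np.sin(2 * np.pi * 220.0 * t)      # synthetic 220 Hz "voice"
f0 = estimate_f0(frame, sr)                # close to 220 Hz
```

In practice such frame-level F0 (and spectral) trajectories, computed over sliding windows, would form the input/output features for the models described in the report.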