<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-03-07T05:25:21Z</responseDate>
  <request metadataPrefix="jpcoar_1.0" verb="GetRecord" identifier="oai:ipsj.ixsq.nii.ac.jp:00232482">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00232482</identifier>
        <datestamp>2025-01-19T10:26:00Z</datestamp>
        <setSpec>1164:5159:11541:11549</setSpec>
      </header>
      <metadata>
        <jpcoar:jpcoar xmlns:datacite="https://schema.datacite.org/meta/kernel-4/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcndl="http://ndl.go.jp/dcndl/terms/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/" xmlns:oaire="http://namespace.openaire.eu/schema/oaire/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rioxxterms="http://www.rioxx.net/schema/v2.0/rioxxterms/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="https://github.com/JPCOAR/schema/blob/master/1.0/" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
          <dc:title>DDPMVC:連続時間拡散確率モデルを用いた非パラレル声質変換と評価</dc:title>
          <dc:title xml:lang="en">DDPMVC: Non-parallel voice conversion and evaluation using a continuous-time diffusion probabilistic model</dc:title>
          <jpcoar:creator>
            <jpcoar:creatorName>畠山, 瑠一</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>奥田, 耕平</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>中鹿, 亘</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:subject subjectScheme="Other">SLP</jpcoar:subject>
          <datacite:description descriptionType="Other">本稿では，生成モデルの一つである連続時間拡散確率モデルを用いて，非パラレルデータでの声質変換モデルである DDPMVC を提案する．生成モデルとして近年用いられている拡散モデルは，高次元のデータに対して表現力が高く，従来の生成モデルよりも安定して学習できるのが特徴である．拡散モデルを用いた声質変換はいくつか提案されているが，非パラレルデータを用いて，任意の話者を複数の話者に変換 (Any-to-Many) する場合に対応しているモデルは少ない．ただしその中でも，VoiceGrad は，拡散モデルの一つであるスコアベースモデルを用いた声質変換モデルであり，前述した条件を満たしている．この VoiceGrad のモデルから派生し，DDPMVC は拡散モデルを連続時間化し，さらにエンコーダにルールベースの拡散過程を追加したモデルである．実験の評価では，VoiceGrad と DDPMVC で変換の精度の比較をメルケプストラム歪み（MCD）用いて行った．</datacite:description>
          <datacite:description descriptionType="Other">This paper proposes DDPMVC, a voice conversion model for non-parallel data based on a continuous-time diffusion probabilistic model, a type of generative model. Diffusion models, which have recently been used as generative models, are highly expressive for high-dimensional data and can be trained more stably than conventional generative models. Although several diffusion-based voice conversion models have been proposed, few support any-to-many conversion (converting an arbitrary speaker to multiple target speakers) using non-parallel data. One model that does satisfy these conditions is VoiceGrad, a voice conversion model built on a score-based model, which is a type of diffusion model. Derived from VoiceGrad, DDPMVC makes the diffusion model continuous in time and adds a rule-based diffusion process to the encoder. In the experimental evaluation, the conversion accuracy of VoiceGrad and DDPMVC was compared using mel-cepstral distortion (MCD).</datacite:description>
          <dc:publisher xml:lang="ja">情報処理学会</dc:publisher>
          <datacite:date dateType="Issued">2024-02-22</datacite:date>
          <dc:language>jpn</dc:language>
          <dc:type rdf:resource="http://purl.org/coar/resource_type/c_18gh">technical report</dc:type>
          <jpcoar:identifier identifierType="URI">https://ipsj.ixsq.nii.ac.jp/records/232482</jpcoar:identifier>
          <jpcoar:sourceIdentifier identifierType="ISSN">2188-8663</jpcoar:sourceIdentifier>
          <jpcoar:sourceIdentifier identifierType="NCID">AN10442647</jpcoar:sourceIdentifier>
          <jpcoar:sourceTitle>研究報告音声言語情報処理（SLP）</jpcoar:sourceTitle>
          <jpcoar:volume>2024-SLP-151</jpcoar:volume>
          <jpcoar:issue>12</jpcoar:issue>
          <jpcoar:pageStart>1</jpcoar:pageStart>
          <jpcoar:pageEnd>6</jpcoar:pageEnd>
          <jpcoar:file>
            <jpcoar:URI label="IPSJ-SLP24151012.pdf">https://ipsj.ixsq.nii.ac.jp/record/232482/files/IPSJ-SLP24151012.pdf</jpcoar:URI>
            <jpcoar:mimeType>application/pdf</jpcoar:mimeType>
            <jpcoar:extent>289.4 kB</jpcoar:extent>
            <datacite:date dateType="Available">2026-02-22</datacite:date>
          </jpcoar:file>
        </jpcoar:jpcoar>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
