<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-19T21:53:18Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:ipsj.ixsq.nii.ac.jp:00234657">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00234657</identifier>
        <datestamp>2025-01-19T09:44:09Z</datestamp>
        <setSpec>1164:5064:11558:11626</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>An experimental study of accent embedding for text to accented speech synthesis</dc:title>
          <dc:creator>Hewei, Zhang</dc:creator>
          <dc:creator>Daisuke, Saito</dc:creator>
          <dc:creator>Nobuaki, Minematsu</dc:creator>
          <dc:subject>Poster Session 2</dc:subject>
          <dc:description>In Text-to-Speech (TTS), end-to-end models, which take text as input and produce audio as output, have been introduced. This makes it difficult to control style in an unsupervised manner, especially accent, which comprises many kinds of acoustic features. We propose a Phonetic PosteriorGram-based unsupervised accent embedding extraction model. Experiments demonstrate the model's effectiveness, its robustness to training datasets with different accent levels, and its further potential to extract accent features from a given utterance.</dc:description>
          <dc:description>technical report</dc:description>
          <dc:publisher>情報処理学会</dc:publisher>
          <dc:date>2024-06-07</dc:date>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>研究報告音楽情報科学（MUS）</dc:identifier>
          <dc:identifier>45</dc:identifier>
          <dc:identifier>2024-MUS-140</dc:identifier>
          <dc:identifier>1</dc:identifier>
          <dc:identifier>5</dc:identifier>
          <dc:identifier>2188-8752</dc:identifier>
          <dc:identifier>AN10438388</dc:identifier>
          <dc:identifier>https://ipsj.ixsq.nii.ac.jp/record/234657/files/IPSJ-MUS24140045.pdf</dc:identifier>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
