<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-12T13:40:22Z</responseDate>
  <request identifier="oai:ipsj.ixsq.nii.ac.jp:00241581" metadataPrefix="jpcoar_1.0" verb="GetRecord">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00241581</identifier>
        <datestamp>2025-01-19T07:36:51Z</datestamp>
        <setSpec>1164:4179:11560:11869</setSpec>
      </header>
      <metadata>
        <jpcoar:jpcoar xmlns:datacite="https://schema.datacite.org/meta/kernel-4/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcndl="http://ndl.go.jp/dcndl/terms/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/" xmlns:oaire="http://namespace.openaire.eu/schema/oaire/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rioxxterms="http://www.rioxx.net/schema/v2.0/rioxxterms/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="https://github.com/JPCOAR/schema/blob/master/1.0/" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
          <dc:title>話し言葉音声合成のためのテキスト発話スタイル変換の改良</dc:title>
          <dc:title xml:lang="en">Improvements of Spoken-Text-Style Transfer for Spontaneous Speech Synthesis</dc:title>
          <jpcoar:creator>
            <jpcoar:creatorName>中田, 優翔</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>吉岡, 大貴</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>ホワン, ウェンチン</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>戸田, 智基</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:subject subjectScheme="Other">ポスターセッション</jpcoar:subject>
          <datacite:description descriptionType="Other">テキストスタイル変換とは，テキストの意味を保持しながら，所望のスタイルに変換する処理であり，話し言葉音声合成の前処理としての応用が期待される．本稿では，話し言葉音声の特徴である非流暢性に着目し，ノンパラレルデータを用いたテキストスタイル変換手法の改良を行う．まず，従来手法の問題点として，(1) 未知語が含まれた文に対応できない点と，(2) 非流暢性単語と他の単語の混同によりスタイル制御性能が劣化する点を解決するために，(1) Masked Language Model により未知語を既知語に一時的に置き換える手法と，(2) 非流暢性記号表現を導入する手法を提案する．さらに，(3) 非流暢性スタイルにおける話者性の影響を捉えるために，話者埋め込みを用いた話者依存非流暢性スタイル変換手法を提案する．客観評価指標に基づく実験的評価の結果から，提案手法により，(1) 未知語に対する頑健性が向上すること，(2) 非流暢性記号表現の使用によりスタイル制御性能が改善すること，(3) 話者性に基づく非流暢性単語の変換が可能であることを実証した．</datacite:description>
          <datacite:description descriptionType="Other">Spoken-Text-Style Transfer refers to the process of converting a given spoken text so that it has a desired style while preserving its semantic content. This process is expected to be useful as a preprocessing step in spontaneous speech synthesis. In this report, we improve a non-parallel spoken-text-style transfer method to handle a disfluent style. The previous method has two main issues: (1) performance degradation caused by unknown words in the input text, and (2) degraded style control caused by confusion between disfluent words and other words. To address these issues, we propose (1) the use of a Masked Language Model (MLM) to temporarily replace unknown words with known ones, and (2) the use of Disfluency Symbol Representations (DSR). Furthermore, we propose (3) a speaker-dependent transfer method that uses speaker embeddings to model speaker-dependent characteristics of the disfluent style. Experimental results based on objective evaluation metrics show that the proposed method (1) improves robustness against unknown words by using the MLM, (2) achieves higher transfer accuracy than the previous method by using DSR, and (3) has the potential to control disfluent words based on given speaker information.</datacite:description>
          <dc:publisher xml:lang="ja">情報処理学会</dc:publisher>
          <datacite:date dateType="Issued">2024-12-05</datacite:date>
          <dc:language>jpn</dc:language>
          <dc:type rdf:resource="http://purl.org/coar/resource_type/c_18gh">technical report</dc:type>
          <jpcoar:identifier identifierType="URI">https://ipsj.ixsq.nii.ac.jp/records/241581</jpcoar:identifier>
          <jpcoar:sourceIdentifier identifierType="ISSN">2188-8779</jpcoar:sourceIdentifier>
          <jpcoar:sourceIdentifier identifierType="NCID">AN10115061</jpcoar:sourceIdentifier>
          <jpcoar:sourceTitle>研究報告自然言語処理（NL）</jpcoar:sourceTitle>
          <jpcoar:volume>2024-NL-262</jpcoar:volume>
          <jpcoar:issue>6</jpcoar:issue>
          <jpcoar:pageStart>1</jpcoar:pageStart>
          <jpcoar:pageEnd>6</jpcoar:pageEnd>
          <jpcoar:file>
            <jpcoar:URI label="IPSJ-NL24262006.pdf">https://ipsj.ixsq.nii.ac.jp/record/241581/files/IPSJ-NL24262006.pdf</jpcoar:URI>
            <jpcoar:mimeType>application/pdf</jpcoar:mimeType>
            <jpcoar:extent>1.2 MB</jpcoar:extent>
            <datacite:date dateType="Available">2026-12-05</datacite:date>
          </jpcoar:file>
        </jpcoar:jpcoar>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
