<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-14T09:53:44Z</responseDate>
  <request identifier="oai:ipsj.ixsq.nii.ac.jp:00209762" metadataPrefix="jpcoar_1.0" verb="GetRecord">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00209762</identifier>
        <datestamp>2025-01-19T18:23:58Z</datestamp>
        <setSpec>1164:5159:10515:10530</setSpec>
      </header>
      <metadata>
        <jpcoar:jpcoar xmlns:datacite="https://schema.datacite.org/meta/kernel-4/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcndl="http://ndl.go.jp/dcndl/terms/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/" xmlns:oaire="http://namespace.openaire.eu/schema/oaire/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rioxxterms="http://www.rioxx.net/schema/v2.0/rioxxterms/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="https://github.com/JPCOAR/schema/blob/master/1.0/" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
          <dc:title>Comparison of End-to-End Models for Joint Speaker and Speech Recognition</dc:title>
          <dc:title xml:lang="en">Comparison of End-to-End Models for Joint Speaker and Speech Recognition</dc:title>
          <jpcoar:creator>
            <jpcoar:creatorName>Kak, Soky</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>Sheng, Li</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>Masato, Mimura</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>Chenhui, Chu</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName>Tatsuya, Kawahara</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName xml:lang="en">Kak, Soky</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName xml:lang="en">Sheng, Li</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName xml:lang="en">Masato, Mimura</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName xml:lang="en">Chenhui, Chu</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:creator>
            <jpcoar:creatorName xml:lang="en">Tatsuya, Kawahara</jpcoar:creatorName>
          </jpcoar:creator>
          <jpcoar:subject subjectScheme="Other">SP1</jpcoar:subject>
          <datacite:description descriptionType="Other">In this paper, we investigate the effectiveness of using speaker information on the performance of speaker-imbalanced automatic speech recognition (ASR). We identify the major speakers, merge the remaining speakers, each of whom has only a small amount of speech, into a single group, and systematically compare three methods that use speaker information for ASR: speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. We find that the use of speaker clustering information improves ASR performance, including for new speakers. Moreover, AL achieves better performance and greater robustness in the speaker-independent setting than the other methods, reducing the errors of the baseline model by 4.32%, 5.46%, and 16.10% on the closed test, open test, and out-of-domain test, respectively.</datacite:description>
          <dc:publisher xml:lang="ja">情報処理学会</dc:publisher>
          <datacite:date dateType="Issued">2021-02-24</datacite:date>
          <dc:language>eng</dc:language>
          <dc:type rdf:resource="http://purl.org/coar/resource_type/c_18gh">technical report</dc:type>
          <jpcoar:identifier identifierType="URI">https://ipsj.ixsq.nii.ac.jp/records/209762</jpcoar:identifier>
          <jpcoar:sourceIdentifier identifierType="ISSN">2188-8663</jpcoar:sourceIdentifier>
          <jpcoar:sourceIdentifier identifierType="NCID">AN10442647</jpcoar:sourceIdentifier>
          <jpcoar:sourceTitle>研究報告音声言語情報処理（SLP）</jpcoar:sourceTitle>
          <jpcoar:volume>2021-SLP-136</jpcoar:volume>
          <jpcoar:issue>24</jpcoar:issue>
          <jpcoar:pageStart>1</jpcoar:pageStart>
          <jpcoar:pageEnd>5</jpcoar:pageEnd>
          <jpcoar:file>
            <jpcoar:URI label="IPSJ-SLP21136024.pdf">https://ipsj.ixsq.nii.ac.jp/record/209762/files/IPSJ-SLP21136024.pdf</jpcoar:URI>
            <jpcoar:mimeType>application/pdf</jpcoar:mimeType>
            <jpcoar:extent>1.8 MB</jpcoar:extent>
          </jpcoar:file>
        </jpcoar:jpcoar>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
