<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-20T22:51:19Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:ipsj.ixsq.nii.ac.jp:00209762">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00209762</identifier>
        <datestamp>2025-01-19T18:23:58Z</datestamp>
        <setSpec>1164:5159:10515:10530</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Comparison of End-to-End Models for Joint Speaker and Speech Recognition</dc:title>
          <dc:title>Comparison of End-to-End Models for Joint Speaker and Speech Recognition</dc:title>
          <dc:creator>Kak, Soky</dc:creator>
          <dc:creator>Sheng, Li</dc:creator>
          <dc:creator>Masato, Mimura</dc:creator>
          <dc:creator>Chenhui, Chu</dc:creator>
          <dc:creator>Tatsuya, Kawahara</dc:creator>
          <dc:creator>Kak, Soky</dc:creator>
          <dc:creator>Sheng, Li</dc:creator>
          <dc:creator>Masato, Mimura</dc:creator>
          <dc:creator>Chenhui, Chu</dc:creator>
          <dc:creator>Tatsuya, Kawahara</dc:creator>
          <dc:subject>SP1</dc:subject>
          <dc:description>In this paper, we investigate the effectiveness of using speaker information on the performance of speaker- imbalanced automatic speech recognition (ASR). We identify major speakers and combine other speakers who have a small size of speech, and make a systematic comparison of three methods that use speaker information for ASR including speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. As a result, we find that the use of speaker clustering information improves ASR performance including new speakers. Moreover, AL achieves better performance and more robustness in the speaker-independent setting compared to the other methods. It reduces errors of the baseline model by 4.32%, 5.46%, and 16.10% for the closed test, open test, and out-of-domain test, respectively.</dc:description>
          <dc:description>In this paper, we investigate the effectiveness of using speaker information on the performance of speaker- imbalanced automatic speech recognition (ASR). We identify major speakers and combine other speakers who have a small size of speech, and make a systematic comparison of three methods that use speaker information for ASR including speaker attribute augmentation (SAug), multi-task learning (MTL), and adversarial learning (AL). We conduct experiments on a large spontaneous speech corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC) and an open Khmer text-to-speech corpus. As a result, we find that the use of speaker clustering information improves ASR performance including new speakers. Moreover, AL achieves better performance and more robustness in the speaker-independent setting compared to the other methods. It reduces errors of the baseline model by 4.32%, 5.46%, and 16.10% for the closed test, open test, and out-of-domain test, respectively.</dc:description>
          <dc:description>technical report</dc:description>
          <dc:publisher>情報処理学会</dc:publisher>
          <dc:date>2021-02-24</dc:date>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>研究報告音声言語情報処理（SLP）</dc:identifier>
          <dc:identifier>24</dc:identifier>
          <dc:identifier>2021-SLP-136</dc:identifier>
          <dc:identifier>1</dc:identifier>
          <dc:identifier>5</dc:identifier>
          <dc:identifier>2188-8663</dc:identifier>
          <dc:identifier>AN10442647</dc:identifier>
          <dc:identifier>https://ipsj.ixsq.nii.ac.jp/record/209762/files/IPSJ-SLP21136024.pdf</dc:identifier>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
