<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-05-12T03:46:24Z</responseDate>
  <request metadataPrefix="oai_dc" verb="GetRecord" identifier="oai:ipsj.ixsq.nii.ac.jp:00192700">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00192700</identifier>
        <datestamp>2025-01-20T00:02:25Z</datestamp>
        <setSpec>1164:5159:9402:9617</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Feature Transfer Learning for Wav2Text Sequence-to-Sequence ASR</dc:title>
          <dc:creator>Andros, Tjandra</dc:creator>
          <dc:creator>Sakriani, Sakti</dc:creator>
          <dc:creator>Satoshi, Nakamura</dc:creator>
          <dc:subject>Session 1: Speech Recognition</dc:subject>
          <dc:description>In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps raw speech waveforms directly to text transcriptions. We call the model "Attention-based Wav2Text". To assist the training of the end-to-end model, we propose to use feature transfer learning. Experimental results reveal that the proposed Attention-based Wav2Text model, operating directly on raw waveforms, can achieve better results than an attentional encoder-decoder model trained on standard front-end filterbank features.</dc:description>
          <dc:description>technical report</dc:description>
          <dc:publisher>情報処理学会</dc:publisher>
          <dc:date>2018-12-03</dc:date>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>研究報告音声言語情報処理（SLP）</dc:identifier>
          <dc:identifier>3</dc:identifier>
          <dc:identifier>2018-SLP-125</dc:identifier>
          <dc:identifier>1</dc:identifier>
          <dc:identifier>2</dc:identifier>
          <dc:identifier>2188-8663</dc:identifier>
          <dc:identifier>AN10442647</dc:identifier>
          <dc:identifier>https://ipsj.ixsq.nii.ac.jp/record/192700/files/IPSJ-SLP18125003.pdf</dc:identifier>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
