<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-21T12:51:01Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:ipsj.ixsq.nii.ac.jp:00234740">https://ipsj.ixsq.nii.ac.jp/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ipsj.ixsq.nii.ac.jp:00234740</identifier>
        <datestamp>2025-01-19T09:42:27Z</datestamp>
        <setSpec>1164:5159:11541:11627</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Enhancing Feature Integration to Improve Classification Accuracy of Similar Categories in Acoustic Scene Classification</dc:title>
          <dc:creator>Shuting, Hao</dc:creator>
          <dc:creator>Daisuke, Saito</dc:creator>
          <dc:creator>Nobuaki, Minematsu</dc:creator>
          <dc:subject>Poster Session 2</dc:subject>
          <dc:description>This study focuses on Acoustic Scene Classification (ASC), which categorizes environmental audio streams into predefined semantic labels. We introduce a novel architecture that integrates multi-layer classifiers and direct fine-tuning, presenting a new perspective in ASC research. The study employs the TAU Urban Acoustic Scenes 2022 Mobile dataset for fine-tuning and validation. We utilized the SSAST model, pre-trained on the AudioSet and LibriSpeech datasets, and fine-tuned it on the TAU dataset with a unique approach to enhance ASC-specific feature learning. Our layered SSAST system achieved an accuracy of 52.17% and an AUC of 88.66% in ASC, marking a notable improvement over the baseline with absolute increases of 0.99% in accuracy and 0.85% in AUC.</dc:description>
          <dc:description>technical report</dc:description>
          <dc:publisher>情報処理学会</dc:publisher>
          <dc:date>2024-06-07</dc:date>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>研究報告音声言語情報処理（SLP）</dc:identifier>
          <dc:identifier>53</dc:identifier>
          <dc:identifier>2024-SLP-152</dc:identifier>
          <dc:identifier>1</dc:identifier>
          <dc:identifier>5</dc:identifier>
          <dc:identifier>2188-8663</dc:identifier>
          <dc:identifier>AN10442647</dc:identifier>
          <dc:identifier>https://ipsj.ixsq.nii.ac.jp/record/234740/files/IPSJ-SLP24152053.pdf</dc:identifier>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
