効率よいVF符号のためのMDL原理に基づく分節木の訓練手法

吉田, 諭史; 喜田, 拓也; Satoshi, Yoshida; Takuya, Kida

WEKO3

lat lon distance

[[sub_check.contents]]

[[sub_radio.contents]]

Field does not validate

[[sub_attr.contents]]　

インデックスツリー

アイテム

効率よいVF符号のためのMDL原理に基づく分節木の訓練手法

https://ipsj.ixsq.nii.ac.jp/records/75509

名前 / ファイル	ライセンス	アクション
IPSJ-IFAT11103014.pdf (303.0 kB)	Copyright (c) 2011 by the Information Processing Society of Japan
オープンアクセス

Item type

SIG Technical Reports(1)

公開日

2011-07-26

タイトル

効率よいVF符号のためのMDL原理に基づく分節木の訓練手法

タイトル

言語

タイトル

A method of training parse trees for efficient VF coding based on MDL principle

言語

jpn

キーワード

主題Scheme

Other

主題

●アルゴリズム

資源タイプ

資源タイプ識別子

http://purl.org/coar/resource_type/c_18gh

資源タイプ

technical report

著者所属

北海道大学大学院情報科学研究科

著者所属

北海道大学大学院情報科学研究科

著者所属(英)

Graduate School of Information Science and Technology, Hokkaido University

著者所属(英)

Graduate School of Information Science and Technology, Hokkaido University

著者名

吉田, 諭史

著者名(英)

Satoshi, Yoshida

論文抄録

内容記述タイプ

Other

内容記述

本稿では，VF 符号における辞書データ構造を訓練することで VF 符号の性能を向上させる方法について論じる．ここで議論する VF 符号とは，分節木と呼ばれる木構造を用いて入力テキストを可変長のブロックに分割し，各ブロックに固定長の符号語を割り当てることでテキスト圧縮を達成する情報源符号化方法である．VF 符号の圧縮率は分節木によって決定されるが，入力テキストに対して最適な分節木を構築することは NP 困難であることが知られている．そのため，最適に近い分節木を構築するために，入力テキストを繰り返し走査しつつ分節木を訓練させるという手法が提案されている．しかしながら，既存の手法では，符号化の際に事前に符号語長をパラメータとして与える必要があった．符号語長を長くすればテキストの分割数は減少するが，逆に分節木を保存するコストが増大し最終的なデータ圧縮率を低下させうる．本稿では，MDL 原理に基づく指標を示し，それによって分節木に登録する文字列を貪欲に決定していく訓練アルゴリズムを提案する．このアルゴリズムは，訓練する過程において符号語長を自動的に調節する．

論文抄録(英)

内容記述タイプ

Other

内容記述

In this paper, we discuss a method to improve the performance of VF codes by training the dictionary data structures. A VF code, we discuss here, is a source coding scheme that parses an input text into variable-length blocks with a dictionary tree, which is called a parse tree, and then encodes them by fixed-length codewords. Although the compression ratio of a VF code depends on the parse tree, the problem of constructing the optimal tree for an input text is NP-hard. Thus, the methods that train the parse tree by scanning the text repeatedly in order to reconstruct the tree so that it closes to the optimal one, have been proposed so far. However, the existent algorithms need the codeword length as a parameter before encoding the input. Although the number of parsed blocks decreases when we set the codeword length long, the compression ratio can be depressed since the cost for storing the parse tree increases. In this paper, we present a criterion based on the MDL principle and propose a training algorithm that greedily determines strings to be entered into the parse tree according to the criterion. The proposed algorithm automatically adjusts the codeword length while the training.

書誌レコードID

収録物識別子タイプ

NCID

収録物識別子

AN10114171

書誌情報

研究報告情報基礎とアクセス技術（IFAT）

巻 2011-IFAT-103, 号 14, p. 1-6, 発行日 2011-07-26

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

出版者

言語

出版者

情報処理学会

戻る

views

See details

	Views

Versions

Ver.1

2025-01-21 21:12:21.723883

Show All versions

Cite as

エクスポート

OAI-PMH

JPCOAR
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

効率よいVF符号のためのMDL原理に基づく分節木の訓練手法

× 吉田, 諭史

× Satoshi, Yoshida

Versions

Share

Cite as

エクスポート