http://swrc.ontoware.org/ontology#TechnicalReport
圧縮テキスト上での<i>q</i>-gram非重複頻度の効率的な計算とその応用
en
Department of Informatics
Department of Electrical Engineering and Computer Science
Department of Informatics
Graduate School of Information Science and Electrical Engineering Kyushu University
Department of Informatics
Keisuke Goto
Nami Fukui
Hideo Bannai
Shunsuke Ikenaga
Masayuki Takeda
In many problems concerning text data, length-q substrings, or q-grams, can represent important characteristics of the data. Determining the frequencies of all q-grams contained in the data is an important problem with many applications. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text represented as a straight line program (SLP). We show that the problem can be solved in O(q2n) time, where n is the size of the SLP. We also show an interesting application of the algorithm, which converts an arbitrary SLP to an SLP that is constructed based on frequency information.
In many problems concerning text data, length-q substrings, or q-grams, can represent important characteristics of the data. Determining the frequencies of all q-grams contained in the data is an important problem with many applications. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text represented as a straight line program (SLP). We show that the problem can be solved in O(q2n) time, where n is the size of the SLP. We also show an interesting application of the algorithm, which converts an arbitrary SLP to an SLP that is constructed based on frequency information.
AN1009593X
研究報告アルゴリズム（AL）
2011-AL-134
11
1-7
2011-02-28