http://swrc.ontoware.org/ontology#Article
RNA Pseudoknotted Structure Prediction Using Stochastic Multiple Context-Free Grammar
en
Original Papers（IPSJ Best Paper Award、論文賞受賞）
Graduate School of Information Science Nara Institute of Science and Technology
Graduate School of Information Science Nara Institute of Science and Technology
Graduate School of Information Science Nara Institute of Science and Technology
Yuki Kato
Hiroyuki Seki
Tadao Kasami
Many attempts have so far been made at modeling RNA secondary structure by formal grammars. In a grammatical approach secondary structure prediction can be viewed as parsing problem. However there may be many different derivation trees for an input sequence. Thus it is necessary to have a method of extracting biologically realistic derivation trees among them. One solution to this problem is to extend a grammar to a probabilistic model and find the most likely derivation tree and another is to take free energy minimization into account. One simple formalism for describing RNA folding is context-free grammars (CFGs) but it is known that CFGs cannot represent pseudoknots. Therefore several formal grammars have been proposed for modeling RNA pseudoknotted structure. In this paper we focus on multiple context-free grammars (MCFGs) which are natural extension of CFGs and can represent pseudoknots and extend MCFGs to a probabilistic model called stochastic MCFG (SMCFG). We present a polynomial time parsing algorithm for finding the most probable derivation tree which is applicable to RNA secondary structure prediction including pseudoknots. Also we propose a probability parameter estimation algorithm based on the EM (expectation maximization) algorithm. Finally we show some experimental results on RNA pseudoknot prediction using the SMCFG parsing algorithm which show good prediction accuracy.
Many attempts have so far been made at modeling RNA secondary structure by formal grammars. In a grammatical approach, secondary structure prediction can be viewed as parsing problem. However, there may be many different derivation trees for an input sequence. Thus, it is necessary to have a method of extracting biologically realistic derivation trees among them. One solution to this problem is to extend a grammar to a probabilistic model and find the most likely derivation tree, and another is to take free energy minimization into account. One simple formalism for describing RNA folding is context-free grammars (CFGs), but it is known that CFGs cannot represent pseudoknots. Therefore, several formal grammars have been proposed for modeling RNA pseudoknotted structure. In this paper, we focus on multiple context-free grammars (MCFGs), which are natural extension of CFGs and can represent pseudoknots, and extend MCFGs to a probabilistic model called stochastic MCFG (SMCFG). We present a polynomial time parsing algorithm for finding the most probable derivation tree, which is applicable to RNA secondary structure prediction including pseudoknots. Also, we propose a probability parameter estimation algorithm based on the EM (expectation maximization) algorithm. Finally, we show some experimental results on RNA pseudoknot prediction using the SMCFG parsing algorithm, which show good prediction accuracy.
AA12177013
IPSJ Transactions on Bioinformatics （TBIO）
47
SIG17(TBIO1)
12-21
2006-11-15
1882-6679