Item type |
SIG Technical Reports(1) |
Publication date |
2022-11-22 |
Title |
Extending a deep learning-based RNA secondary structure prediction algorithm for RNA modifications |
Language |
eng |
Resource type |
technical report |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Author affiliations |
Keio University |
Keio University |
Tokyo Denki University |
Authors |
Naoki, Mikamo
Yasubumi, Sakakibara
Kengo, Sato
|
Abstract |
Description type |
Other |
Description |
Various experimental and computational methods have been proposed for RNA secondary structure prediction. However, computational prediction of RNA secondary structure that takes RNA modifications into account has not yet been attempted. In this study, we attempted to develop a method for predicting secondary structure from RNA sequences that contain modifications. Our method is based on MXfold2, the most accurate deep learning-based RNA secondary structure prediction method, which does not itself take RNA modifications into account. We developed two representations of modified bases: a one-hot representation, which follows the same scheme as before, and a chemical fingerprint representation. The fingerprint representation allows bases to be input as chemical structures and is expected to predict the secondary structure of sequences with modified bases more accurately than the one-hot representation. We then built a dataset that includes RNA modifications. Because RNA sequences with both modifications and known secondary structures are scarce, we first trained the model on a dataset without modifications and then fine-tuned it on tRNA data so that it could handle modifications. The dataset with modifications used in this study was obtained from MODOMICS, a database of RNAs containing modifications, and from other literature. For benchmarking, our method was fine-tuned on two types of sequences, one with modifications and one without. We compared three base representations: that of the existing method MXfold2, the one-hot representation extended to modified bases, and the fingerprint representation. The comparison with MXfold2 shows that, for input sequences that contain modifications, secondary structure can be predicted more accurately when the modifications are distinguished. The results also suggest that the fingerprint representation, unlike the one-hot representation, can deal with RNA modifications that do not appear in the training data. |
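The contrast between the two base representations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the alphabet symbols, the substructure vocabulary `FEATURES`, the table `TOY_FEATURES`, and all function names are hypothetical, and the hand-written feature sets merely stand in for a real chemical fingerprint computed from a molecular structure. Under these assumptions, a modification absent from the one-hot alphabet (here 5-methyluridine, written "T") receives an all-zero one-hot vector, while its toy fingerprint still shares bits with uridine.

```python
from typing import Dict, FrozenSet, List

# Hypothetical alphabet seen during training: the four standard bases plus
# two example modified bases (P = pseudouridine, D = dihydrouridine).
ALPHABET: List[str] = ["A", "C", "G", "U", "P", "D"]


def one_hot(base: str) -> List[int]:
    """One-hot vector over ALPHABET; a symbol outside it maps to all zeros."""
    return [1 if base == symbol else 0 for symbol in ALPHABET]


# Toy substructure vocabulary (an assumption for illustration only).
FEATURES: List[str] = [
    "purine_ring",
    "pyrimidine_ring",
    "amino_group",
    "carbonyl_group",
    "methyl_group",
    "c_glycosidic_bond",
    "saturated_c5_c6",
]

# Hand-assigned feature sets; a real pipeline would derive these bits from
# the chemical structure of each base rather than from a lookup table.
TOY_FEATURES: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"purine_ring", "amino_group"}),
    "C": frozenset({"pyrimidine_ring", "amino_group", "carbonyl_group"}),
    "G": frozenset({"purine_ring", "amino_group", "carbonyl_group"}),
    "U": frozenset({"pyrimidine_ring", "carbonyl_group"}),
    "P": frozenset({"pyrimidine_ring", "carbonyl_group", "c_glycosidic_bond"}),
    "D": frozenset({"pyrimidine_ring", "carbonyl_group", "saturated_c5_c6"}),
    # "T" (5-methyluridine) is NOT in ALPHABET: it plays the role of a
    # modification that never appeared in the training data.
    "T": frozenset({"pyrimidine_ring", "carbonyl_group", "methyl_group"}),
}


def fingerprint(base: str) -> List[int]:
    """Bit vector over FEATURES marking which toy substructures are present."""
    present = TOY_FEATURES[base]
    return [1 if feature in present else 0 for feature in FEATURES]


def tanimoto(x: List[int], y: List[int]) -> float:
    """Tanimoto similarity of two bit vectors (shared bits / union of bits)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


if __name__ == "__main__":
    print("one-hot of unseen T:    ", one_hot("T"))
    print("fingerprint of unseen T:", fingerprint("T"))
    print("similarity T vs U:      ", tanimoto(fingerprint("T"), fingerprint("U")))
```

In this toy setting the one-hot encoder carries no information about the unseen base, whereas the fingerprint places it close to its parent base, which is the intuition behind the abstract's final sentence.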
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AA12055912 |
Bibliographic information |
SIG Technical Reports: Bioinformatics (BIO)
Vol. 2022-BIO-72,
No. 10,
p. 1-1,
Issue date 2022-11-22
|
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8590 |
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
情報処理学会 (Information Processing Society of Japan) |
Language |
ja |