Item type |
SIG Technical Reports(1) |
Publication date |
2021-11-23 |
Title |
Predicting PRDM9 binding sites by a convolutional neural network and verification using genetic recombination map |
Language |
en |
Language |
eng |
Resource type |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliation |
Graduate School of Information Science and Technology, Hokkaido University |
Author affiliation |
Graduate School of Information Science and Technology, Hokkaido University / Faculty of Information Science and Technology, Hokkaido University |
Author affiliation |
Graduate School of Information Science and Technology, Hokkaido University / Faculty of Information Science and Technology, Hokkaido University |
Author names |
Takahiro, Nakamura
Toshinori, Endo
Naoki, Osada
|
Abstract |
Description type |
Other |
Description |
PR domain-containing 9 (PRDM9) is a zinc-finger protein that binds to specific DNA motifs and induces crossing-over between chromosomes, resulting in a high recombination rate around its binding sites. In this study, we developed a strategy that evaluates the accuracy of PRDM9 binding-site prediction by examining its correlation with the local recombination rate, which avoids the effect of overfitting to a single type of data. We compared a method based on the position-specific weight matrix (PWM), which has been commonly used in previous studies, with one based on a convolutional neural network (CNN), an approach that has recently performed well. Approximately 170,000 human genomic DNA fragments (301 bp each) containing chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) peaks of the PRDM9 B allele in the HEK293T cell line were used to construct the PWM and as positive data to train the CNN. We found that the CNN outperformed the PWM in terms of area under the curve, and that the CNN prediction scores correlated more strongly with the local recombination rate than the PWM scores did. We also investigated potential PRDM9 binding sites that were missed by the ChIP-seq experiments but labeled as positive by the CNN, and we discuss the reasons for the difference in performance. |
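The evaluation idea in the abstract (score each sequence, measure classification quality by area under the curve, then check agreement with the local recombination rate) can be illustrated with a minimal sketch. It covers only a PWM baseline and the two evaluation measures, not the CNN. Everything in it is an assumption made for illustration and is not taken from the report: the function names, the pseudocount and uniform background, forward-strand-only scanning, the use of Spearman rank correlation as the correlation measure, and the toy sequences.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (assumed construction)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def best_pwm_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only).

    seq must be at least as long as the PWM.
    """
    width = len(pwm)
    return max(
        sum(col[seq[i + j]] for j, col in enumerate(pwm))
        for i in range(len(seq) - width + 1)
    )


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up data: not PRDM9 motifs and not real recombination rates.
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    seqs = ["TTACGTTT", "GGACGAGG", "TTTTTTTT", "GGGGCCCC"]
    labels = [1, 1, 0, 0]
    toy_rates = [2.1, 1.7, 0.4, 0.6]
    scores = [best_pwm_score(pwm, s) for s in seqs]
    print("AUC:", auc(scores, labels))
    print("Spearman vs. toy rates:", spearman(scores, toy_rates))
```

Under the same assumptions, scores from any other predictor, such as a trained CNN, could be passed to `auc` and `spearman` in place of the PWM scores, which is what makes a comparison on independent recombination data possible.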
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AA12055912 |
Bibliographic information |
SIG Technical Reports: Bioinformatics (BIO)
Vol. 2021-BIO-68,
No. 1,
pp. 1-7,
Issue date 2021-11-23
|
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8590 |
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
Language |
ja |
Publisher |
Information Processing Society of Japan (情報処理学会) |