A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction
https://ipsj.ixsq.nii.ac.jp/records/18595
License | Copyright (c) 2008 by the Information Processing Society of Japan |
Access | Open access |
Item type | Trans(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2008-03-15 | |||||||
Title | ||||||||
Title | A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Keywords | ||||||||
Subject scheme | Other | |||||||
Subject | Original Papers | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_6501 | |||||||
Resource type | journal article | |||||||
Author affiliation | ||||||||
Graduate School of Information Science and Technology, Osaka University | ||||||||
Author affiliation | ||||||||
Graduate School of Information Science and Technology, Osaka University | ||||||||
Author affiliation | ||||||||
Graduate School of Information Science and Technology, Osaka University | ||||||||
Author affiliation | ||||||||
Graduate School of Information Science and Technology, Osaka University | ||||||||
Author name | Gen, Kawamura | |||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | Chemical and biological activities of compounds provide valuable information for discovering new drugs. The compound fingerprint that is represented by structural information of the activities is used for candidates for investigating similarity. However, there are several problems with predicting accuracy from the requirement in the compound structural similarity. Although the amount of compound data is growing rapidly, the number of well-annotated compounds, e.g., those in the MDL Drug Data Report (MDDR) database, has not increased quickly. Since the compounds that are known to have some activities of a biological class of the target are rare in the drug discovery process, the accuracy of the prediction should be increased as the activity decreases or the false positive rate should be maintained in databases that have a large number of un-annotated compounds and a small number of annotated compounds of the biological activity. In this paper, we propose a new similarity scoring method composed of a combination of the Tanimoto coefficient and the proximity measure of random forest. The score contains two properties that are derived from unsupervised and supervised methods of partial dependence for compounds. Thus, the proposed method is expected to indicate compounds that have accurate activities. By evaluating the performance of the prediction compared with the two scores of the Tanimoto coefficient and the proximity measure, we demonstrate that the prediction result of the proposed scoring method is better than those of the two methods by using the Linear Discriminant Analysis (LDA) method. We estimate the prediction accuracy of compound datasets extracted from MDDR using the proposed method. It is also shown that the proposed method can identify active compounds in datasets including several un-annotated compounds. | |||||||
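Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AA12177013 | |||||||
Bibliographic information | IPSJ Transactions on Bioinformatics (TBIO), Vol. 49, No. SIG5(TBIO4), pp. 46-57, issued 2008-03-15 | |||||||
ISSN | ||||||||
Source identifier type | ISSN | |||||||
Source identifier | 1882-6679 | |||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | 情報処理学会 (Information Processing Society of Japan) |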
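
The abstract names two similarity scores, the Tanimoto coefficient and the random forest proximity measure. The following is a minimal sketch of their standard textbook definitions, not the paper's implementation. It assumes fingerprints are given as sets of on-bit indices and that each compound's leaf assignment in a trained forest is given as one leaf index per tree. The function names and toy data are hypothetical, and the paper's step of combining the two scores is not reproduced because the record does not specify it.

```python
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Assumed convention: two empty fingerprints get similarity 0.0.
    return len(a & b) / union if union else 0.0


def rf_proximity(leaves_x: Sequence[int], leaves_y: Sequence[int]) -> float:
    """Fraction of trees in which two samples fall into the same leaf."""
    if len(leaves_x) != len(leaves_y) or not leaves_x:
        raise ValueError("need one leaf index per tree for both samples")
    same = sum(1 for lx, ly in zip(leaves_x, leaves_y) if lx == ly)
    return same / len(leaves_x)


# Hypothetical toy data: on-bit indices and per-tree leaf indices.
fp_query = {1, 4, 7, 9}
fp_candidate = {1, 4, 8}
leaves_query = [3, 5, 2, 8]
leaves_candidate = [3, 6, 2, 8]

# The two scores for one compound pair, kept side by side as a feature pair.
pair_features = (
    tanimoto(fp_query, fp_candidate),
    rf_proximity(leaves_query, leaves_candidate),
)
print(pair_features)
```

Keeping the two scores as a pair rather than merging them leaves the choice of combination rule open; the abstract mentions Linear Discriminant Analysis in its evaluation, but how it enters the score is not stated in this record.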