Title: Temporal Link Prediction for Wikipedia Articles Based on Random Walk and Graph Embedding
Authors: Jiaji Ma, Mizuho Iwahara
Affiliation: Graduate School of Information, Production and Systems, Waseda University
Language: eng
Subject: Social modeling
Type: Technical Report
Published in: Research Report: Database Systems (DBS), 2020-DBS-172, No. 3, pp. 1-6
NCID: AN10112482
ISSN: 2188-871x
Dates in record: 2020-12-14, 2020-12-11
Identifier: http://id.nii.ac.jp/1001/00208848/
Download: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=208950&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2020 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.

Abstract:
Wikipedia is one of the most popular multilingual online encyclopedias in the world. Its articles contain hundreds of millions of hyperlinks (wiki internal links) that connect subjects to other Wikipedia pages, so predicting future links for newly created articles is useful. In this paper, we discuss a new method for link prediction on linked historical articles. First, we discuss the construction of our temporal datasets, which we build by extracting articles, together with their titles, categories, and hyperlinks, from the whole Wikipedia mirror for three different topics across thirteen snapshots. We also propose a graph embedding model with a temporal random walk that considers both the timestamp difference and the semantic difference between article words. Along the random walk paths, we generate title sequences by concatenating the article titles on each path. Next, we discuss fine-tuning a pre-trained RoBERTa-based model using several title sequences. We design our link prediction experiments to predict future links between new nodes and existing nodes. For evaluation, we compare the prediction results of our model with those of three random walk-based graph embedding models, DeepWalk, Node2vec, and CTDNE, using AUC_ROC, AUC_PRC, Precision@k, Recall@k, and F1@k as evaluation metrics.
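
The abstract names Precision@k, Recall@k, and F1@k among its evaluation metrics but does not define them. The sketch below uses the standard textbook definitions for a single ranked list of candidate link targets; it is not the authors' evaluation code. The function name `precision_recall_f1_at_k` and the example article titles are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Sequence


def precision_recall_f1_at_k(
    ranked: Sequence[Hashable],
    relevant: Iterable[Hashable],
    k: int,
) -> tuple[float, float, float]:
    """Return (Precision@k, Recall@k, F1@k) for one ranked candidate list.

    ranked   -- candidate link targets, best-scored first (assumed input)
    relevant -- the targets that really became links (ground truth)
    k        -- cutoff; only the first k ranked candidates are considered
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")

    relevant_set = set(relevant)
    top_k = ranked[:k]

    # Number of true future links found among the top-k predictions.
    hits = sum(1 for item in top_k if item in relevant_set)

    # Precision@k divides by k, even if fewer than k candidates exist.
    precision = hits / k
    # Recall@k divides by the number of true links (0.0 if there are none).
    recall = hits / len(relevant_set) if relevant_set else 0.0
    # F1@k is the harmonic mean of the two; defined as 0.0 when hits == 0.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0

    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical ranking of existing articles for one new article.
    ranked_titles = ["Graph theory", "Random walk", "Tokyo", "Embedding", "Jazz"]
    true_links = {"Graph theory", "Tokyo", "Hyperlink"}
    print(precision_recall_f1_at_k(ranked_titles, true_links, k=3))
```

In a setting like the one described, such a function would be called once per new node and the results averaged over all new nodes; that averaging scheme is an assumption here, since the abstract does not state it.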