Finding Co-occurring Topics in Wikipedia Article Segments
Language: eng
Subject: Topic extraction and efficiency (トピック抽出・効率化)
URI: http://id.nii.ac.jp/1001/00102430/
Type: Technical Report
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=102453&item_no=1&attribute_id=1&file_no=1
Copyright (c) 2014 by the Information Processing Society of Japan
Authors: Renzhi Wang, Jianmin Wu, Mizuho Iwaihara (Graduate School of Information, Production and Systems, Waseda University)

Abstract: Wikipedia is the largest online encyclopedia, and its articles form a rich knowledge and semantic resource. Many studies on topic detection and semantic similarity analysis are based on the Wikipedia corpus. When the same topic appears in different articles, those articles are related through that topic. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and for contrasting related articles. Existing work on topic alignment and topic relevance detection is based on term occurrence. In this research, we discuss incorporating the latent topics present in article segments, obtained with Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, can be incorporated into topic detection and alignment.

Published in: IPSJ SIG Technical Report, Database Systems (DBS) (研究報告データベースシステム(DBS)), NCID AN10112482
Vol. 2014-DBS-159, No. 8, pp. 1-6
Dates: 2014-07-25, 2014-07-24
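The record does not say how topic relevance is computed from LDA output. The following is a minimal sketch, assuming each segment has already been assigned a topic distribution by some LDA model (the distributions in the example are made up). It treats a topic as co-occurring when its weight reaches a hypothetical threshold (0.1) in both segments, and scores segment similarity as 1 minus the Jensen-Shannon divergence. That measure is a common choice for comparing topic distributions, not necessarily the authors' own. The helper names are illustrative only.

```python
import math
from typing import List, Sequence

Dist = Sequence[float]  # per-segment topic distribution (assumed LDA output)


def co_occurring_topics(theta_a: Dist, theta_b: Dist, threshold: float = 0.1) -> List[int]:
    """Topic ids whose weight is >= threshold in both segments (threshold is hypothetical)."""
    if len(theta_a) != len(theta_b):
        raise ValueError("distributions must share the same topic space")
    return [k for k, (pa, pb) in enumerate(zip(theta_a, theta_b))
            if pa >= threshold and pb >= threshold]


def jensen_shannon(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: Dist, y: Dist) -> float:
        # Terms with x_i == 0 contribute nothing; x_i > 0 implies m_i > 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p: Dist, q: Dist) -> float:
    """1.0 for identical distributions, 0.0 for distributions with disjoint support."""
    return 1.0 - jensen_shannon(p, q)


def align_segments(article_a: List[Dist], article_b: List[Dist]) -> List[int]:
    """For each segment of article A, the index of the most similar segment of article B."""
    return [max(range(len(article_b)), key=lambda j: topic_similarity(seg, article_b[j]))
            for seg in article_a]


if __name__ == "__main__":
    seg_a = [0.60, 0.30, 0.05, 0.05]
    seg_b = [0.50, 0.05, 0.40, 0.05]
    print(co_occurring_topics(seg_a, seg_b))  # [0]
    print(round(topic_similarity(seg_a, seg_b), 3))
```

This sketch compares segments purely by topic content. It ignores the segment proximities (ordering and hyperlinks) that the abstract mentions, which would need an additional weighting scheme.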