Title: Text Classification of Technical Papers Focusing on Title and Important Segments
Language: English
Subject: Analysis and Text Processing
Type: Technical Report
URI: http://id.nii.ac.jp/1001/00092271/
Download: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=92287&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2013 by the Information Processing Society of Japan
Authors: Thien Hai Nguyen (Japan Advanced Institute of Science and Technology); Kiyoaki Shirai (Japan Advanced Institute of Science and Technology)

Abstract: The goal of this research is to design a multi-label classification model that determines the research topics of a given technical paper. Based on the idea that papers are well organized and that some parts of a paper are more important than others for text classification, segments such as the title, abstract, introduction and conclusion are used intensively in the text representation. In addition, two new features, Title Bi-Gram and Title SigNoun, are used to improve performance. A Title Bi-Gram is a bi-gram occurring in the title, while a Title SigNoun is a noun in a head phrase of the title. The experimental results indicate that feature selection based on text segmentation and these two features are both effective. Furthermore, we propose a new text classification model based on the structure of papers, called the Back-off model, which achieves a 60.45% Exact Match Ratio and a 68.75% F-measure. The Back-off model is also shown to outperform two existing methods, ML-kNN and Binary Approach.

NCID: AN10115061
Source: Research Reports: Natural Language Processing (NL)
Volume: 2013-NL-211
Issue: 15
Pages: 1-8
Dates: 2013-05-16, 2013-05-14