Identifying Auto-Generated Files by Using Machine Learning Techniques Based on Syntactic Information

Kento Shimonaka; Soichi Sumi; Yoshiki Higo; Shinji Kusumoto

https://ipsj.ixsq.nii.ac.jp/records/178662

Name / File	License	Action
IPSJ-JNL5804010.pdf (625.8 kB)	Copyright (c) 2017 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2017-04-15

Title

Identifying Auto-Generated Files by Using Machine Learning Techniques Based on Syntactic Information

Title (original)

機械学習を利用した構文情報に基づく自動生成ファイルの特定

Language

jpn

Keywords

Subject scheme

Other

Subject

[Special Issue: Software Engineering] auto-generated files, machine learning, source code analysis, software maintenance

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Graduate School of Information Science and Technology, Osaka University (all four authors)

Authors

Kento Shimonaka (下仲 健斗)
Soichi Sumi (鷲見 創一)
Yoshiki Higo (肥後 芳樹)
Shinji Kusumoto (楠本 真二)

Abstract

Description type

Other

Description

Source code analysis has been studied intensively in recent years, driven by its use in practice and in research such as mining source code repositories. Target repositories often contain auto-generated files, which are usually excluded before analysis because they act as noise. One way to exclude them is to search for the characteristic comments that code generators leave in their output. However, this approach cannot identify auto-generated files automatically once those comments have been deleted, and inspecting source files by eye one at a time is too time-consuming. This study therefore proposes a method that uses machine learning to identify arbitrary auto-generated files automatically. The method learns the syntactic information of source files and uses it to decide whether a given source file is auto-generated. To evaluate the method, we conducted experiments on files produced by four code generators. The results confirm that the method identifies auto-generated files with a high accuracy of over 90%.
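The contrast drawn in the abstract, between searching for generator comments and classifying on syntactic information, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the marker pattern, the use of Python sources and AST node-type frequencies as the "syntactic information", the hand-rolled nearest-centroid classifier, and the toy training samples. The record does not state which features, learner, or target language the authors used.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical marker pattern; real generators emit many different header comments.
MARKER = re.compile(r"auto-?generated|generated by|do not edit", re.IGNORECASE)


def has_generator_comment(source):
    """Baseline: string search for a generator-specific comment line."""
    return any(MARKER.search(line) for line in source.splitlines()
               if line.lstrip().startswith("#"))


def syntax_features(source):
    """Relative frequency of each AST node type; comments never reach the AST."""
    counts = Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def distance(a, b):
    """Euclidean distance between two sparse feature dictionaries."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class NearestCentroid:
    """Toy classifier (an assumption, not the paper's learner)."""

    def fit(self, sources, labels):
        grouped = {}
        for src, label in zip(sources, labels):
            grouped.setdefault(label, []).append(syntax_features(src))
        self.centroids = {}
        for label, vectors in grouped.items():
            keys = set().union(*vectors)
            self.centroids[label] = {
                k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys
            }
        return self

    def predict(self, source):
        feats = syntax_features(source)
        return min(self.centroids, key=lambda lb: distance(feats, self.centroids[lb]))


# Toy training data (hypothetical): table-like generated code vs. hand-written logic.
GENERATED_A = "# Generated by toolX. DO NOT EDIT.\n" + "\n".join(
    f"T{i} = ({i}, {i + 1}, {i + 2})" for i in range(30))
GENERATED_B = "\n".join(f"STATE_{i} = [{i}, {i * 2}]" for i in range(40))
HAND_A = ("def mean(xs):\n    total = 0\n    for x in xs:\n        total += x\n"
          "    return total / len(xs) if xs else 0.0\n")
HAND_B = ("def clamp(v, lo, hi):\n    if v < lo:\n        return lo\n"
          "    if v > hi:\n        return hi\n    return v\n")

model = NearestCentroid().fit(
    [GENERATED_A, GENERATED_B, HAND_A, HAND_B],
    ["generated", "generated", "handwritten", "handwritten"],
)

# A generated-style file whose marker comment has been removed.
STRIPPED = "\n".join(f"ROW_{i} = ({i}, {i + 3})" for i in range(25))
HANDWRITTEN_QUERY = ("def total(items):\n    result = 0\n    for item in items:\n"
                     "        if item > 0:\n            result += item\n    return result\n")

print(has_generator_comment(STRIPPED))   # comment search on the stripped file
print(model.predict(STRIPPED))           # syntax-based decision on the same file
print(model.predict(HANDWRITTEN_QUERY))
```

The design point is that the feature vector is computed from the parse tree alone, so deleting a header comment leaves it unchanged, whereas the string search has nothing left to match. A realistic setup would need a much larger labeled corpus and a proper learning library.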

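Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

IPSJ Journal (情報処理学会論文誌)

Vol. 58, No. 4, pp. 861-870, issued 2017-04-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764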
