Scalable Large-Variance Clone Detection

Tasuku, Nakagawa; Yoshiki, Higo; Shinji, Kusumoto

https://ipsj.ixsq.nii.ac.jp/records/209668

Name / File	License	Action
IPSJ-SE21207011.pdf (793.1 kB)	Copyright (c) 2021 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2021-02-22

Title

Scalable Large-Variance Clone Detection

Language

eng

Keywords

Subject scheme

Other

Subject

code clone

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Osaka University (all three authors)

Authors

Tasuku, Nakagawa
Yoshiki, Higo
Shinji, Kusumoto

Abstract

Description type

Other

Description

A code clone (in short, clone) is a code fragment that is identical or similar to other code fragments in source code. Clones generated by a large number of changes to copy-and-pasted code fragments are called large-variance clones. It is difficult for general clone detection techniques to detect such clones and thus specialized techniques are necessary. In addition, with the rapid growth of software development, scalable clone detectors that can detect clones in large codebases are required. However, there are no existing techniques for quickly detecting large-variance clones in large codebases. In this paper, we propose a scalable clone detection technique that can detect large-variance clones from large codebases and describe its implementation, called NIL. NIL is a token-based clone detector that efficiently identifies clone candidates using an N-gram representation of token sequences and an inverted index. Then, NIL verifies the clone candidates by measuring their similarity based on the longest common subsequence between their token sequences. We evaluate NIL in terms of large-variance clone detection accuracy, general Type-1, Type-2, and Type-3 clone detection accuracy, and scalability. Our experimental results show that NIL has higher accuracy in terms of large-variance clone detection, equivalent accuracy in terms of general clone detection, and the shortest execution time for inputs of various sizes (1-250 MLOC) compared to existing state-of-the-art tools.
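
The two stages named in the abstract (locating clone candidates through an N-gram inverted index, then verifying them with a longest-common-subsequence similarity) can be illustrated with a minimal sketch. This is not NIL's implementation: the function names, the N-gram size `n`, both thresholds, and the choice to normalize similarity by the shorter fragment's length are assumptions made only for illustration, and fragments are assumed to be already tokenized.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lcs_length(a, b):
    """Length of the longest common subsequence, computed with rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def detect_clones(fragments, n=3, filter_ratio=0.1, verify_ratio=0.7):
    """Return (id_a, id_b, similarity) for fragment pairs judged to be clones.

    fragments maps a fragment id to its token list. The defaults for n,
    filter_ratio and verify_ratio are hypothetical, not NIL's settings.
    """
    grams = {fid: ngrams(toks, n) for fid, toks in fragments.items()}

    # Inverted index: n-gram -> ids of the fragments that contain it.
    index = defaultdict(set)
    for fid, gs in grams.items():
        for g in gs:
            index[g].add(fid)

    clones = []
    for fid in sorted(fragments):
        # Candidate location: count n-grams shared with other fragments.
        shared = defaultdict(int)
        for g in grams[fid]:
            for other in index[g]:
                if other > fid:  # visit each unordered pair once
                    shared[other] += 1
        for other, count in sorted(shared.items()):
            smaller = min(len(grams[fid]), len(grams[other]))
            if count / smaller < filter_ratio:
                continue  # too few shared n-grams to be a candidate
            # Verification: LCS similarity (assumed normalization: shorter length).
            a, b = fragments[fid], fragments[other]
            similarity = lcs_length(a, b) / min(len(a), len(b))
            if similarity >= verify_ratio:
                clones.append((fid, other, similarity))
    return clones
```

Because the index is queried only for N-grams that a fragment actually contains, pairs with nothing in common are never compared, and the quadratic-time LCS step runs only on the surviving candidates. Normalizing by the shorter fragment lets a pair still score highly when many statements have been inserted into one copy, which is the large-variance situation the abstract describes.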

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10112981

Bibliographic information

IPSJ SIG Technical Reports: Software Engineering (SE)

Vol. 2021-SE-207, No. 11, pp. 1-8, issued 2021-02-22

ISSN

Source identifier type

ISSN

Source identifier

2188-8825

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
