Development and Evaluation of a Japan-specific Code Generation Benchmark

佐藤,美唯; 伊東,和香; 倉光,君郎; Miyu Sato; Waka Ito; Kimio Kuramitsu

https://doi.org/10.20729/0002008641

Name / File	License	Action
IPSJ-JNL6703004.pdf (1.1 MB), available for download from March 15, 2028.	Copyright (c) 2026 by the Information Processing Society of Japan
Non-members: ¥660, IPSJ: society members: ¥330, Journal: members: ¥0, DLIB: members: ¥0

Item type

Journal(1)

Publication date

2026-03-15

Title

日本特有のコード生成ベンチマークの開発と評価

Title (English)

Development and Evaluation of a Japan-specific Code Generation Benchmark

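Language

jpn

Keywords

Subject scheme

Other

Subject

[Special Issue: Young Researchers (Specially Selected Paper)] code generation, large language models, benchmark, Japanese-language performance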

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

ID registration

10.20729/0002008641

ID registration type

JaLC

Author affiliation

Graduate School of Science Division of Mathematical and Physical Sciences, Japan Women's University

Author affiliation

Graduate School of Science Division of Mathematical and Physical Sciences, Japan Women's University

Author affiliation

Department of Mathematics, Physics, and Computer Science, Japan Women's University

Author name

佐藤,美唯
伊東,和香
倉光,君郎

Author name (English)

Miyu Sato
Waka Ito
Kimio Kuramitsu

Abstract

Description type

Other

Description

Code generation benchmarks are essential for evaluating the fundamental programming capabilities of large language models (LLMs). We have previously shown that LLMs pre-trained primarily on English data perform almost equally well on the code generation benchmark HumanEval and its Japanese version, JHumanEval. This result suggests that cross-lingual transfer may be occurring, whereby code generation capabilities acquired in English are also used in Japanese. However, because JHumanEval is a translation of HumanEval, the two benchmarks differ in input language but require essentially the same code generation capabilities, so they cannot evaluate tasks that reflect Japan-specific cultural contexts or Japanese language processing. To address this limitation, we develop SakuraEval, a Japan-specific code generation benchmark that does not rely on translation from English, and use it to evaluate the code generation capabilities of LLMs. SakuraEval comprises code generation tasks that address requirements unique to Japan, such as Japanese cultural background and Japanese language processing, and evaluates code generation capabilities from a perspective different from that of HumanEval and JHumanEval. This paper introduces SakuraEval and reports the results of evaluation experiments with 14 LLMs.
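The abstract does not list the tasks in SakuraEval, so the following is only a hypothetical illustration of what a Japan-specific code generation task could look like, written in the prompt-plus-unit-test style associated with HumanEval. The task itself (converting a Gregorian date to a Japanese era year), the function name `to_wareki`, and the checker `check` are all assumptions made for this sketch and are not taken from the paper.

```python
from datetime import date

# Hypothetical task, NOT taken from SakuraEval: convert a Gregorian date
# to the Japanese era name and era year (wareki).
# Era start dates; only the three most recent eras are covered here.
_ERAS = [
    ("令和", date(2019, 5, 1)),
    ("平成", date(1989, 1, 8)),
    ("昭和", date(1926, 12, 25)),
]


def to_wareki(d: date) -> str:
    """Return the era name and era year, writing the first year as 元年."""
    for name, start in _ERAS:  # newest era first
        if d >= start:
            year = d.year - start.year + 1
            return f"{name}{'元' if year == 1 else year}年"
    raise ValueError("date precedes the eras covered by this sketch")


def check(candidate) -> None:
    """Unit tests a benchmark harness could run against a generated solution."""
    assert candidate(date(2019, 5, 1)) == "令和元年"
    assert candidate(date(2019, 4, 30)) == "平成31年"
    assert candidate(date(1989, 1, 7)) == "昭和64年"


check(to_wareki)
```

A task of this kind depends on knowledge that a translated benchmark would not exercise, such as era boundaries that fall mid-year and the convention of writing the first year as 元年.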

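Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 67, No. 3, pp. 537-545, issued 2026-03-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764

Publisher

Information Processing Society of Japan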
