Scheduling Computation Graphs of Deep Learning Frameworks for Multi-core CPUs

樋口, 兼一; 田浦, 健次朗; Tomokazu, Higuchi; Kenjiro, Taura

https://ipsj.ixsq.nii.ac.jp/records/202968

Name / File	License	Action
IPSJ-TPRO1301009.pdf (105.0 kB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2020-01-29

Title

深層学習フレームワークにおけるマルチコアCPU向け計算グラフスケジューリング

Title (English)

Scheduling Computation Graphs of Deep Learning Frameworks for Multi-core CPUs

Language

jpn

Keywords

Subject scheme

Other

Subject

[Unrefereed Presentation Abstract]

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo (both authors)

Author name

樋口, 兼一
田浦, 健次朗

Author name (English)

Tomokazu, Higuchi
Kenjiro, Taura

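Abstract

Description type

Other

Description

Many deep learning frameworks, including Chainer, internally build a computation graph whose nodes are the per-layer operations of a neural network and whose edges are the connections between layers, and train the network by executing the nodes one after another. The hotspot in such frameworks is the processing inside each node, and existing research and implementations have focused on speeding up individual nodes. However, in many widely used network models each operation is so lightweight that it cannot make efficient use of all the cores of a multi-core CPU, and as a result execution speed improves little as the number of cores in use grows. We therefore propose a method that partitions the cores of a CPU into several groups and distributes mutually independent nodes across those groups, so that parallelism is exploited both within a node and across nodes. The scheduler caps the number of nodes running at the same time so that the total number of available cores is not exceeded, while eagerly starting every node that is ready to run. Several deep learning models, ResNet among them, produce computation graphs with branches that can be processed in parallel, and inference and training on such networks are expected to become faster. The method is implemented on Chainer, and its overall performance gain, including the per-node speed-ups, is evaluated on several models.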

Abstract (English)

Description type

Other

Description

Many deep learning frameworks, including Chainer, train a neural network by sequentially processing a computation graph whose nodes are the layers of the network and whose edges are the connections between them. The performance hotspots of such frameworks are the calculations inside each node, and existing work has focused on speeding them up. However, nodes in widely used network models cannot efficiently utilize all available cores of a multi-core CPU because their calculations are too lightweight. As a result, execution speed improves only modestly as the number of available cores increases. We therefore propose a method that runs several nodes in parallel, in addition to parallelizing each node internally, by allocating executable nodes to core groups, each consisting of multiple CPU cores. The scheduler eagerly assigns as many executable nodes as possible to the core groups, subject to a limit on the total number of cores. Several deep learning models, such as ResNet, generate computation graphs containing nodes that can be processed in parallel, so we expect inference and training on such networks to become faster. The system is built on Chainer, and the overall performance improvement, including the per-node speed-up, is evaluated on several models.
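
The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Chainer API: the function `schedule_graph`, the callback `run_node(name, cores)`, and the parameters `total_cores` and `cores_per_group` are all hypothetical names chosen for illustration. The sketch assumes the graph is given as a dictionary from each node to its dependencies, treats one thread-pool worker as one core group, and starts every ready node as soon as a group is free.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def schedule_graph(graph, run_node, total_cores, cores_per_group):
    """Run a dependency graph with at most one in-flight node per core group.

    graph: dict mapping each node name to the names it depends on
           (assumption: every dependency is itself a key of graph).
    run_node: hypothetical callback run_node(name, cores) that executes one
              node, using `cores` cores for its internal parallelism.
    Returns the node names in the order in which they were started.
    """
    num_groups = total_cores // cores_per_group
    if num_groups < 1:
        raise ValueError("need at least one core group")

    # Unfinished dependencies of each node, plus the reverse edges.
    pending = {node: set(deps) for node, deps in graph.items()}
    dependents = {node: [] for node in graph}
    for node, deps in pending.items():
        for dep in deps:
            dependents[dep].append(node)

    ready = [node for node, deps in pending.items() if not deps]
    in_flight = {}  # future -> node name
    started = []

    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        while ready or in_flight:
            # Eagerly start ready nodes while a core group is still free.
            while ready and len(in_flight) < num_groups:
                node = ready.pop(0)
                started.append(node)
                in_flight[pool.submit(run_node, node, cores_per_group)] = node
            # Block until at least one running node finishes.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                node = in_flight.pop(future)
                future.result()  # re-raise any error raised inside the node
                for child in dependents[node]:
                    pending[child].discard(node)
                    if not pending[child]:
                        ready.append(child)

    if len(started) != len(graph):
        raise ValueError("graph contains a cycle")
    return started


if __name__ == "__main__":
    # ResNet-like block: two independent branches that join at "add".
    block = {
        "conv": [],
        "branch_a": ["conv"],
        "shortcut": ["conv"],
        "add": ["branch_a", "shortcut"],
    }
    order = schedule_graph(block, lambda name, cores: None,
                           total_cores=4, cores_per_group=2)
    print(order)
```

In this sketch the cap on simultaneously running nodes is simply `total_cores // cores_per_group`, so the cores in use never exceed the total. Binding each group to specific physical cores and passing the per-group core count to the node's numerical kernels are left out, since the abstract does not describe how that is done.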

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464814

Bibliographic information

IPSJ Transactions on Programming (PRO)

Vol. 13, No. 1, p. 20, issue date 2020-01-29

ISSN

Source identifier type

ISSN

Source identifier

1882-7802

Publisher

Information Processing Society of Japan
