Performance Optimizations of Machine Learning Pre-Processing Written in Pandas Data Analytics Library

仲池, 卓也; 川人, 基弘; 小原, 盛幹; Takuya, Nakaike; Motohiro, Kawahito; Moriyoshi, Ohara

https://ipsj.ixsq.nii.ac.jp/records/211070

Name / File	License	Action
IPSJ-TPRO1402011.pdf (98.9 kB)	Copyright (c) 2021 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2021-05-12

Title (Japanese)

Pandasデータ解析ライブラリで記述された機械学習前処理の性能最適化に関する検討

Title (English)

Performance Optimizations of Machine Learning Pre-Processing Written in Pandas Data Analytics Library

Language

jpn

Keywords

Subject scheme

Other

Subject

[発表概要 (Presentation Abstract), Unrefereed Presentation Abstract]

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

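Author affiliation

日本アイ・ビー・エム株式会社東京基礎研究所

Author affiliation

日本アイ・ビー・エム株式会社東京基礎研究所

Author affiliation

日本アイ・ビー・エム株式会社東京基礎研究所

Author affiliation (English)

IBM Research - Tokyo

Author affiliation (English)

IBM Research - Tokyo

Author affiliation (English)

IBM Research - Tokyo

Author name

仲池, 卓也
川人, 基弘
小原, 盛幹

Author name (English)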

Takuya, Nakaike
Motohiro, Kawahito
Moriyoshi, Ohara

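Abstract

Description type

Other

Description

In machine learning, the execution performance of models such as logistic regression has traditionally been regarded as important and has been optimized with hardware accelerators such as GPUs. However, data pre-processing, including feature engineering, is important for improving the inference accuracy of a model, and the execution performance of that pre-processing has not been sufficiently optimized. This presentation proposes a method for optimizing the performance of machine-learning pre-processing written with the Pandas data analytics library. Pandas is a data analytics library written in Python and is used by many data scientists because of its convenience. However, because the library is implemented entirely in Python, it is difficult to obtain high performance. Our proposed method aims to improve performance by converting machine-learning pre-processing written in Pandas into the ONNX format and running it on a fast machine-learning framework. This presentation gives an overview of the Pandas-to-ONNX conversion tool that we are implementing and reports a performance comparison between pre-processing written in Pandas and the same pre-processing running on an ONNX runtime.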

Abstract (English)

Description type

Other

Description

In machine learning, researchers and developers have been optimizing the performance of machine-learning models such as logistic regression by using hardware accelerators such as GPUs. However, data pre-processing has not been the main focus of performance optimization, even though it is very important for improving the inference accuracy of machine-learning models. This presentation proposes a method to optimize the performance of data pre-processing code written in Pandas, which is a data analytics library. Pandas has been widely used by many data scientists due to its useful data analytics APIs. However, Pandas is not so fast because it is written in Python, which has type-checking overhead and serializes execution. Our proposed method aims to improve the performance of data pre-processing by converting the data pre-processing code written in Pandas into an ONNX graph, which is a standard format for representing machine-learning models, and then running the graph on other high-performance machine-learning platforms such as TensorFlow. This presentation gives an overview of our tool for converting Pandas code into an ONNX graph and then shows how the performance of data pre-processing is improved.
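
The abstract does not describe the internals of the conversion tool, so the following is only a toy sketch of the general idea: a pre-processing step that would be written in Pandas as `(df[column] - mean) / std` is expressed as an explicit operator graph, and a separate small runtime executes that graph. It uses only the Python standard library. `Node`, `KERNELS`, `build_standardize_graph`, and `run_graph` are hypothetical names introduced for illustration, and the mini-graph merely borrows the names of ONNX's element-wise `Sub` and `Div` operators; it does not use the ONNX or Pandas APIs and is not the authors' tool.

```python
import operator
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# A value in the graph is either a column (list of floats) or a scalar constant.
Value = Union[float, List[float]]


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: an operator name, two input names, one output name."""
    op_type: str
    inputs: Tuple[str, str]
    output: str


def _elementwise(fn: Callable[[float, float], float]) -> Callable[[List[float], Value], List[float]]:
    """Wrap a scalar function as a column kernel; a scalar right operand is broadcast."""
    def kernel(a: List[float], b: Value) -> List[float]:
        bs = b if isinstance(b, list) else [b] * len(a)
        return [fn(x, y) for x, y in zip(a, bs)]
    return kernel


# Hypothetical runtime: one kernel per operator name (names borrowed from ONNX).
KERNELS: Dict[str, Callable[[List[float], Value], List[float]]] = {
    "Sub": _elementwise(operator.sub),
    "Div": _elementwise(operator.truediv),
}


def build_standardize_graph(column: str, mean: float, std: float):
    """'Convert' the step (column - mean) / std into nodes plus named constants."""
    nodes = [
        Node("Sub", (column, "mean"), "centered"),
        Node("Div", ("centered", "std"), column + "_scaled"),
    ]
    constants = {"mean": mean, "std": std}
    return nodes, constants


def run_graph(nodes: List[Node], constants: Dict[str, Value], feeds: Dict[str, Value]) -> Dict[str, Value]:
    """Execute the nodes in order; they are assumed to be topologically sorted."""
    env: Dict[str, Value] = {**constants, **feeds}
    for node in nodes:
        left, right = (env[name] for name in node.inputs)
        env[node.output] = KERNELS[node.op_type](left, right)
    return env


# Usage: standardize a small "age" column with its population mean and std.
ages = [20.0, 30.0, 40.0]
nodes, constants = build_standardize_graph(
    "age", statistics.fmean(ages), statistics.pstdev(ages)
)
result = run_graph(nodes, constants, {"age": ages})["age_scaled"]
```

The point of the separation is that once the pre-processing is a graph of named operators rather than Python calls, the executor can be swapped for a faster one without touching the code that described the step.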

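Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464814

Bibliographic information

IPSJ Transactions on Programming (PRO)

Vol. 14, No. 2, p. 31, issued 2021-05-12

ISSN

Source identifier type

ISSN

Source identifier

1882-7802

Publisher

Information Processing Society of Japan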


