Storage-Side Processing for Spark with Tiered Storage

Kaihui, Zhang; Yusuke, Tanimura; Hidemoto, Nakada; Hirotaka, Ogawa

https://ipsj.ixsq.nii.ac.jp/records/186036

Name / File	License	Action
IPSJ-HPC18163007.pdf (2.1 MB)	Copyright (c) 2018 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2018-02-21

Title

Storage-Side Processing for Spark with Tiered Storage

Language

eng

Keyword

Subject scheme

Other

Subject

I/O systems

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

University of Tsukuba
National Institute of Advanced Industrial Science and Technology (AIST) / University of Tsukuba
National Institute of Advanced Industrial Science and Technology (AIST) / University of Tsukuba
National Institute of Advanced Industrial Science and Technology (AIST)

Author name

Kaihui, Zhang
Yusuke, Tanimura
Hidemoto, Nakada
Hirotaka, Ogawa

Abstract

Description type

Other

Description

Apache Spark is a parallel data processing framework that runs iterative calculations and interactive processing quickly by caching intermediate data in memory and using lineage-based data recovery to tolerate faults. However, Spark still needs to load input data from persistent storage at the beginning of the main analytics and store the final result to that storage at the end. In this study, we use Alluxio, a memory-based tiered storage system, as the persistent storage for Spark and implement the active storage concept, which uses processing resources on the storage side to reduce the amount of I/O between Spark and Alluxio. As a first step, we implemented filtering on the Alluxio worker and examined how much it improves data read performance. The results showed that performance was worse than we expected, owing to the inefficiency of storage-side filtering in our implementation.
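
The active storage idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation and does not use the Spark or Alluxio APIs: `StorageWorker`, `read_all`, `read_filtered`, and the sample log lines are hypothetical names and data chosen only to show why applying a filter on the storage side can reduce the bytes sent to the compute side.

```python
from typing import Callable, Iterable, List


class StorageWorker:
    """Hypothetical stand-in for a storage worker that holds one block of text records."""

    def __init__(self, records: Iterable[str]) -> None:
        self._records: List[str] = list(records)

    def read_all(self) -> List[str]:
        # Plain read: every record crosses the storage/compute boundary.
        return list(self._records)

    def read_filtered(self, predicate: Callable[[str], bool]) -> List[str]:
        # Storage-side filtering: the predicate runs where the data lives,
        # so only matching records are returned to the caller.
        return [r for r in self._records if predicate(r)]


def transferred_bytes(records: Iterable[str]) -> int:
    # Size of the payload that would be shipped to the compute side.
    return sum(len(r.encode("utf-8")) for r in records)


def is_error(line: str) -> bool:
    return line.startswith("ERROR")


# Assumed sample data, for illustration only.
worker = StorageWorker(["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout"])

# Compute-side filtering: ship everything, then filter.
shipped_plain = worker.read_all()
result_plain = [r for r in shipped_plain if is_error(r)]

# Storage-side filtering: filter first, ship only the matches.
shipped_pushed = worker.read_filtered(is_error)
result_pushed = shipped_pushed

plain_bytes = transferred_bytes(shipped_plain)
pushed_bytes = transferred_bytes(shipped_pushed)
print(f"compute-side filter shipped {plain_bytes} bytes")
print(f"storage-side filter shipped {pushed_bytes} bytes")
```

Both paths produce the same result, and the storage-side path ships fewer bytes. The sketch counts only transferred data; it does not model the CPU cost of running the filter on the storage worker, which is the kind of overhead the abstract reports as making the measured performance worse than expected.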

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10463942

Bibliographic information

研究報告ハイパフォーマンスコンピューティング（HPC） (Research Report: High Performance Computing (HPC))

Vol. 2018-HPC-163, No. 7, pp. 1-6, issued 2018-02-21

ISSN

Source identifier type

ISSN

Source identifier

2188-8841

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)
