Item type |
SIG Technical Reports(1) |
Publication date |
2018-02-21 |
Title |
Storage-Side Processing for Spark with Tiered Storage |
Title language |
en |
Language |
eng |
Keywords |
Subject scheme |
Other |
Subject |
I/O systems |
Resource type |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliations |
University of Tsukuba |
National Institute of Advanced Industrial Science and Technology (AIST) / University of Tsukuba |
National Institute of Advanced Industrial Science and Technology (AIST) / University of Tsukuba |
National Institute of Advanced Industrial Science and Technology (AIST) |
Authors |
Kaihui, Zhang
Yusuke, Tanimura
Hidemoto, Nakada
Hirotaka, Ogawa
|
Abstract |
Description type |
Other |
Description |
Apache Spark is a parallel data processing framework that runs iterative computations and interactive processing quickly by caching intermediate data in memory and recovering from faults through lineage. However, Spark still has to load input data from persistent storage at the start of an analysis and write the final result back to that storage at the end. In this study, we use Alluxio, a memory-based tiered storage system, as the persistent storage for Spark and implement the active storage concept, which uses processing resources on the storage side to reduce the amount of I/O between Spark and Alluxio. As a first step, we implemented filtering on the Alluxio worker and evaluated how much it improves read performance. The results showed lower performance than we expected, owing to the inefficiency of the storage-side filtering in our implementation. |
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10463942 |
Bibliographic information |
SIG Technical Reports: High Performance Computing (HPC)
Vol. 2018-HPC-163,
No. 7,
pp. 1-6,
Issued 2018-02-21
|
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8841 |
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
Language |
ja |
Publisher |
Information Processing Society of Japan (情報処理学会) |