MapReduce optimization using mapper-side aggregation (集約処理を用いたMapReduce最適化手法の提案と実装)

Tsuyoshi Ozawa (小沢 健史); Makoto Onizuka (鬼塚 真); Yoshifumi Fukumoto (福本 佳史); Satoshi Moriai (盛合 敏)

https://ipsj.ixsq.nii.ac.jp/records/87533

Name / File	License
IPSJ-ComSys2012011.pdf (1.2 MB)	Copyright (c) 2012 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2012-11-29

Title

集約処理を用いたMapReduce最適化手法の提案と実装

Title (English)

MapReduce optimization using mapper-side aggregation

Language

jpn

Keywords

Subject scheme

Other

Subject

Large-scale data processing

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

NTT Software Innovation Center (NTTソフトウェアイノベーションセンタ), listed for each of the four authors

Author name

小沢, 健史

Author name (English)

Tsuyoshi Ozawa

Abstract

Description type

Other

Description

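This paper presents a method for speeding up MapReduce jobs whose processing can be partially aggregated. Processing is partially aggregatable when its aggregation satisfies the associative and commutative laws. Prior work accelerated such processing by building new, dedicated processing systems. However, those approaches require substantial changes to the MapReduce mechanism, which makes them difficult to incorporate into Hadoop. We therefore propose Map Multi-Reduce, which achieves the speedup while keeping the cost of implementing it in Hadoop low. Map Multi-Reduce is an extension of MapReduce that adds two features, Record Reduce and Local Reduce. The changes to Hadoop needed to implement the proposed method are small: about 200 lines for Record Reduce and about 300 lines for Local Reduce. Despite these small changes, disk I/O and network I/O are reduced, and our experiments confirm a 1.7x speedup for WordCount on 2 TB of data. We also confirm that, for WordCount on 100 GB of data, the data passed between the Map and Reduce phases can be cut to as little as 50%, and we show that the transfer cost may be reduced further for larger inputs.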
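The condition on the aggregation (associative and commutative) is what makes partial aggregation safe: partial results can be computed over any grouping and ordering of the records and then combined. The following is a minimal Python sketch of that property, not code from the paper; the helper names `aggregate` and `partial_aggregate` and the sample values are assumptions made for illustration.

```python
from functools import reduce
from operator import add
import random


def aggregate(values, op):
    """Fold all values with op in one pass (no partial aggregation)."""
    return reduce(op, values)


def partial_aggregate(values, op, num_chunks, seed=0):
    """Shuffle the values, fold each chunk separately, then fold the partial results."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)  # arbitrary record order, as across map tasks
    chunks = [shuffled[i::num_chunks] for i in range(num_chunks)]
    partials = [reduce(op, chunk) for chunk in chunks if chunk]
    return reduce(op, partials)


# Hypothetical per-record counts (integers, so addition is exactly associative).
counts = [3, 1, 4, 1, 5, 9, 2, 6]

# Addition is associative and commutative: any grouping or order gives the same total.
direct_sum = aggregate(counts, add)
partial_sum = partial_aggregate(counts, add, num_chunks=3)

# Subtraction is neither, so merely reordering the records changes the result.
direct_diff = aggregate(counts, lambda a, b: a - b)
reordered_diff = aggregate(list(reversed(counts)), lambda a, b: a - b)
```

Sum, count, min, and max pass this kind of check; an operation such as subtraction does not, so it could not be pushed into a mapper-side aggregation step.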

Abstract (English)

Description type

Other

Description

In this paper, we propose a MapReduce optimization that uses mapper-side aggregation and is designed for aggregation queries, that is, queries consisting of aggregation operations that satisfy both the commutative and the associative laws. Mapper-side aggregation has been applied on other platforms; however, it is difficult to embed that related work in an existing MapReduce framework such as Hadoop, because its mechanism for task scheduling or monitoring is different and the MapReduce framework does not provide an inter-process communication facility. To solve this problem, we prototype Map Multi-Reduce, which preserves MapReduce semantics and requires only small modifications to Hadoop. Map Multi-Reduce is an extension of MapReduce consisting of two features: Record Reduce and Local Reduce. Record Reduce aggregates the intermediate data of a MapTask in memory and is implemented in only 200 LOC. Local Reduce aggregates the outputs of multiple MapTasks on the same machine and is implemented in only 300 LOC. Map Multi-Reduce is 1.7 times faster in WordCount processing on a 2TB dataset and cuts the shuffle cost by 50% on a 100GB dataset.
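The two features described in the abstract can be illustrated with a toy WordCount simulation. This is a hedged sketch in plain Python, not Hadoop code and not the authors' implementation: the function names (`map_task_plain`, `map_task_record_reduce`, `local_reduce`) and the input splits are hypothetical, and the sketch only shows how each step shrinks the number of intermediate records that would be shuffled.

```python
from collections import Counter


def map_task_plain(lines):
    """Plain WordCount map: emit one (word, 1) record per word occurrence."""
    return [(word, 1) for line in lines for word in line.split()]


def map_task_record_reduce(lines):
    """Record Reduce-style map: combine counts in an in-memory table before emitting."""
    table = Counter()
    for line in lines:
        for word in line.split():
            table[word] += 1
    return sorted(table.items())


def local_reduce(map_outputs):
    """Local Reduce-style step: merge the outputs of several map tasks on one machine."""
    merged = Counter()
    for output in map_outputs:
        for word, count in output:
            merged[word] += count
    return sorted(merged.items())


# Two hypothetical input splits handled by map tasks on the same machine.
split_a = ["a b a", "b a"]
split_b = ["a c", "c c b"]

# Without aggregation, every word occurrence becomes an intermediate record.
plain_records = map_task_plain(split_a) + map_task_plain(split_b)

# With in-memory aggregation, each map task emits one record per distinct word.
record_reduced = [map_task_record_reduce(split_a), map_task_record_reduce(split_b)]

# Merging per-machine leaves one record per distinct word on that machine.
locally_reduced = local_reduce(record_reduced)
```

In this toy input the intermediate record count drops from 10 (plain) to 5 (after in-memory aggregation per task) to 3 (after merging on the machine), while the final word counts are unchanged. The actual reduction on real data depends on how often keys repeat within and across map tasks.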

Bibliographic information

Proceedings of the Computer System Symposium (コンピュータシステム・シンポジウム論文集)

Vol. 2012, pp. 60-69, issued 2012-11-29

Publisher

Information Processing Society of Japan (情報処理学会)
