Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/21859
Title: Online load balancing for MapReduce with skewed data input
Authors: Le, Y
Liu, J
Ergun, F
Wang, D 
Keywords: Distributed processing
Resource allocation
Issue Date: 2014
Publisher: IEEE
Source: IEEE INFOCOM 2014 : IEEE Conference on Computer Communications, April 27-May 2, 2014, Toronto, ON, Canada, p. 2004-2012 How to cite?
Abstract: MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. In this paper, we, for the first time, examine the problem of accommodating data skew in MapReduce with online operations. Different from earlier heuristics in the very late reduce stage or after seeing all the data, we address the skew from the beginning of data input, and make no assumption about a priori knowledge of the data distribution nor require synchronized operations. We examine the input in a continuous fashion and adaptively assign tasks with a load-balanced strategy. We show that the optimal strategy is a constrained version of online minimum makespan and, in the MapReduce context where pairs with identical keys must be scheduled to the same machine, there is an online algorithm with a provable 2-competitive ratio. We further suggest a sample-based enhancement, which, probabilistically, achieves a 3/2-competitive ratio with a bounded error.
URI: http://hdl.handle.net/10397/21859
ISBN: 
DOI: 10.1109/INFOCOM.2014.6848141
Appears in Collections:Conference Paper

Access
View full-text via PolyU eLinks SFX Query
Show full item record

SCOPUSTM   
Citations

23
Last Week
0
Last month
Citations as of May 15, 2018

Page view(s)

47
Last Week
0
Last month
Citations as of May 20, 2018

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.