Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/96937
Title: PAW: data partitioning meets workload variance
Authors: Li, Z; Yiu, ML; Chan, TN
Issue Date: 2022
Source: 2022 IEEE 38th International Conference on Data Engineering (ICDE), 09-12 May 2022, Kuala Lumpur, Malaysia, p. 123-135
Abstract: In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), a dataset is partitioned in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, degrading the performance of existing partitioning methods. To fill this research gap, we model how future query workloads vary from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregularly shaped partition regions to further optimize query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70x more efficient than the state-of-the-art method.
Keywords: Costs; Shape; Conferences; Data engineering; Data models; Space exploration; Partitioning algorithms
Publisher: Institute of Electrical and Electronics Engineers
ISBN: 978-1-6654-0883-7 (Electronic); 978-1-6654-0884-4 (Print on Demand (PoD))
DOI: 10.1109/ICDE53745.2022.00014
Rights: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The following publication Z. Li, M. L. Yiu and T. N. Chan, "PAW: Data Partitioning Meets Workload Variance," 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 123-135 is available at https://dx.doi.org/10.1109/ICDE53745.2022.00014.
Appears in Collections: Conference Paper

Files in This Item:
Li_PAW_Data_Partitioning.pdf (Preprint version, 1 MB, Adobe PDF)
Open Access Information
Status: open access
File Version: Author's Original

Page views: 108 (as of Mar 2, 2025)
Downloads: 95 (as of Mar 2, 2025)
Scopus citations: 2 (as of Jun 21, 2024)
Web of Science citations: 4 (as of Mar 20, 2025)

