PAW : data partitioning meets workload variance

Li, Z; Yiu, ML; Chan, TN

doi:10.1109/ICDE53745.2022.00014

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/96937

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.creator	Li, Z	en_US
dc.creator	Yiu, ML	en_US
dc.creator	Chan, TN	en_US
dc.date.accessioned	2023-01-04T01:54:51Z	-
dc.date.available	2023-01-04T01:54:51Z	-
dc.identifier.isbn	978-1-6654-0883-7 (Electronic)	en_US
dc.identifier.isbn	978-1-6654-0884-4 (Print on Demand(PoD))	en_US
dc.identifier.uri	http://hdl.handle.net/10397/96937	-
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.rights	© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication Z. Li, M. L. Yiu and T. N. Chan, "PAW: Data Partitioning Meets Workload Variance," 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 123-135 is available at https://dx.doi.org/10.1109/ICDE53745.2022.00014.	en_US
dc.subject	Costs	en_US
dc.subject	Shape	en_US
dc.subject	Conferences	en_US
dc.subject	Data engineering	en_US
dc.subject	Data models	en_US
dc.subject	Space exploration	en_US
dc.subject	Partitioning algorithms	en_US
dc.title	PAW : data partitioning meets workload variance	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	123	en_US
dc.identifier.epage	135	en_US
dc.identifier.doi	10.1109/ICDE53745.2022.00014	en_US
dcterms.abstract	In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied on a dataset in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, thus deteriorating the performance of existing partitioning methods. To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregular shaped partition regions to further optimize the query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70x more efficient than the state-of-the-art method.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	2022 IEEE 38th International Conference on Data Engineering (ICDE), 09-12 May 2022, Kuala Lumpur, Malaysia, p. 123-135	en_US
dcterms.issued	2022	-
dc.relation.conference	IEEE International Conference on Data Engineering [ICDE]	en_US
dc.description.validate	202210 bcch	en_US
dc.description.oa	Author’s Original	en_US
dc.identifier.FolderNumber	a1614	-
dc.identifier.SubFormID	45620	-
dc.description.fundingSource	RGC	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AO)	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
Li_PAW_Data_Partitioning.pdf	Preprint version	1 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Author’s Original

Metric	Count	As of
Page views	162 (16 last week)	Nov 9, 2025
Downloads	133	Nov 9, 2025
Scopus citations	2	Jun 21, 2024
Web of Science citations	5	Dec 18, 2025
