Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/96937
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Li, Z | en_US
dc.creator | Yiu, ML | en_US
dc.creator | Chan, TN | en_US
dc.date.accessioned | 2023-01-04T01:54:51Z | -
dc.date.available | 2023-01-04T01:54:51Z | -
dc.identifier.isbn | 978-1-6654-0883-7 (Electronic) | en_US
dc.identifier.isbn | 978-1-6654-0884-4 (Print on Demand (PoD)) | en_US
dc.identifier.uri | http://hdl.handle.net/10397/96937 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication Z. Li, M. L. Yiu and T. N. Chan, "PAW: Data Partitioning Meets Workload Variance," 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 123-135 is available at https://dx.doi.org/10.1109/ICDE53745.2022.00014. | en_US
dc.subject | Costs | en_US
dc.subject | Shape | en_US
dc.subject | Conferences | en_US
dc.subject | Data engineering | en_US
dc.subject | Data models | en_US
dc.subject | Space exploration | en_US
dc.subject | Partitioning algorithms | en_US
dc.title | PAW: data partitioning meets workload variance | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 123 | en_US
dc.identifier.epage | 135 | en_US
dc.identifier.doi | 10.1109/ICDE53745.2022.00014 | en_US
dcterms.abstract | In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied on a dataset in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, thus deteriorating the performance of existing partitioning methods. To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregularly shaped partition regions to further optimize the query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70x more efficient than the state-of-the-art method. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | 2022 IEEE 38th International Conference on Data Engineering (ICDE), 09-12 May 2022, Kuala Lumpur, Malaysia, p. 123-135 | en_US
dcterms.issued | 2022 | -
dc.relation.conference | IEEE International Conference on Data Engineering [ICDE] | en_US
dc.description.validate | 202210 bcch | en_US
dc.description.oa | Author’s Original | en_US
dc.identifier.FolderNumber | a1614 | -
dc.identifier.SubFormID | 45620 | -
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AO) | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
Li_PAW_Data_Partitioning.pdf | Preprint version | 1 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Author’s Original

Page views: 108 (as of Mar 2, 2025)
Downloads: 95 (as of Mar 2, 2025)
Scopus citations: 2 (as of Jun 21, 2024)
Web of Science citations: 4 (as of Mar 27, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.