Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/96937
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Computing | en_US |
dc.creator | Li, Z | en_US |
dc.creator | Yiu, ML | en_US |
dc.creator | Chan, TN | en_US |
dc.date.accessioned | 2023-01-04T01:54:51Z | - |
dc.date.available | 2023-01-04T01:54:51Z | - |
dc.identifier.isbn | 978-1-6654-0883-7 (Electronic) | en_US |
dc.identifier.isbn | 978-1-6654-0884-4 (Print on Demand (PoD)) | en_US |
dc.identifier.uri | http://hdl.handle.net/10397/96937 | - |
dc.language.iso | en | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers | en_US |
dc.rights | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US |
dc.rights | The following publication Z. Li, M. L. Yiu and T. N. Chan, "PAW: Data Partitioning Meets Workload Variance," 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 123-135 is available at https://dx.doi.org/10.1109/ICDE53745.2022.00014. | en_US |
dc.subject | Costs | en_US |
dc.subject | Shape | en_US |
dc.subject | Conferences | en_US |
dc.subject | Data engineering | en_US |
dc.subject | Data models | en_US |
dc.subject | Space exploration | en_US |
dc.subject | Partitioning algorithms | en_US |
dc.title | PAW : data partitioning meets workload variance | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 123 | en_US |
dc.identifier.epage | 135 | en_US |
dc.identifier.doi | 10.1109/ICDE53745.2022.00014 | en_US |
dcterms.abstract | In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied on a dataset in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, thus deteriorating the performance of existing partitioning methods. To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregular shaped partition regions to further optimize the query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70x more efficient than the state-of-the-art method. | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | 2022 IEEE 38th International Conference on Data Engineering (ICDE), 09-12 May 2022, Kuala Lumpur, Malaysia, p. 123-135 | en_US |
dcterms.issued | 2022 | - |
dc.relation.conference | IEEE International Conference on Data Engineering [ICDE] | en_US |
dc.description.validate | 202210 bcch | en_US |
dc.description.oa | Author’s Original | en_US |
dc.identifier.FolderNumber | a1614 | - |
dc.identifier.SubFormID | 45620 | - |
dc.description.fundingSource | RGC | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | Green (AO) | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Li_PAW_Data_Partitioning.pdf | Preprint version | 1 MB | Adobe PDF |
Metric | Count | As of |
---|---|---|
Page views | 108 | Mar 2, 2025 |
Downloads | 95 | Mar 2, 2025 |
Scopus citations | 2 | Jun 21, 2024 |
Web of Science citations | 4 | Mar 27, 2025 |

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.