Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/96937
Field | Value
---|---
Title | PAW: data partitioning meets workload variance
Authors | Li, Z; Yiu, ML; Chan, TN
Issue Date | 2022
Source | 2022 IEEE 38th International Conference on Data Engineering (ICDE), 09-12 May 2022, Kuala Lumpur, Malaysia, pp. 123-135
Abstract | In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied on a dataset in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, thus deteriorating the performance of existing partitioning methods. To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregularly shaped partition regions to further optimize the query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70x more efficient than the state-of-the-art method.
Keywords | Costs; Shape; Conferences; Data engineering; Data models; Space exploration; Partitioning algorithms
Publisher | Institute of Electrical and Electronics Engineers
ISBN | 978-1-6654-0883-7 (Electronic); 978-1-6654-0884-4 (Print on Demand (PoD))
DOI | 10.1109/ICDE53745.2022.00014
Rights | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication Z. Li, M. L. Yiu and T. N. Chan, "PAW: Data Partitioning Meets Workload Variance," 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022, pp. 123-135 is available at https://dx.doi.org/10.1109/ICDE53745.2022.00014.
Appears in Collections | Conference Paper
Files in This Item:
File | Description | Size | Format
---|---|---|---
Li_PAW_Data_Partitioning.pdf | Preprint version | 1 MB | Adobe PDF