Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115059
DC Field | Value | Language
dc.contributor | Research Institute for Smart Ageing | -
dc.creator | Wang, J | -
dc.creator | Ma, R | -
dc.creator | Yang, X | -
dc.creator | Qi, Q | -
dc.creator | Zhuang, Z | -
dc.creator | Wang, J | -
dc.creator | Liao, J | -
dc.creator | Guo, S | -
dc.date.accessioned | 2025-09-09T07:40:22Z | -
dc.date.available | 2025-09-09T07:40:22Z | -
dc.identifier.issn | 1544-3566 | -
dc.identifier.uri | http://hdl.handle.net/10397/115059 | -
dc.language.iso | en | en_US
dc.publisher | Association for Computing Machinery | en_US
dc.rights | This work is licensed under a Creative Commons Attribution International 4.0 License (https://creativecommons.org/licenses/by/4.0/). | en_US
dc.rights | ©2025 Copyright held by the owner/author(s). | en_US
dc.rights | The following publication Wang, J., Ma, R., Yang, X., Qi, Q., Zhuang, Z., Wang, J., Liao, J., & Guo, S. (2025). DeepZoning: Re-accelerate CNN Inference with Zoning Graph for Heterogeneous Edge Cluster. ACM Trans. Archit. Code Optim., 22(1), Article 10 is available at https://doi.org/10.1145/3701995. | en_US
dc.subject | Cooperative CNN inference | en_US
dc.subject | Edge computing | en_US
dc.subject | Graph partition | en_US
dc.subject | Model deployment | en_US
dc.title | DeepZoning: re-accelerate CNN inference with zoning graph for heterogeneous edge cluster | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 22 | -
dc.identifier.issue | 1 | -
dc.identifier.doi | 10.1145/3701995 | -
dcterms.abstract | Parallelizing CNN inference on heterogeneous edge clusters with data parallelism has gained popularity as a way to meet real-time requirements without sacrificing model accuracy. However, existing algorithms struggle to find the optimal parallel granularity for complex CNNs, whose structure is a directed acyclic graph (DAG) rather than a chain, and their parallel dimension is inflexible. Distributing the workload of modern CNNs across heterogeneous devices has also been proven to be an NP-hard problem. In this article, we introduce DeepZoning, a versatile and cooperative inference framework that combines model and data parallelism to accelerate CNN inference. DeepZoning employs two algorithms at different levels: (1) a low-level Adaptive Workload Partition algorithm that uses linear programming and optimizes over both the spatial and channel dimensions when searching for the feature map distribution on heterogeneous devices, and (2) a high-level Model Partition algorithm that finds the optimal model granularity and organizes complex CNNs into sequential zones to balance communication and computation during execution. Our experimental evaluations show that DeepZoning is effective, achieving up to a 3.02× speedup on our experimental prototype compared to state-of-the-art algorithms. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | ACM transactions on architecture and code optimization, Mar. 2025, v. 22, no. 1, 10 | -
dcterms.isPartOf | ACM transactions on architecture and code optimization | -
dcterms.issued | 2025-03 | -
dc.identifier.scopus | 2-s2.0-105003630561 | -
dc.identifier.eissn | 1544-3973 | -
dc.identifier.artn | 10 | -
dc.description.validate | 202509 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Scopus/WOS | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This work was supported by the National Natural Science Foundation of China under Grants (U23B2001, 62171057, 62101064, 62201072, 62001054, 62071067) and by the Ministry of Education and China Mobile Joint Fund (MCM20200202, MCM20180101). | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
3701995.pdf | | 4.37 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.