Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/108617
Title: High-performance scheduling of deep learning tasks in collaborative edge computing
Authors: Zhang, Mingjin
Degree: Ph.D.
Issue Date: 2024
Abstract: In recent years, deep learning (DL) models and algorithms have been used extensively in a wide range of applications. Traditionally, DL tasks, including model training and inference, are performed on centralized cloud servers in data centers, which offer powerful and abundant computing resources. However, cloud-based computation often suffers from high communication costs, long response latency, and privacy concerns. To address these issues, edge computing has been proposed to migrate computation and services from the remote cloud to edge nodes at the network edge, closer to the data sources.
However, performing deep learning training and inference at the edge is challenging. Deep learning models are computation-intensive and resource-hungry, while the computing resources on edge nodes are constrained and may be unable to sustain training and inference workloads. Moreover, the data usually reside on geo-distributed edge nodes that belong to different stakeholders and have heterogeneous networking and computation capabilities. Deep learning tasks also have characteristics of their own: there are various model training paradigms, and many hyper-parameters, such as batch size, learning rate, and aggregation frequency, affect model performance. In addition, many AI applications involve a set of interdependent DL models, which further complicates scheduling.
To address these problems, this study schedules AI model training and inference tasks across heterogeneous edge devices and cloud servers to reduce latency while preserving accuracy, jointly considering the available edge resources and the characteristics of the deep learning tasks. This thesis makes the following three contributions.
First, we design and develop ENTS, an edge-native task scheduling runtime, to schedule deep learning tasks among large-scale, geo-distributed, and heterogeneous edge nodes. While existing task scheduling systems for edge computing consider only computation resources, ENTS collaboratively schedules computation and networking resources while accounting for both the DL task profile and the resource status.
Second, we schedule model training tasks in edge computing to reduce overall training time. Existing distributed machine learning frameworks at the edge suffer from heterogeneous and constrained edge resources. We propose a novel federated learning (FL) framework that adaptively splits and schedules training tasks among the heterogeneous edge nodes and the FL server to accelerate training without compromising accuracy.
Third, we schedule inference tasks among edge nodes to achieve low latency and high system throughput. While existing methods focus on cloud-edge collaboration and seldom consider collaboration among edge nodes, we develop a collaborative edge intelligence platform that enables edge nodes to share data and computation resources when performing latency-sensitive video analytics tasks.
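The joint consideration of computation and networking resources that runs through the contributions above can be illustrated with a minimal sketch. This is not the thesis's actual algorithm; it is a hypothetical greedy scheduler (all class names, node names, and numbers are invented for illustration) that places each DL inference task on the edge node minimizing estimated transfer, queueing, and compute latency.

```python
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    flops: float          # compute capacity, GFLOP/s (hypothetical units)
    bandwidth: float      # link bandwidth from the data source, MB/s
    queued: float = 0.0   # seconds of work already scheduled on this node

@dataclass
class Task:
    name: str
    work: float           # compute demand, GFLOPs
    data: float           # input data size, MB

def estimated_latency(node: EdgeNode, task: Task) -> float:
    # data transfer time + current queueing delay + compute time on this node
    return task.data / node.bandwidth + node.queued + task.work / node.flops

def greedy_place(tasks: list[Task], nodes: list[EdgeNode]) -> dict[str, str]:
    """Assign each task (largest first) to the node with the lowest
    estimated completion latency, updating that node's queue."""
    placement = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        best = min(nodes, key=lambda n: estimated_latency(n, task))
        placement[task.name] = best.name
        best.queued += task.work / best.flops
    return placement

nodes = [EdgeNode("edge-a", flops=50.0, bandwidth=10.0),
         EdgeNode("edge-b", flops=20.0, bandwidth=40.0)]
tasks = [Task("detect", work=100.0, data=5.0),
         Task("track", work=20.0, data=5.0)]
print(greedy_place(tasks, nodes))  # {'detect': 'edge-a', 'track': 'edge-b'}
```

In this toy scenario the heavy "detect" task lands on the faster node despite its slower link, and "track" is then routed to the other node because the first one's queue has grown — the kind of trade-off a scheduler that ignores either networking or queueing state would miss.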
In summary, this thesis systematically investigates the requirements of and solves the deep learning task scheduling problem for high-performance model training and deployment in edge computing. The proposed framework and solutions address the challenges arising from constrained and heterogeneous edge resources and from the complexity of DNN model training and inference tasks. We also outline future directions, including a decentralized scheduling framework for edge resources from multiple stakeholders and general programming models for efficient workload partitioning of deep learning tasks.
Subjects: Edge computing
Deep learning (Machine learning)
Hong Kong Polytechnic University -- Dissertations
Pages: xiv, 125 pages : color illustrations
Appears in Collections:Thesis



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.