Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/116822
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Computing | - |
| dc.creator | Zhu, S | - |
| dc.creator | Wang, D | - |
| dc.date.accessioned | 2026-01-21T03:52:58Z | - |
| dc.date.available | 2026-01-21T03:52:58Z | - |
| dc.identifier.isbn | 979-8-4007-1125-1 | - |
| dc.identifier.uri | http://hdl.handle.net/10397/116822 | - |
| dc.description | 16th ACM International Conference on Future and Sustainable Energy Systems, Rotterdam, Netherlands, June 17-20, 2025 | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | The Association for Computing Machinery | en_US |
| dc.rights | This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0). | en_US |
| dc.rights | ©2025 Copyright held by the owner/author(s). | en_US |
| dc.rights | The following publication Zhu, S., & Wang, D. (2025). Energy-efficient LLM Training in GPU datacenters with Immersion Cooling Systems. Proceedings of the 16th ACM International Conference on Future and Sustainable Energy Systems is available at https://doi.org/10.1145/3679240.3734609. | en_US |
| dc.subject | Immersion Cooling | en_US |
| dc.subject | LLM Training | en_US |
| dc.subject | Thermal Control | en_US |
| dc.title | Energy-efficient LLM training in GPU datacenters with immersion cooling systems | en_US |
| dc.type | Conference Paper | en_US |
| dc.identifier.spage | 407 | - |
| dc.identifier.epage | 414 | - |
| dc.identifier.doi | 10.1145/3679240.3734609 | - |
| dcterms.abstract | With the increase in AI applications, the energy consumption of datacenters that run AI jobs is greatly increasing. The overall energy consumption of a datacenter is closely linked with that of its cooling system. Recently, there has been a revolution in immersion cooling technologies, in which servers can be directly immersed in dielectric cooling liquid (coolant). However, there is a lack of understanding of how the performance of AI jobs is affected by immersion cooling systems. While the physics behind immersion cooling is understood, in this paper we observe key restricting factors: (1) the boiling state of the coolant and (2) the heat removal rate of the coolant may not match the heat generation rate of the GPUs, triggering the thermal-throttle mechanisms of the GPUs. In this paper, we study the energy-efficient and delay-ensured computing of large language model (LLM) training jobs over a cluster of GPUs in immersion cooling systems. We model the thermal characteristics of the system (e.g., heat generation, heat removal, and temperature) and develop an algorithm with workload assignment and frequency scaling to avoid the delay incurred by the thermal-throttle mechanisms and to execute the workloads in energy-efficient frequencies. In our evaluation, we simulate the computational fluid dynamics (CFD) of the immersion cooling systems through the Ansys Fluent software. We show that we outperform baseline algorithms by up to 53.2% in energy and 22.5% in delays. | - |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | In E-ENERGY '25: Proceedings of the 2025 the 16th ACM International Conference on Future and Sustainable Energy Systems, p. 407-414. New York, New York: The Association for Computing Machinery, 2025 | - |
| dcterms.issued | 2025 | - |
| dc.identifier.scopus | 2-s2.0-105016379923 | - |
| dc.relation.ispartofbook | E-ENERGY '25: Proceedings of the 2025 the 16th ACM International Conference on Future and Sustainable Energy Systems | - |
| dc.publisher.place | New York, New York | en_US |
| dc.description.validate | 202601 bcch | - |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | OA_Scopus/WOS | en_US |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | Dan Wang’s work is supported in part by RGC GRF 15200321, 15201322, 15230624, ITC ITF-ITS/056/22MX, ITS/052/23MX, and PolyU 1-CDKK, G-SAC8. | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.description.oaCategory | CC | en_US |

Appears in Collections: Conference Paper

Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 3679240.3734609.pdf | | 3.15 MB | Adobe PDF | View/Open |



