Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/118830
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Data Science and Artificial Intelligence | en_US |
| dc.contributor | Department of Computing | en_US |
| dc.creator | Zhou, Y | en_US |
| dc.creator | Wu, X | en_US |
| dc.creator | Wu, J | en_US |
| dc.creator | Feng, L | en_US |
| dc.creator | Tan, KC | en_US |
| dc.date.accessioned | 2026-05-20T06:39:23Z | - |
| dc.date.available | 2026-05-20T06:39:23Z | - |
| dc.identifier.uri | http://hdl.handle.net/10397/118830 | - |
| dc.description | The Thirty-ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, USA, Dec 01 2025 | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | OpenReview.net | en_US |
| dc.rights | CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) | en_US |
| dc.rights | The following publication Zhou, Y., Wu, X., Wu, J., Feng, L., & Tan, K. C. (2026). Hm3: Hierarchical multi-objective model merging for pretrained models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems is available at https://openreview.net/forum?id=JeP0lpusYw. | en_US |
| dc.title | HM3 : hierarchical multi-objective model merging for pretrained models | en_US |
| dc.type | Conference Paper | en_US |
| dcterms.abstract | Model merging is a technique that combines multiple large pretrained models into a single model, enhancing performance and broadening task adaptability without original data or additional training. However, most existing model merging methods focus primarily on exploring the parameter space, merging models with identical architectures. Despite its potential, merging in the architecture space remains in its early stages due to the vast search space and challenges related to layer compatibility. This paper designs a hierarchical model merging framework named HM3, formulating a bilevel multi-objective model merging problem across both parameter and architecture spaces. At the parameter level, HM3 integrates existing merging methods to quickly identify optimal parameters. Based on these, an actor-critic strategy with efficient policy discretization is employed at the architecture level to explore inference paths with Markov property in the layer-granularity search space for reconstructing these optimal models. By training reusable policy and value networks, HM3 learns Pareto optimal models to provide customized solutions for various tasks. Experimental results on language and vision tasks demonstrate that HM3 outperforms methods focusing solely on the parameter or architecture space. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | The Thirty-ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, USA, Dec 01 2025, https://openreview.net/forum?id=JeP0lpusYw | en_US |
| dcterms.issued | 2025 | - |
| dc.relation.conference | Neural Information Processing Systems [NeurIPS] | en_US |
| dc.description.validate | 202605 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | a4427a | - |
| dc.identifier.SubFormID | 52773 | - |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This work was partially supported by the National Natural Science Foundation of China under Grant U21A20512 and in part by the Research Grants Council of the Hong Kong SAR under Grant No. C5052-23G, Grant PolyU 15229824, Grant PolyU 15218622, and Grant PolyU 15215623. This work was also partially supported by the Research Grants Council of the Hong Kong SAR (Grant No. PolyU15217424, PolyU25216423), and The Hong Kong Polytechnic University (Project IDs: P0043563). This work was also supported in part by the Natural Science Foundation of Chongqing (Innovation and Development Joint Fund) under Grant CSTB2025NSCO-LZX0014. | en_US |
| dc.description.pubStatus | Unpublish | en_US |
| dc.description.oaCategory | CC | en_US |
| Appears in Collections: | Conference Paper | |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 29077_HM3_Hierarchical_Multi_O.pdf | | 867.53 kB | Adobe PDF | View/Open |