Constrained human preference alignment for natural language planning with LLMs

Zhou, Y; Hong, H; Cheng, R; Tan, KC

doi:10.1109/MIND67540.2025.11351754

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/118831

DC Field	Value	Language
dc.contributor	Department of Data Science and Artificial Intelligence	en_US
dc.creator	Zhou, Y	en_US
dc.creator	Hong, H	en_US
dc.creator	Cheng, R	en_US
dc.creator	Tan, KC	en_US
dc.date.accessioned	2026-05-20T06:43:24Z	-
dc.date.available	2026-05-20T06:43:24Z	-
dc.identifier.isbn	979-8-3315-8768-0 (Compliant PDF Files)	en_US
dc.identifier.isbn	979-8-3315-8767-3 (Conference USB Version)	en_US
dc.identifier.isbn	979-8-3315-8769-7 (Print on Demand(PoD))	en_US
dc.identifier.uri	http://hdl.handle.net/10397/118831	-
dc.description	2025 International Conference on Machine Intelligence and Nature-Inspired Computing (MIND), 31 October - 2 November 2025, Xiamen, China	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers, Inc.	en_US
dc.rights	© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication Y. Zhou, H. Hong, R. Cheng and K. C. Tan, "Constrained Human Preference Alignment for Natural Language Planning with LLMs," 2025 International Conference on Machine Intelligence and Nature-Inspired Computing (MIND), Xiamen, China, 2025, pp. 88-89 is available at https://doi.org/10.1109/MIND67540.2025.11351754.	en_US
dc.subject	Constraint	en_US
dc.subject	LLM	en_US
dc.subject	Planning	en_US
dc.subject	Preference alignment	en_US
dc.title	Constrained human preference alignment for natural language planning with LLMs	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	88	en_US
dc.identifier.epage	89	en_US
dc.identifier.doi	10.1109/MIND67540.2025.11351754	en_US
dcterms.abstract	Recent advances in large language models (LLMs) have established them as promising candidates for natural language planning tasks. However, existing approaches often fail to address two critical challenges: 1) the effective alignment of LLM-generated plans with human preferences, and 2) the dynamic enforcement of diverse constraints inherent in planning scenarios. To bridge these gaps, we propose a constraint-aware human-preference alignment framework for natural language planning. Our contributions are threefold. First, we design a process reward model that aligns LLM outputs with human preferences through step-by-step feedback, facilitating efficient and interpretable preference learning. Second, we develop a constraint-aware mechanism integrated into the rewriting strategy, which dynamically penalizes violations of task-specific constraints at each reasoning step. Third, we introduce a unified adaptive metric enabling a multifaceted assessment of planning quality. We validate our framework through experiments on planning benchmarks, demonstrating improvements in success rate with constraints and human preference alignment over baselines.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	In 2025 International Conference on Machine Intelligence and Nature-Inspired Computing (MIND), 31 October - 2 November 2025, Xiamen, China, p. 88-89	en_US
dcterms.issued	2025	-
dc.relation.ispartofbook	2025 International Conference on Machine Intelligence and Nature-Inspired Computing (MIND), 31 October - 2 November 2025, Xiamen, China	en_US
dc.relation.conference	Machine Intelligence and Nature-Inspired Computing [MIND]	en_US
dc.description.validate	202605 bcch	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a4427b	-
dc.identifier.SubFormID	52774	-
dc.description.fundingSource	RGC	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported in part by National Natural Science Foundation of China (Grant No. U21A20512), Research Grants Council of the Hong Kong SAR (Grant No. C5052-23G, PolyU15229824, PolyU15218622, PolyU15215623), and The Hong Kong Polytechnic University (Project IDs: P0053758, P0051130, P0052694).	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
Zhou_Constrained_Human_Preference.pdf	Pre-Published version	868.76 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript
