Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114572
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Jiang, Z | en_US
dc.creator | Wu, P | en_US
dc.creator | Liang, Z | en_US
dc.creator | Chen, PQ | en_US
dc.creator | Yuan, X | en_US
dc.creator | Jia, Y | en_US
dc.creator | Tu, J | en_US
dc.creator | Li, C | en_US
dc.creator | Ng, PHF | en_US
dc.creator | Li, Q | en_US
dc.date.accessioned | 2025-08-11T06:20:00Z | -
dc.date.available | 2025-08-11T06:20:00Z | -
dc.identifier.isbn | 979-8-4007-1454-2 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/114572 | -
dc.language.iso | en | en_US
dc.publisher | Association for Computing Machinery | en_US
dc.rights | This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0). | en_US
dc.rights | © 2025 Copyright held by the owner/author(s). | en_US
dc.rights | The following publication Jiang, Z., Wu, P., Liang, Z., Chen, P. Q., Yuan, X., Jia, Y., Tu, J., Li, C., Ng, P. H. F., & Li, Q. (2025). HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, ON, Canada, 5505-5515 is available at https://doi.org/10.1145/3711896.3737378. | en_US
dc.subject | Benchmark | en_US
dc.subject | Hierarchical reasoning | en_US
dc.subject | Large language models | en_US
dc.subject | Natural language processing | en_US
dc.title | HiBench: benchmarking LLMs capability on hierarchical structure reasoning | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 5505 | en_US
dc.identifier.epage | 5515 | en_US
dc.identifier.doi | 10.1145/3711896.3737378 | en_US
dcterms.abstract | Structure reasoning is a fundamental capability of large language models (LLMs), enabling them to reason about structured commonsense and answer multi-hop questions. However, existing benchmarks for structure reasoning mainly focus on horizontal and coordinate structures (e.g., graphs), overlooking the hierarchical relationships within them. Hierarchical structure reasoning is crucial for human cognition, particularly in memory organization and problem-solving. It also plays a key role in various real-world tasks, such as information extraction and decision-making. To address this gap, we propose HiBench, the first framework designed to systematically benchmark the hierarchical reasoning capabilities of LLMs, from initial structure generation to final proficiency assessment. It encompasses six representative scenarios, covering both fundamental and practical aspects, and consists of 30 tasks with varying hierarchical complexity, totaling 39,519 queries. To evaluate LLMs comprehensively, we develop five capability dimensions that depict different facets of hierarchical structure understanding. Through extensive evaluation of 20 LLMs from 10 model families, we reveal key insights into their capabilities and limitations: 1) existing LLMs show proficiency in basic hierarchical reasoning tasks; 2) they still struggle with more complex structures and implicit hierarchical representations, especially in structural modification and textual reasoning. Based on these findings, we create a small yet well-designed instruction dataset, which enhances LLMs' performance on HiBench by an average of 88.84% (Llama-3.1-8B) and 31.38% (Qwen2.5-7B) across all tasks. The HiBench dataset and toolkit are available at https://github.com/jzzzzh/HiBench to encourage evaluation. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, p. 5505-5515 | en_US
dcterms.issued | 2025 | -
dc.relation.ispartofbook | KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 | en_US
dc.description.validate | 202508 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a3975 | -
dc.identifier.SubFormID | 51855 | -
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The research described in this paper has been partly supported by General Research Funds from the Hong Kong Research Grants Council (project no. PolyU 15207322, 15200023, 15206024, and 15224524) and internal research funds from The Hong Kong Polytechnic University (project no. P0042693, P0048625, P0051361, P0052406, and P0052986). | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
3711896.3737378.pdf | | 1.69 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.