Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115657
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Meng, S | en_US
dc.creator | Wang, Y | en_US
dc.creator | Cui, Y | en_US
dc.creator | Chau, LP | en_US
dc.date.accessioned | 2025-10-16T01:53:59Z | -
dc.date.available | 2025-10-16T01:53:59Z | -
dc.identifier.issn | 0950-7051 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/115657 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.subject | Behavior decision | en_US
dc.subject | Multi-task | en_US
dc.subject | Segment-anything model | en_US
dc.title | Foundation model-assisted interpretable vehicle behavior decision making | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 324 | en_US
dc.identifier.doi | 10.1016/j.knosys.2025.113868 | en_US
dcterms.abstract | Intelligent autonomous driving systems must achieve accurate perception and driving decisions to enhance their effectiveness and adoption. Driving behavior decisions currently achieve high performance thanks to deep learning technology. However, most existing approaches lack interpretability, reducing user trust and hindering widespread adoption. While some efforts pursue transparency through strategies such as heat maps, cost volumes, and auxiliary tasks, they often provide limited model interpretation or require additional annotations. In this paper, we present a novel unified framework that tackles these issues by integrating ego-vehicle behavior decisions with human-centric, language-based interpretation prediction from ego-view visual input. First, we propose a self-supervised, class-agnostic object Segmentor module based on the Segment Anything Model and a 2-D light adapter strategy to capture the overall surrounding cues without any extra segmentation mask labels. Second, a semantic extractor is adopted to generate hierarchical semantic-level cues. Subsequently, a fusion module is designed to generate refined global features by combining the class-agnostic object features and the semantic-level features through a self-attention mechanism. Finally, vehicle behavior decisions and possible human-centric interpretations are jointly generated from the global fusion context. Experimental results across various settings on public datasets demonstrate the superiority and effectiveness of our proposed solution. | en_US
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | Knowledge-based systems, 3 Aug. 2025, v. 324, 113868 | en_US
dcterms.isPartOf | Knowledge-based systems | en_US
dcterms.issued | 2025-08-03 | -
dc.identifier.scopus | 2-s2.0-105008112195 | -
dc.identifier.artn | 113868 | en_US
dc.description.validate | 202510 bcel | en_US
dc.description.oa | Not applicable | en_US
dc.identifier.SubFormID | G000232/2025-07 | -
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision, funded by The Hong Kong Jockey Club Charities Trust, and was partially supported by the Research Grants Council of the Hong Kong SAR, China (Project No. PolyU 15215824). | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2027-08-03 | en_US
dc.description.oaCategory | Green (AAM) | en_US
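Note: the abstract above describes a fusion module that combines class-agnostic object features and semantic-level features through self-attention before jointly producing behavior decisions and human-centric interpretations. The paper's implementation is not reproduced in this record; the following is a minimal, hypothetical PyTorch sketch of such a self-attention fusion. The module name SelfAttentionFusion, the feature dimensions, the mean pooling, and both output heads are assumptions made for illustration, not details taken from the paper.

# Hypothetical sketch of the self-attention fusion step described in the abstract.
# All names, dimensions, and head designs are illustrative assumptions.
import torch
import torch.nn as nn


class SelfAttentionFusion(nn.Module):
    """Fuses class-agnostic object tokens with semantic-level tokens."""

    def __init__(self, dim=256, num_heads=8, num_behaviors=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Behavior-decision head (e.g. go straight / stop / turn left / turn right).
        self.decision_head = nn.Linear(dim, num_behaviors)
        # Interpretation head: a feature projection that a language decoder could
        # consume; the actual text generation is out of scope for this sketch.
        self.interp_head = nn.Linear(dim, dim)

    def forward(self, object_tokens, semantic_tokens):
        # object_tokens:   (B, N_obj, dim) class-agnostic object features
        # semantic_tokens: (B, N_sem, dim) hierarchical semantic-level features
        tokens = torch.cat([object_tokens, semantic_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # self-attention over all tokens
        fused = self.norm(tokens + fused)              # residual connection + norm
        global_ctx = fused.mean(dim=1)                 # global fusion context
        return self.decision_head(global_ctx), self.interp_head(global_ctx)


if __name__ == "__main__":
    model = SelfAttentionFusion()
    obj = torch.randn(2, 16, 256)   # dummy object tokens
    sem = torch.randn(2, 49, 256)   # dummy semantic tokens
    decisions, interp = model(obj, sem)
    print(decisions.shape, interp.shape)  # torch.Size([2, 4]) torch.Size([2, 256])

Concatenating the two token sets before a single self-attention layer is one straightforward way to let each object token attend to semantic context; the paper may combine the features differently.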
Appears in Collections: Journal/Magazine Article
Open Access Information
Status: embargoed access
Embargo End Date: 2027-08-03
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.