Explainable molecular property prediction : aligning chemical concepts with predictions via language models

Wang, Z; Lin, Z; Lin, W; Yang, M; Zeng, M; Tan, KC

doi:10.1109/TPAMI.2026.3664098

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119085

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.contributor	Department of Data Science and Artificial Intelligence	en_US
dc.contributor	Department of Applied Physics	en_US
dc.creator	Wang, Z	en_US
dc.creator	Lin, Z	en_US
dc.creator	Lin, W	en_US
dc.creator	Yang, M	en_US
dc.creator	Zeng, M	en_US
dc.creator	Tan, KC	en_US
dc.date.accessioned	2026-06-02T02:28:53Z	-
dc.date.available	2026-06-02T02:28:53Z	-
dc.identifier.issn	0162-8828	en_US
dc.identifier.uri	http://hdl.handle.net/10397/119085	-
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.rights	© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication Z. Wang, Z. Lin, W. Lin, M. Yang, M. Zeng and K. C. Tan, 'Explainable Molecular Property Prediction: Aligning Chemical Concepts With Predictions via Language Models,' in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 6, pp. 7017-7031, June 2026 is available at https://doi.org/10.1109/TPAMI.2026.3664098.	en_US
dc.subject	Explainability	en_US
dc.subject	Language models	en_US
dc.subject	Molecular property prediction	en_US
dc.title	Explainable molecular property prediction : aligning chemical concepts with predictions via language models	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	7017	en_US
dc.identifier.epage	7031	en_US
dc.identifier.volume	48	en_US
dc.identifier.issue	6	en_US
dc.identifier.doi	10.1109/TPAMI.2026.3664098	en_US
dcterms.abstract	Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop a framework for explainable molecular property prediction based on language models, dubbed Lamole, which can provide chemical concepts-aligned explanations. We take a string-based molecular representation, Group SELFIES, as input tokens to pre-train and fine-tune our Lamole, as it provides chemically meaningful semantics. By disentangling the information flows of Lamole, we propose considering both self-attention weights and gradients for better quantification of each chemically meaningful substructure’s impact on the model’s output. To make the explanations more faithful to the structure-property relationship, we then carefully craft a marginal loss to explicitly optimize the explanations to align with the chemists’ annotations. We bridge the manifold hypothesis with the elaborated marginal loss to prove that the loss can align the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results over eight datasets demonstrate Lamole can achieve comparable prediction accuracy and boost the explanation accuracy by up to 14.3%, being the state-of-the-art in explainable molecular property prediction. To further illustrate the actionable utility of the explanations derived from Lamole, we integrated the framework with an evolutionary algorithm. This integration established an interpretable optimization pipeline for molecular editing, demonstrating that Lamole goes beyond simple post-hoc analysis and serves as a practical guide for molecule discovery.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	IEEE transactions on pattern analysis and machine intelligence, June 2026, v. 48, no. 6, p. 7017-7031	en_US
dcterms.isPartOf	IEEE transactions on pattern analysis and machine intelligence	en_US
dcterms.issued	2026-06	-
dc.identifier.scopus	2-s2.0-105030261082	-
dc.identifier.eissn	1939-3539	en_US
dc.description.validate	202606 bcjz	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.SubFormID	G001720/2026-04	-
dc.description.fundingSource	RGC	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported in part by the Research Grants Council of the Hong Kong SAR under Grant C5052-23G, Grant 15208725 and Grant 15208222, in part by the Hong Kong Polytechnic University under Grant A0046682 and Grant P0057774, in part by the Fundamental Research Funds for the Central Universities under Grant 20720250164, and in part by the Xiamen Natural Science Foundation under Grant 3502Z202571027.	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Wang_Explainable_Molecular_Property.pdf	Pre-Published version	14.45 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript
