Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119085
Title: Explainable molecular property prediction : aligning chemical concepts with predictions via language models
Authors: Wang, Z 
Lin, Z 
Lin, W 
Yang, M 
Zeng, M
Tan, KC 
Issue Date: Jun-2026
Source: IEEE transactions on pattern analysis and machine intelligence, June 2026, v. 48, no. 6, p. 7017-7031
Abstract: Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and materials science. Although transformer-based language models have shown great potential for accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal molecular structure-property relationships. In this work, we develop a language-model-based framework for explainable molecular property prediction, dubbed Lamole, which provides explanations aligned with chemical concepts. We use a string-based molecular representation, Group SELFIES, as the input tokens for pre-training and fine-tuning Lamole, because it provides chemically meaningful semantics. By disentangling the information flows of Lamole, we propose considering both self-attention weights and gradients to better quantify each chemically meaningful substructure's impact on the model's output. To make the explanations more faithful to the structure-property relationship, we then craft a marginal loss that explicitly optimizes the explanations to align with chemists' annotations. We connect the manifold hypothesis with the proposed marginal loss to prove that the loss aligns the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results on eight datasets demonstrate that Lamole achieves comparable prediction accuracy and improves explanation accuracy by up to 14.3%, achieving state-of-the-art performance in explainable molecular property prediction. To further illustrate the practical utility of the explanations derived from Lamole, we integrate the framework with an evolutionary algorithm. This integration establishes an interpretable optimization pipeline for molecular editing, demonstrating that Lamole is not limited to post-hoc analysis but also serves as a practical guide for molecule discovery.
Keywords: Explainability
Language models
Molecular property prediction
Publisher: Institute of Electrical and Electronics Engineers
Journal: IEEE transactions on pattern analysis and machine intelligence 
ISSN: 0162-8828
EISSN: 1939-3539
DOI: 10.1109/TPAMI.2026.3664098
Rights: © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The following publication Z. Wang, Z. Lin, W. Lin, M. Yang, M. Zeng and K. C. Tan, 'Explainable Molecular Property Prediction: Aligning Chemical Concepts With Predictions via Language Models,' in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 6, pp. 7017-7031, June 2026 is available at https://doi.org/10.1109/TPAMI.2026.3664098.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Wang_Explainable_Molecular_Property.pdf
Description: Pre-Published version
Size: 14.45 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.