PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction

Chan, LW; Liu, Y; Chan, T; Law, HK; Wong, SC; Yeung, AP; Lo, K; Yeung, S; Kwok, K; Chan, WY; Lau, TY; Shyu, CR

doi:10.1186/s12911-015-0166-2

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/7532

Title:	PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction
Authors:	Chan, LW; Liu, Y; Chan, T; Law, HK; Wong, SC; Yeung, AP; Lo, K; Yeung, S; Kwok, K; Chan, WY; Lau, TY; Shyu, CR
Issue Date:	2015
Source:	BMC medical informatics and decision making, 2015, v. 15, 43, p. 1-8
Abstract:	Background: Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems gives physicians evidence to support them in making diagnoses or referring suspected cases for examination. Clinical terms in EHRs represent high-level conceptual information, and a similarity measure built on these terms reflects the chance of inter-patient disease co-occurrence. Assuming that all clinical terms are equally relevant to a disease is unrealistic and reduces prediction accuracy. Here we propose a term weighting approach supported by the PubMed search engine to address this issue. Methods: We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, namely the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Through two systematic PubMed search methods, generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, giving a total of 6216 vector pairs. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded into the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performance in predicting inter-patient co-occurrence of HCC diagnoses was compared using Receiver Operating Characteristic (ROC) analysis. Results: The areas under the ROC curves (AUROCs) of similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). Compared with equal term weighting, performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms "Dysplastic nodule", "nodule of liver" and "equal density (isodense) lesion" were found to be the top three image findings associated with HCC in PubMed. Conclusions: Our findings suggest that the optimized similarity measure with specific term weighting applied to EHRs can significantly improve the accuracy of predicting the inter-patient co-occurrence of diagnosis when compared with equal and generic term weighting approaches.
Publisher:	BioMed Central Ltd.
Journal:	BMC medical informatics and decision making
EISSN:	1472-6947
DOI:	10.1186/s12911-015-0166-2
Rights:	© 2015 Chan et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. The following publication: Chan, L. W., Liu, Y., Chan, T., Law, H. K., Wong, S. C., Yeung, A. P., … Shyu, C. R. (2015). PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction. BMC Medical Informatics and Decision Making, 15, 43, 1-8, is available at https://dx.doi.org/10.1186/s12911-015-0166-2
Appears in Collections:	Journal/Magazine Article
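
Illustrative sketch (not part of the repository record): the pipeline summarized in the abstract can be outlined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the mDC formula, so a plain weighted cosine with a constant c appended to each feature vector stands in for it. The weight estimate hits(term AND disease) / hits(disease), the rule that a pair counts as positive only when both reports carry the diagnosis, and all function names, terms and hit counts are hypothetical.

```python
from itertools import combinations
from math import sqrt


def term_weights(joint_hits, disease_hits):
    """Assumed estimate of P(term | disease) from search hit counts."""
    return {term: n / disease_hits for term, n in joint_hits.items()}


def weighted_similarity(terms_a, terms_b, weights, c=0.0):
    """Weighted cosine between two reports given as sets of terms.

    Each report becomes a vector holding weights[t] for every term it
    contains, plus one extra element c (a stand-in for the regularization
    constant mentioned in the abstract).
    """
    a = {t for t in terms_a if t in weights}
    b = {t for t in terms_b if t in weights}
    dot = sum(weights[t] ** 2 for t in a & b) + c * c
    norm_a = sqrt(sum(weights[t] ** 2 for t in a) + c * c)
    norm_b = sqrt(sum(weights[t] ** 2 for t in b) + c * c)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(reports, weights, c=0.0):
    """Score every unordered pair of reports and return the AUROC.

    reports maps an id to (set of terms, has_diagnosis). A pair is treated
    as positive when both reports carry the diagnosis (an assumption).
    """
    scores, labels = [], []
    for (_, (ta, da)), (_, (tb, db)) in combinations(reports.items(), 2):
        scores.append(weighted_similarity(ta, tb, weights, c))
        labels.append(da and db)
    return auroc(scores, labels)


# Hypothetical hit counts and reports, for illustration only.
weights = term_weights(
    {"dysplastic nodule": 300, "nodule of liver": 200,
     "isodense lesion": 100, "cyst": 10},
    disease_hits=1000,
)
reports = {
    "r1": ({"dysplastic nodule", "nodule of liver"}, True),
    "r2": ({"nodule of liver", "isodense lesion"}, True),
    "r3": ({"cyst"}, False),
    "r4": ({"cyst", "nodule of liver"}, False),
}
print(evaluate(reports, weights, c=0.0))
```

With n reports the number of unordered pairs is n(n-1)/2, which for the 112 reports in the study gives the 6216 vector pairs quoted in the abstract. Equal term weighting corresponds to passing the same weight for every term; optimizing the measure would mean repeating evaluate over a range of c values and keeping the best one.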

Files in This Item:

File	Description	Size	Format
Chan_PubMed-supported_Clinical_Term.pdf		1.1 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
