Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/13900
Title: Hierarchical content classification and script determination for automatic document image processing
Authors: Chi, Z; Wang, Q; Siu, WC
Keywords: Background thinning; Content classification; Cross-correlation; Document image processing; Kolmogorov complexity; Neural networks; Page segmentation; Script determination
Issue Date: 2003
Publisher: Elsevier
Source: Pattern recognition, 2003, v. 36, no. 11, p. 2483-2500
Journal: Pattern recognition 
Abstract: Page segmentation and image content classification play an important role in automatic image processing, with applications to mixed-type document image compression, form and check reading, and automatic mail sorting. In this paper, we first present an enhanced background-thinning-based approach for fast page segmentation. After analyzing three different methods individually, we propose a hierarchical approach to document content classification that assigns each sub-image to one of two categories: text or halftone. The approach combines a neural network model, a cross-correlation metric, and a Kolmogorov complexity measure in a hierarchical structure. Considering the needs of a recognition system, we also propose a three-layer feedforward neural network that classifies text regions as Chinese or English script. On a number of document images, the classification accuracy reaches 100% for halftone regions and 97.1% for text regions. The system also achieves correct rates of 92.3% and 95.0% for Chinese and alphabetic script determination, respectively.
URI: http://hdl.handle.net/10397/13900
ISSN: 0031-3203
EISSN: 1873-5142
DOI: 10.1016/S0031-3203(03)00128-6
Appears in Collections: Journal/Magazine Article
