Paper
OCR for World Wide Web images
3 April 1997
Proceedings Volume 3027, Document Recognition IV; (1997) https://doi.org/10.1117/12.270080
Event: Electronic Imaging '97, 1997, San Jose, CA, United States
Abstract
A significant amount of text now present in World Wide Web documents is embedded in image data, and a large portion of it does not appear elsewhere at all. To make this information available, we need to develop techniques for recovering textual information from in-line Web images. In this paper, we describe two methods for Web image OCR. Recognizing text extracted from in-line Web images is difficult because characters in these images are often rendered at a low spatial resolution. Such images are typically considered to be 'low quality' by traditional OCR technologies. Our proposed methods utilize the information contained in the color bits to compensate for the loss of information due to low sampling resolution. The first method uses a polynomial surface fitting technique for object recognition. The second method is based on the traditional n-tuple technique. We collected a small set of character samples from Web documents and tested the two algorithms. Preliminary experimental results show that our n-tuple method works quite well. However, the surface fitting method performs rather poorly due to the coarseness and small number of color shades used in the text.
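The abstract names the n-tuple technique but does not spell it out. As a rough illustration only, the sketch below shows a generic n-tuple classifier working on quantized multi-level pixel values rather than binary ones. It is not the authors' implementation: the class name `NTupleClassifier`, the parameters `n`, `count`, and `levels`, the random choice of pixel positions, and the toy 3x3 samples are all assumptions made for this example.

```python
import random


def quantize(image, levels, max_value=255):
    """Map each pixel value in [0, max_value] to one of `levels` shade bins."""
    return [min(p * levels // (max_value + 1), levels - 1) for p in image]


class NTupleClassifier:
    """Generic n-tuple classifier over flattened, multi-level images (illustrative)."""

    def __init__(self, num_pixels, n=3, count=20, levels=4, seed=0):
        rng = random.Random(seed)
        # Each tuple is a fixed, randomly chosen group of n pixel positions.
        self.tuples = [tuple(rng.sample(range(num_pixels), n)) for _ in range(count)]
        self.levels = levels
        # label -> one set per tuple, holding the shade patterns seen in training.
        self.memory = {}

    def _patterns(self, image):
        q = quantize(image, self.levels)
        return [tuple(q[i] for i in positions) for positions in self.tuples]

    def train(self, image, label):
        tables = self.memory.setdefault(label, [set() for _ in self.tuples])
        for table, pattern in zip(tables, self._patterns(image)):
            table.add(pattern)

    def classify(self, image):
        patterns = self._patterns(image)
        # Score of a class = number of tuples whose pattern was seen for that class.
        scores = {
            label: sum(p in table for table, p in zip(tables, patterns))
            for label, tables in self.memory.items()
        }
        return max(scores, key=scores.get)


# Toy usage with hypothetical 3x3 samples (dark glyph vs. light glyph).
clf = NTupleClassifier(num_pixels=9)
clf.train([0] * 9, "a")
clf.train([255] * 9, "b")
print(clf.classify([0] * 8 + [255]))  # one corrupted pixel; still closest to "a"
```

Keeping several shade levels per pixel, instead of thresholding to black and white, is one plausible way to use the color information the abstract refers to. How the paper actually encodes color bits is not stated on this page.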
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiangying Zhou, Daniel P. Lopresti, and Zhibin Lei "OCR for World Wide Web images", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); https://doi.org/10.1117/12.270080
CITATIONS
Cited by 23 scholarly publications.
KEYWORDS
Optical character recognition
Internet
Spatial resolution
Binary data
Lithium
Image classification
Image processing