Machine perception and recognition of handwritten text in any language is a difficult problem. Even for Latin script, most solutions are restricted to specific domains such as courtesy amount recognition on bank checks. Arabic script presents additional challenges for handwriting recognition systems due to its highly connected nature,
numerous forms of each letter, and other factors. In this paper we address the problem of offline Arabic handwriting
recognition of pre-segmented words. Rather than focusing on a single classification approach and trying to perfect it, we propose to combine heterogeneous classification methodologies. We evaluate our system on the IFN/ENIT corpus of Tunisian village and town names and demonstrate that the combined approach yields results that are better than those of the individual classifiers.
Although modern OCR technology is capable of handling a wide variety of document images, there is no single
OCR engine that performs equally well on all documents for a given single language script. Naturally, each OCR
engine has its strengths and weaknesses, and therefore different engines tend to differ in the accuracy on different
documents, and in the errors on the same document image. While the idea of using multiple OCR engines
to boost output accuracy is not new, most of the existing systems do not go beyond variations on majority
voting. While this approach may work well in many cases, it has limitations, especially when OCR technology
used to process a given script has not yet fully matured. Our goal is to develop a system called MEMOE (for
"Multi-Evidence Multi-OCR-Engine") that combines, in an optimal or near-optimal way, output streams of
one or more OCR engines together with various types of evidence extracted from these streams as well as from
original document images, to produce output of higher quality than that of the individual OCR engines, or of
majority voting applied to multiple OCR output streams. Furthermore, we aim to improve the accuracy of OCR
output on images that might otherwise have low accuracy that significantly impacts downstream processing.
The MEMOE system functions as an OCR engine taking document images and some configuration parameters
as input and producing a single output text stream. In this paper, we describe the design of the system, various
evidence types and how they are incorporated into MEMOE in the form of filters. Results of initial tests that
involve two corpora of Arabic documents show that, even in its initial configuration, the system is superior to a
voting algorithm and that even more improvement may be achieved by incorporating additional evidence types
into the system.
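The majority-voting baseline mentioned above can be sketched as follows. This is a minimal illustration, not the MEMOE implementation: it assumes the OCR output streams have already been aligned token by token, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(aligned_streams):
    """Combine token streams from several OCR engines by per-position voting.

    aligned_streams: list of equal-length token lists, one per engine.
    Alignment is assumed to have been done beforehand (an assumption of
    this sketch; the abstract does not describe the alignment step).
    """
    if not aligned_streams:
        return []
    length = len(aligned_streams[0])
    if any(len(stream) != length for stream in aligned_streams):
        raise ValueError("streams must be aligned to the same length")

    combined = []
    for candidates in zip(*aligned_streams):
        # Pick the token proposed by the largest number of engines.
        token, _count = Counter(candidates).most_common(1)[0]
        combined.append(token)
    return combined


streams = [
    ["the", "cat", "sat"],
    ["the", "cot", "sat"],
    ["tho", "cat", "sat"],
]
print(majority_vote(streams))
```

Voting of this kind has no way to recover when most engines make the same error, which is one motivation for bringing in additional evidence.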
This paper describes new capabilities of ImageRefiner, an automatic image enhancement system based on machine learning (ML). ImageRefiner was initially designed as a pre-OCR cleanup filter for bitonal (black-and-white) document images. Using a single neural network, ImageRefiner learned which image enhancement transformations (filters) were best suited for a given document image and a given OCR engine, based on various image measurements (characteristics). The new release improves ImageRefiner in three major ways. First, to process grayscale document images, we have included three grayscale filters based on smart thresholding and noise filtering, as well as five image characteristics that are all byproducts of various thresholding techniques. Second, we have implemented additional ML algorithms, including a neural network ensemble and several "all-pairs" classifiers. Third, we have introduced a measure that evaluates overall performance of the system in terms of cumulative improvement of OCR accuracy. Our experiments indicate that OCR accuracy on enhanced grayscale images is higher than that of both the original grayscale images and the corresponding bitonal images obtained by scanning the same documents. We have noticed that the system's performance may suffer when document characteristics are correlated.
Accurate geometric registration is an important step that precedes various tasks in the processing of remotely sensed imagery. Assuming that, after radiometric and systematic correction, images are registered to within a few pixels, our goal is to develop fast and reliable automatic registration methods for multi-sensor data that would yield sub-pixel accuracy. This paper compares two gradient-based algorithms for sub-pixel image registration developed by Thevenaz et al. One of them minimizes intensity difference while the other maximizes mutual information between two images. The algorithms were combined with three invariant wavelet pyramids: a centered cubic spline pyramid and both the low-pass and band-pass Simoncelli steerable pyramids. We compare the different variations of the two algorithms on both synthetic and real satellite imagery. We found that for single-sensor data, the intensity-based algorithm combined with a band-pass wavelet pyramid produces the best results, while for multi-sensor images, the best choice is the mutual-information-based method combined with a steerable low-pass pyramid.
We describe two approaches to systematic performance assessment of
a specific image registration algorithm. One approach involves generating radiometrically different synthetic images by convolving one of them with a point-spread function, while the other consists of
registration of three or more images to obtain multiple estimates of registration parameters. We present experimental results indicating that different-radiometry synthetic data is more difficult to register and therefore provides better testing than the same-radiometry data used in our previous work. They also show that the multiple-estimate methodology, which we call triangulation, may be used not only to measure self-consistency of a given registration algorithm, but also to obtain estimates of ground truth information for images for which the ground truth, if available at all, is known only approximately.
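One way to picture the self-consistency check is as a loop-closure test: if images A, B, and C are registered pairwise, composing A to B, B to C, and C to A should return the identity, and any residual measures inconsistency. The sketch below assumes 2D rigid transforms written as `(theta, tx, ty)`; the helper names are hypothetical, and this is only one plausible reading of the triangulation idea, not the authors' procedure.

```python
import math


def compose(first, second):
    """Rigid transform equivalent to applying `first`, then `second`.

    Each transform is (theta, tx, ty) and maps x to R(theta) x + t.
    """
    th1, tx1, ty1 = first
    th2, tx2, ty2 = second
    c, s = math.cos(th2), math.sin(th2)
    return (th1 + th2, c * tx1 - s * ty1 + tx2, s * tx1 + c * ty1 + ty2)


def invert(transform):
    """Inverse rigid transform: x maps to R(-theta) (x - t)."""
    th, tx, ty = transform
    c, s = math.cos(th), math.sin(th)
    return (-th, -(c * tx + s * ty), s * tx - c * ty)


def loop_residual(a_to_b, b_to_c, c_to_a):
    """Return (angle error in radians, translation error) of the closed loop."""
    theta, tx, ty = compose(compose(a_to_b, b_to_c), c_to_a)
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap the angle
    return abs(theta), math.hypot(tx, ty)


a_to_b = (0.10, 2.0, -1.0)
b_to_c = (-0.05, 0.5, 3.0)
exact = invert(compose(a_to_b, b_to_c))
noisy = (exact[0], exact[1] + 0.3, exact[2])  # simulated 0.3-pixel error
print(loop_residual(a_to_b, b_to_c, exact))
print(loop_residual(a_to_b, b_to_c, noisy))
```

A perfectly consistent set of estimates gives a residual of zero, so the size of the residual gives a rough indication of registration error even without ground truth.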
This paper compares different filters used to register images taken from different satellites to sub-pixel precision. The registration algorithm is one proposed by Thevenaz et al., which uses a modified Levenberg-Marquardt process to find the rigid transform that best maps one image into another. Our findings are that,
when applied to single-sensor synthetic data, centered spline filters and the low-pass band of the Simoncelli steerable pyramid are equally sensitive to the initial guess, while the band-pass sub-band of the Simoncelli filters exhibits greater sensitivity. For multi-sensor and noisy data, however, the band-pass filters produce the most consistent results.
Assuming that approximate registration is given within a few pixels by a systematic correction system, we develop automatic image registration methods for multi-sensor data with the goal of achieving sub-pixel accuracy. Automatic image registration is usually defined by three steps: feature extraction, feature matching, and data resampling or fusion. Our previous work focused on image correlation methods based on the use of different features. In this paper, we study different feature matching techniques and present five algorithms where the features are either original gray levels or wavelet-like features, and the feature matching is based on gradient descent optimization, statistical robust matching, and mutual information. These algorithms are tested and compared on several multi-sensor datasets covering one of the EOS Core Sites, the Konza Prairie in Kansas, from four different sensors: IKONOS (4 m), Landsat-7/ETM+ (30 m), MODIS (500 m), and SeaWIFS (1000 m).
Feature-based matching is essential for attaining sub-pixel registration of remotely sensed imagery. In this work, we focus on two different similarity metrics which are used to match extracted features, correlation and mutual information. Although mutual information has been successfully applied to medical image registration, these metrics have not been systematically studied for remote sensing applications. This paper presents some first results in the comparison of correlation and mutual information, relative to their respective accuracy and response to noise. The study is performed using Landsat-TM data.
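For readers unfamiliar with the mutual information metric, the following is a minimal sketch of how it can be estimated from a joint histogram of two co-located gray-level sequences. The function name, the binning scheme, and the 8-bit default range are assumptions made for illustration; the abstract does not specify an estimator.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=16, max_value=255):
    """Mutual information (in bits) between two equal-length gray-level sequences.

    Values are assumed to lie in [0, max_value] and are quantized into
    `bins` equal-width bins before the joint histogram is built.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")

    def to_bin(value):
        return min(int(value * bins / (max_value + 1)), bins - 1)

    pairs = [(to_bin(x), to_bin(y)) for x, y in zip(a, b)]
    n = len(pairs)
    joint = Counter(pairs)
    marginal_a = Counter(i for i, _ in pairs)
    marginal_b = Counter(j for _, j in pairs)

    # Sum of p(i, j) * log2(p(i, j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (count / n) * math.log2(count * n / (marginal_a[i] * marginal_b[j]))
        for (i, j), count in joint.items()
    )


reference = [0, 0, 64, 64, 128, 128, 192, 192]
unrelated = [0, 64, 0, 64, 0, 64, 0, 64]
print(mutual_information(reference, reference, bins=4))
print(mutual_information(reference, unrelated, bins=4))
```

In a registration setting, the score would be evaluated for each candidate transform and the transform with the highest value kept; unlike correlation, the measure does not assume a linear relationship between the two images' gray levels.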
Wavelet-based image registration has previously been proposed by the authors. In previous work, maxima obtained from orthogonal Daubechies filters as well as from Simoncelli steerable filters were utilized and compared to register images with a multi-resolution correlation technique. Previous comparative studies between both types of filters have shown that the accuracy obtained with orthogonal filters seemed to degrade very quickly for large rotations and large amounts of noise, while results obtained with steerable filters appeared much more stable under these conditions. In other studies based on the use of mutual information for image registration, several authors have shown that maximizing mutual information enables one to reach sub-pixel registration accuracy. In this work, we are utilizing Simoncelli steerable filters to provide the basic data from which mutual information is maximized and we are applying this method to remotely sensed imagery.
Wavelet-based image registration has previously been proposed by the authors. In previous work, maxima obtained from orthogonal Daubechies filters as well as from Simoncelli steerable filters were utilized and compared to register images in a multi-resolution fashion. The first comparative results between both types of filters showed that despite the lack of translation-invariance of the orthogonal filters, both types of filters gave very encouraging results for non-noisy data and small transformations. But the accuracy obtained with orthogonal filters seemed to degrade very quickly for large rotations and large amounts of noise, while results obtained with steerable filters appeared much more stable under these conditions. In this work, we are performing a systematic study of the robustness of such methods as a function of translation, rotation and noise parameters, for both types of filters and using data from the Landsat/Thematic Mapper.
A wavelet-based image registration approach has previously been proposed by the authors. In that work, wavelet coefficient maxima obtained from an orthogonal wavelet decomposition using Daubechies filters were utilized to register images in a multi-resolution fashion. Tested on several remote sensing datasets, this method gave very encouraging results. Despite the lack of translation-invariance of these filters, we showed that when using cross-correlation as a feature matching technique, features of size larger than twice the size of the filters are correctly registered by using the low-frequency subbands of the Daubechies wavelet decomposition. Nevertheless, high-frequency subbands are still sensitive to translation effects. In this work, we are considering a rotation- and translation-invariant representation developed by E. Simoncelli and integrating it in our image registration scheme. The two types of filters, Daubechies and Simoncelli filters, are then compared from a registration point of view, utilizing synthetic data as well as data from the Landsat/Thematic Mapper and from the NOAA Advanced Very High Resolution Radiometer.