Digital cameras are gradually replacing traditional flatbed scanners as the main means of capturing text information because of their usability, low cost and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, the arbitrary position of the camera lens relative to the text area frequently causes perspective distortion, which most current OCR systems cannot handle, creating a demand for automatic text rectification. Existing rectification research has mainly focused on document images; the distortion of natural scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural scene images is proposed. It relies on geometric information extracted from the characters themselves as well as from their surroundings. In the first step, line segments are extracted from the region of interest, and J-Linkage-based clustering followed by customized refinement is performed to estimate the primary vanishing points (VPs). To make the VP estimation more comprehensive, a second stage inspects the internal structure of the characters by analyzing the pixels and connected components of the text lines. Finally, the VPs are verified and used to perform perspective rectification. Experiments demonstrate an increased recognition rate and improvements over several related algorithms.
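The vanishing-point step can be illustrated with a minimal sketch. It assumes the line segments of one cluster have already been grouped (the J-Linkage clustering and refinement are not shown) and estimates their common VP as the least-squares intersection in homogeneous coordinates via an SVD. The function names are hypothetical, not taken from the paper.

```python
import numpy as np

def segment_to_line(p, q):
    """Homogeneous line (a, b, c) through image points p and q, scaled so a^2 + b^2 = 1."""
    line = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return line / np.hypot(line[0], line[1])

def estimate_vanishing_point(segments):
    """Least-squares intersection of a cluster of segments.

    Every line l_i through a common point v satisfies l_i . v = 0, so the right
    singular vector of the stacked lines with the smallest singular value is the VP.
    Returns (x, y), or the homogeneous 3-vector if the VP lies at infinity.
    """
    L = np.array([segment_to_line(p, q) for p, q in segments])
    _, _, vt = np.linalg.svd(L)
    v = vt[-1]
    if abs(v[2]) < 1e-12:
        return v  # parallel segments: VP at infinity, keep the direction
    return v[:2] / v[2]

segments = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((100, 0), (100, 20))]
print(estimate_vanishing_point(segments))  # approximately [100. 50.]
```

With noisy real segments the smallest singular value is no longer zero, and the result is the algebraic least-squares estimate rather than an exact intersection.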
KEYWORDS: Video, Video surveillance, Facial recognition systems, Video compression, Image resolution, Detection and tracking algorithms, Light sources and illumination, Digital cameras, Associative arrays, Intelligence systems
Face images from video sequences captured in unconstrained environments usually contain several kinds of variation, e.g. in pose, facial expression, illumination, image resolution and occlusion. Motion blur and compression artifacts further degrade recognition performance. Moreover, in many practical systems such as law enforcement, video surveillance and e-passport identification, only a single still image per person is enrolled in the gallery set. Many existing methods may fail because of variations in face appearance and the limited number of gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. Assuming that faces from still images and video frames share the same identity space, we use a regularized least squares regression method to tackle the multi-modality problem. Regularization terms based on heuristic assumptions are introduced to avoid overfitting. To deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery. We adopt a learning algorithm that combines an affine/convex hull-based approach with regularization to match image sets. Experimental results on a real-world dataset of unconstrained video sequences demonstrate that our method significantly outperforms state-of-the-art methods.
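The regression step can be illustrated with plain ridge regression, which has a closed-form solution. This is a minimal sketch assuming still-image features `X` and video-frame features `Y` are already extracted and row-aligned by identity; the paper's own heuristic regularization terms are not specified in the abstract, so a single Frobenius-norm penalty with a hypothetical weight `lam` stands in for them.

```python
import numpy as np

def fit_ridge_mapping(X, Y, lam=1.0):
    """Solve min_W ||X W - Y||_F^2 + lam * ||W||_F^2 in closed form.

    X: (n, d) still-image features; Y: (n, k) video-frame features of the same identities.
    Setting the gradient to zero gives (X^T X + lam I) W = X^T Y.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 4))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 4))
W = fit_ridge_mapping(X, Y, lam=0.1)
print(np.abs(W - W_true).max())  # small, since both the noise and the penalty are small
```

Larger `lam` shrinks the mapping towards zero, trading fit for stability when only a few gallery samples per person are available.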
KEYWORDS: Cell phones, Detection and tracking algorithms, Scalable video coding, Information science, Information technology, Databases, Algorithm development
As smartphones and touch screens become increasingly popular, on-line signature verification can serve as a means of personal identification for mobile computing. In this paper, a novel on-line signature verification method based on Laplacian Spectral Analysis (LSA) is presented, and a framework integrating LSA- and Dynamic Time Warping (DTW)-based methods is proposed for practical application. In the LSA-based method, the on-line signature is regarded as a graph and its Laplacian matrix is constructed, incorporating the signature's writing speed information. The eigenvalue spectrum of the Laplacian matrix is analyzed and used for signature verification. In the integration framework, DTW is used at two stages. First, it provides stroke matching results that help the LSA method construct a better graph. Second, the on-line signature verification results of DTW are fused with those of the LSA method. Experimental results on a public signature database and on practical signature data collected on mobile phones demonstrate the effectiveness of the proposed method.
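The DTW component can be illustrated with the textbook dynamic-time-warping distance between two pen trajectories. This is a generic sketch, not the authors' stroke-matching procedure; the Euclidean point-wise cost and the (x, y) input format are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between sequences a (n, d) and b (m, d) with Euclidean local cost."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, step in a only, step in b only
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

ref = [(0, 0), (1, 1), (2, 2), (3, 3)]
slow = [(0, 0), (0, 0), (1, 1), (2, 2), (2, 2), (3, 3)]  # same path, written more slowly
print(dtw_distance(ref, slow))  # 0.0: DTW absorbs the timing difference
```

Because DTW tolerates differences in writing speed, it complements a spectral method that uses speed explicitly as graph information.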
KEYWORDS: Facial recognition systems, Nose, 3D image processing, Databases, 3D modeling, Detection and tracking algorithms, Mouth, Principal component analysis, Eye, 3D acquisition
In this paper, we propose a 3D face recognition approach based on the conformal representation of facial surfaces. First, facial surfaces are mapped onto the 2D unit disk by Riemann mapping. Their conformal representation (i.e. the pair of mean curvature (MC) and conformal factor (CF)) is then computed and encoded as Mean Curvature Images (MCIs) and Conformal Factor Images (CFIs). Since different regions of the face deform unequally under expression variation, the MCIs and CFIs are divided into five parts, and LDA is applied to each part to obtain its feature vector. Finally, the five parts are fused at the distance level for recognition. Extensive experiments on the BU-3DFE database demonstrate the effectiveness of the proposed approach.
Different face views produce different face topologies in 2D images. Face images spanning a smaller range of view angles differ less in topology and are therefore easier to process in a unified way, and vice versa. Thus, many studies divide the entire face pattern space of multiview face images into many subspaces, each covering a small range of view angles. However, a large number of subspaces is computationally demanding, and different face processing algorithms adopt different strategies to handle view changes. A principled division of the face pattern space is therefore needed to ensure good performance. Unlike previous work, this paper derives a theoretical criterion for the optimal view angle range of face pattern space division, based on a careful analysis of the structural differences between multiview faces and of their influence on face processing algorithms. A face pattern space division method is then proposed. Finally, the proposed criterion and method are used to divide the face pattern space for face detection, and the result is compared with other divisions. The results show that the proposed criterion and method achieve the required processing performance with the minimum number of subspaces. The study can also benefit other research that needs to divide the pattern space of other objects according to their views.
Nowadays, video has gradually become a mainstream dissemination medium because of its rich information capacity and intelligibility. Text in video often carries significant semantic information and thus contributes greatly to video content understanding and to the construction of content-based video retrieval systems. Text-based video analysis usually consists of text detection, localization, tracking, segmentation and recognition. Much research has been done on video text detection and tracking, but most solutions process text in static frames, and few make full use of the redundancy between video frames. In this paper, a unified framework for text detection, localization and tracking in video frames is proposed. The edge and corner distributions of text blocks are selected as text features, on which localization and tracking are performed. By exploiting inter-frame redundancy, location relations and motion characteristics are determined, which effectively reduces false alarms and raises the localization accuracy. Separate tracking schemes are proposed for static and rolling text. Through multi-frame integration, text quality is improved, and so is the OCR accuracy. Experiments demonstrate fewer false alarms and higher localization and recognition accuracy.
People are often the most important subjects in videos. It is highly desirable to automatically summarize the occurrences of different people in a large video collection and to quickly find the clips containing a particular person. In this paper, we present a person-based video summarization and retrieval system named VideoWho, which extracts temporal face sequences from videos and groups them into clusters, each containing video clips of the same person. This is accomplished with advanced face detection and tracking algorithms together with a semi-supervised face clustering approach. The system achieved good clustering accuracy when tested on a hybrid video set including home videos, TV plays and movies. A number of applications can be built on top of this technology, such as automatic summarization of the major characters in videos, person-related video search on the Internet and personalized UI systems.
This paper investigates orientation detection for document images containing Chinese characters. Such images may be in one of four orientations: right side up, upside down, or rotated 90° or 270° counterclockwise. First, we present the structure of a text-recognition-based orientation detection algorithm, focusing on text line verification and orientation judgment methods, and carry out multiple experiments. Distance-difference-based and confidence-based text line verification are proposed and compared with methods without text line verification. Then, for images in which no text line is detected, a picture-based orientation detection framework is adopted. This high-level classification problem is solved with relatively low-level vision features, namely Color Moments (CM) and the Edge Direction Histogram (EDH), using a distance-based classification scheme. Finally, a confidence-based classifier combination strategy is employed to make full use of the complementarity between different features and classifiers. Experiments show that both text line verification methods improve the accuracy of orientation detection, and that picture-based orientation detection performs well on a no-text image set.
Offline Chinese handwritten character string recognition is one of the most important research fields in pattern recognition. Because of free writing styles, large variability in character shapes and differing geometric characteristics, it is a challenging problem. Among current methods, the over-segmentation-and-merging approach, which integrates geometric information, character recognition information and contextual information, shows promising results. Experiments show that a large proportion of the errors are segmentation errors, which mainly occur around non-Chinese characters. A Chinese character string contains not only wide characters, namely Chinese characters, but also narrow characters such as digits and letters. The segmentation errors are mainly caused by the uniform geometric model imposed on all segmented candidate characters. To solve this problem, post-processing is employed to improve the recognition accuracy of narrow characters. On the one hand, separate geometric models are established for wide and narrow characters, so that narrow characters are less prone to being merged. On the other hand, the top-ranked recognition results of the candidate paths are integrated to boost the final recognition of narrow characters. The post-processing method is evaluated on two datasets with 1405 handwritten address strings in total. The recognition accuracy of wide characters improves slightly, while that of narrow characters increases by 10.41% and 10.03% on the two datasets, respectively. This indicates that the post-processing method effectively improves the recognition accuracy of narrow characters.
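The effect of separate geometric models can be sketched with one-dimensional Gaussian models of character width normalized by text-line height. The means and standard deviations below are hypothetical placeholders, not values from the paper, and the class of a candidate is assumed to come from its top-ranked recognition result; the point is only to show why a single uniform model penalizes narrow candidates and pushes them towards merging.

```python
import math

# Hypothetical width/height statistics; real values would be estimated from training strings.
UNIFORM = (0.9, 0.15)
MULTI = {"wide": (0.95, 0.12), "narrow": (0.5, 0.12)}

def gaussian_log_likelihood(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def geometric_score(width_ratio, predicted_class, multi_model=True):
    """Geometric log-likelihood of a candidate character.

    predicted_class is "wide" (Chinese character) or "narrow" (digit or letter).
    With multi_model=False every candidate is scored by the single uniform model.
    """
    mean, std = MULTI[predicted_class] if multi_model else UNIFORM
    return gaussian_log_likelihood(width_ratio, mean, std)

digit_width = 0.5  # a typical narrow digit
print(geometric_score(digit_width, "narrow", multi_model=False))  # negative: uniform model penalizes it
print(geometric_score(digit_width, "narrow", multi_model=True))   # clearly higher
```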
KEYWORDS: Feature extraction, Distance measurement, Classification systems, Printing, Statistical analysis, Intelligence systems, Digital photography, Scanners
Automatic picture orientation recognition is of great significance in many applications such as consumer gallery management, webpage browsing, content-based searching and web printing. We address this high-level classification problem with relatively low-level features, namely Spatial Color Moments (CM) and the Edge Direction Histogram (EDH). An improved distance-based classification scheme is adopted as our classifier. Instead of collecting and training samples for all four classes, we propose an input-vector-rotating strategy, which is computationally more efficient than several conventional schemes. We then study classifier combination algorithms to make full use of the complementarity between different features and classifiers. Our classifier combination methods work at two levels: the feature level and the measurement level. At the measurement level, we present two classifier combination structures (parallel and cascaded) with a rejection option. As a precondition for the measurement-level methods, the theory of Classifier's Confidence Analysis (CCA) is introduced, defining concepts such as classifier's confidence and generalized confidence. The final classification system reaches nearly 90% recognition accuracy on a broad, unconstrained consumer picture set.
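The input-vector-rotating idea can be sketched for the color-moment feature alone: if the moments are computed on a regular grid, rotating the picture by 90° only permutes the grid cells, so one upright model suffices and the four orientations are tested by rotating the input feature instead of training four classes. This is a minimal sketch under that grid assumption; the nearest-prototype rule stands in for the paper's improved distance-based classifier, the EDH feature is omitted, and all function names are hypothetical.

```python
import numpy as np

def color_moments(img, n=4):
    """Spatial color moments on an n x n grid: per-channel mean, std and cube-rooted skewness.

    Assumes image height and width are divisible by n, so that a 90-degree rotation
    maps grid cells exactly onto grid cells. Returns shape (n, n, 3 * channels).
    """
    img = np.asarray(img, float)
    h, w, c = img.shape
    bh, bw = h // n, w // n
    feats = np.zeros((n, n, 3 * c))
    for i in range(n):
        for j in range(n):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].reshape(-1, c)
            mu = block.mean(axis=0)
            sd = block.std(axis=0)
            skew = np.cbrt(((block - mu) ** 3).mean(axis=0))
            feats[i, j] = np.concatenate([mu, sd, skew])
    return feats

def rotate_features(feats, k):
    """Grid features of the picture rotated by k * 90 degrees counterclockwise."""
    return np.rot90(feats, k, axes=(0, 1))

def classify_orientation(feats, upright_prototype):
    """Undo each candidate rotation and compare with a single upright prototype.

    Returns k such that the input looks like an upright picture rotated k * 90 degrees CCW.
    """
    dists = [np.linalg.norm(rotate_features(feats, -k) - upright_prototype) for k in range(4)]
    return int(np.argmin(dists))
```

In practice `upright_prototype` would be, for example, the mean feature of upright training pictures; here a single picture serves as its own prototype.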
KEYWORDS: Image segmentation, Optical character recognition, Image resolution, Detection and tracking algorithms, Image processing algorithms and systems, Video, Binary data, Feature extraction, Machine learning
Web images constitute an important part of web documents and have become a powerful medium of expression, especially images containing text. The text embedded in web images often carries semantic information related to the layout and content of the pages. Statistics show that there is a significant need to detect and recognize text in web images. In this paper, we first briefly review existing methods for text detection and recognition in web images; we then present a framework for extracting text from web images that includes text localization and recognition stages. In the localization stage, a localization method generates text candidates, a two-stage strategy selects among them, and text regions are then localized with a coarse-to-fine text line extraction algorithm. For text recognition, two text region binarization methods are proposed to improve recognition performance on web images. Experimental results on text localization and recognition demonstrate the effectiveness of these methods. Additionally, a recognition evaluation of text regions in web images has been conducted as a benchmark.
KEYWORDS: Optical character recognition, Image segmentation, Feature extraction, Image processing, Detection and tracking algorithms, Digital imaging, Intelligence systems, Machine learning
A SemiBoost-based character recognition method is introduced to incorporate the information of unlabeled practical samples in the training stage. One of the key problems in semi-supervised learning is the criterion for selecting unlabeled samples. In this paper, a criterion based on pairwise sample similarity is adopted to guide the SemiBoost learning process. At each iteration, unlabeled samples are selected and assigned labels. The selected samples are used along with the original labeled samples to train a new classifier, and the trained classifiers are integrated into the final classifier. An empirical study on several pairs of similar Arabic characters with different degrees of similarity shows that the proposed method improves performance, because the unlabeled samples reveal the distribution of practical samples.
Target detection in multimodal (multisensor) images is a difficult problem, especially given different viewpoints and complex backgrounds. In this paper, we propose a target detection method based on ground region matching and spatial constraints. First, the extrinsic camera parameters are used to transform the images so as to reduce the impact of viewpoint differences. Then stable ground object regions are extracted by MSER. These regions are used to build a graph model that describes the reference image with spatial constraints, reducing the impact of multimodality and complex backgrounds. Finally, ground region matching and registration of the model with the sensed images are used to find the target. The method overcomes these difficulties and gives satisfactory experimental results: the final detection rate is 94.34% on our data set of visible reference images in top view and infrared sensed images in side view.
KEYWORDS: Detection and tracking algorithms, Performance modeling, Image segmentation, Visual process modeling, Optical character recognition, Data processing, Inspection, Gaussian filters, Sensors, Video
Detection of character regions is meaningful both for highlighting regions of interest and for recognition in further information processing. Much research has been performed on character localization and extraction, which creates a strong need for a performance evaluation scheme to assess detection algorithms. In this paper, two probability models are established for evaluation in different applications. For highlighting regions of interest, a Gaussian probability model, which simulates the low-pass Gaussian filtering property of the human visual system (HVS), is constructed to assign different weights to different parts of a character. It is particularly well suited to describing detector performance when the detected result is an incomplete character, a case that other methods cannot handle effectively. For recognition, we also introduce a weighted probability model that describes the contribution of detection results to the final recognition results. Experiments on web images and natural scene images verify the validity of the proposed performance evaluation models. These models may also apply to evaluating algorithms that locate other objects, such as faces, although wider experiments are needed to examine this assumption.
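The Gaussian weighting idea can be sketched as a coverage score. This minimal sketch assumes the ground-truth character is an axis-aligned box and the weight map is a Gaussian centred on the box, with standard deviations proportional to its width and height (the factor `sigma_scale` is a hypothetical choice, not from the paper). A detection's score is the fraction of total weight it covers, so missing the centre of a character costs more than clipping its edge.

```python
import numpy as np

def gaussian_weight_map(shape, box, sigma_scale=0.25):
    """Gaussian weights for ground-truth box (x0, y0, x1, y1) in an image of shape (h, w).

    Pixel (row, col) covers [col, col+1) x [row, row+1); weights are zero outside the box.
    """
    h, w = shape
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:h, 0:w] + 0.5  # pixel centres
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    sx, sy = sigma_scale * (x1 - x0), sigma_scale * (y1 - y0)
    weights = np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))
    inside = (xs > x0) & (xs < x1) & (ys > y0) & (ys < y1)
    return weights * inside

def detection_score(weights, det_box):
    """Fraction of the ground-truth weight covered by the detected box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = det_box
    return weights[y0:y1, x0:x1].sum() / weights.sum()

W = gaussian_weight_map((40, 40), (10, 10, 30, 30))
print(detection_score(W, (10, 10, 30, 30)))  # 1.0: full coverage
print(detection_score(W, (10, 10, 20, 30)))  # 0.5: left half of the character
print(detection_score(W, (15, 15, 25, 25)))  # above 0.25 although it covers a quarter of the area
```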
In this paper, we introduce a novel Gabor-based Spatial Domain Class-Dependence Feature Analysis (GSD-CFA) method that improves performance on the Face Recognition Grand Challenge (FRGC) 2.0. In short, we integrate Gabor image representation with the spatial domain Class-Dependence Feature Analysis (CFA) method to perform fast and robust face recognition. We mainly concentrate on the performance of subspace-based methods using Gabor features. Since all experiments in this study are based on large-scale face recognition problems such as FRGC, we do not compare with algorithms that address the small sample size problem. We study the generalization ability of GSD-CFA on the THFaceID data set. As FRGC 2.0 Experiment #4 is a benchmark test for face recognition algorithms, we compare the performance of GSD-CFA with other well-known subspace-based algorithms on this test.
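For reference, a real-valued Gabor kernel of the kind used in Gabor image representations can be generated as follows. The parameter names and defaults are illustrative assumptions: the abstract does not give the filter-bank configuration (scales, orientations, complex or real responses), and the CFA stage is not shown.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid centred at the origin (size should be odd).

    wavelength: period of the sinusoid in pixels; theta: orientation in radians;
    sigma: std of the Gaussian envelope; gamma: spatial aspect ratio; psi: phase offset.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_r / wavelength + psi)

# A small bank: 4 orientations at one scale
bank = [gabor_kernel(15, wavelength=8.0, theta=k * np.pi / 4, sigma=4.0) for k in range(4)]
print(bank[0].shape)  # (15, 15)
```

Convolving a face image with each kernel in the bank and concatenating (possibly down-sampled) responses gives a Gabor representation that a subspace method can then project.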
OCR for Chinese historical documents is still an open problem. As these documents are handwritten or hand-carved in various styles, overlapping and touching characters make character segmentation very difficult. This paper presents an over-segmentation-based method to handle overlapping and touching Chinese characters in historical documents. The segmentation process has two parts: over-segmentation and segmentation path optimization. In the former, touching strokes are found and segmented by analyzing the geometric information of the white and black connected components. The segmentation cost of the touching strokes is estimated from the shape and location of the connected components, as well as the touching stroke width. The latter part uses locally optimized dynamic programming to find the best segmentation path: an HMM expresses the multiple choices of segmentation paths, and the Viterbi algorithm searches for the locally optimal solution. Experimental results on practical Chinese documents show that the proposed method is effective.
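Path optimization over candidate cut points can be sketched as a dynamic program. The sketch assumes a hypothetical `segment_cost(i, j)` callback (for example combining the width and recognition confidence of the piece between cuts i and j) and a limit on how many primitive segments may merge into one character. It is not the authors' HMM/Viterbi formulation, which the abstract does not detail, but it searches the same kind of segmentation lattice.

```python
def best_segmentation(num_cuts, segment_cost, max_merge=3):
    """Choose cuts 0 = c_0 < c_1 < ... < c_k = num_cuts - 1 minimising the summed segment cost.

    Candidate cuts are indexed 0..num_cuts-1 (the first and last are the string ends).
    A character may span at most `max_merge` consecutive primitive segments.
    Returns (total_cost, list_of_cut_indices).
    """
    last = num_cuts - 1
    best = [float("inf")] * num_cuts
    back = [-1] * num_cuts
    best[0] = 0.0
    for j in range(1, num_cuts):
        for i in range(max(0, j - max_merge), j):
            cost = best[i] + segment_cost(i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # follow back-pointers from the end of the string
    path, j = [last], last
    while j > 0:
        j = back[j]
        path.append(j)
    return best[last], path[::-1]

# Hypothetical cut x-positions; ideal character width 20 pixels
x = [0, 10, 20, 40, 50, 60]
print(best_segmentation(len(x), lambda i, j: (x[j] - x[i] - 20) ** 2))  # (0.0, [0, 2, 3, 5])
```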
KEYWORDS: Performance modeling, Optical character recognition, Data modeling, Detection and tracking algorithms, Quantization, Systems modeling, Statistical modeling, Data storage, Optimization (mathematics)
This paper studies the design and implementation of language models. Unlike previous research, we emphasize the importance of word-based n-gram models. We build a word-based language model with the SRILM toolkit and apply it to contextual language processing of Chinese documents. A modified Absolute Discount smoothing algorithm is proposed to reduce the perplexity of the language model. Compared with a character-based language model, the word-based language model improves the post-processing performance of an online handwritten character recognition system, but it also greatly increases computation and storage costs. Besides quantizing the model data non-uniformly, we design a new tree storage structure to compress the model, which also increases search efficiency. We evaluate these approaches on a test corpus of recognition results for online handwritten Chinese characters, and propose a modified confidence measure for candidate characters that yields accurate posterior probabilities while reducing complexity. The weighted combination of linguistic knowledge and candidate confidence information proves successful and can be developed further to improve recognition accuracy.
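As a reference point for the smoothing step, standard (unmodified) absolute discounting for a bigram model subtracts a fixed discount from every observed bigram count and redistributes the freed mass through a lower-order distribution. The sketch uses a maximum-likelihood unigram as the lower-order model; the discount `D=0.75` is a conventional default, not the paper's modified algorithm, and SRILM is not involved.

```python
from collections import Counter

def train_absolute_discount(tokens, D=0.75):
    """Interpolated absolute-discount bigram model P(w | v); requires 0 < D <= 1."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    total = sum(unigrams.values())
    context_count = Counter()  # c(v): how often v starts a bigram
    followers = Counter()      # N1+(v, .): number of distinct words seen after v
    for (v, _w), c in bigrams.items():
        context_count[v] += c
        followers[v] += 1

    def prob(w, v):
        p_uni = unigrams[w] / total
        cv = context_count[v]
        if cv == 0:
            return p_uni  # unseen context: fall back to the unigram model
        discounted = max(bigrams[(v, w)] - D, 0) / cv
        backoff_mass = D * followers[v] / cv  # mass removed by discounting
        return discounted + backoff_mass * p_uni

    return prob, set(unigrams)

tokens = "the cat sat on the mat the cat ran".split()
prob, vocab = train_absolute_discount(tokens)
print(round(prob("cat", "the"), 3), round(prob("mat", "the"), 3))
```

Because every observed count is at least 1 and `D <= 1`, the discounted mass for each context equals `D * N1+(v, .) / c(v)`, so each conditional distribution still sums to one.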
Eye blink detection is an important problem in computer vision, with applications such as face liveness detection and driver fatigue analysis. Existing eye blink detection methods can be roughly divided into two categories: contour-template-based and appearance-based methods. The former can usually extract eye contours accurately, but different templates are needed for closed and open eyes, and these methods are sensitive to illumination changes. In appearance-based methods, image patches of open and closed eyes are collected as positive and negative samples to learn a classifier, but eye contours cannot be extracted accurately. To overcome these drawbacks, this paper proposes an effective eye blink detection method based on an improved eye contour extraction technique. In our method, the eye contour model is represented by 16 landmarks, so it can describe both open and closed eyes. Each landmark is accurately located by a fast classifier trained on the appearance around that landmark. Eye contour extraction experiments have been conducted on YALE and on another large data set of frontal face images. The results show that the proposed method locates the eyes accurately and is robust when the eyes are closed. It also performs well under illumination variations. The average time cost of our method is about 140 ms on a Pentium IV 2.8 GHz PC with 1 GB RAM, which satisfies the real-time requirement for face video sequences. The method has also been applied in a face liveness detection system with promising results.
Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because of particular characteristics of traditional Mongolian script, for example that one character may be part of another, we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of projections and connected components. As Mongolian font types fall into two major groups, the segmentation parameters are adjusted for each group, and a method for classifying the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
This paper addresses text line extraction in free-style documents such as business cards, envelopes and posters. In free-style documents, global properties such as character size and line direction can hardly be inferred, which severely limits traditional layout analysis.
'Line' is the most prominent and highest-level structure in our bottom-up method. First, we apply a novel intensity function based on gradient information to locate text areas, where the gradient within a window has large magnitude and varied directions, and split these areas into text pieces. We build a probability model of lines consisting of text pieces from statistics on training data. For an input image, we group the text pieces into lines using a simulated annealing algorithm whose cost function is based on this probability model.
The semi-text-independent method of writer verification based on the linear framework can use all characters of two handwritings to discriminate between writers when the text content is known. The two handwritings may contain only a small number of characters, which may even be entirely different. This fills the gap between the classical text-dependent and text-independent methods of writer verification. Moreover, the identity of each character is used by the semi-text-independent method in this paper. Two types of standard templates, generated from many handwritten samples of unknown writers and from printed samples of each character, are introduced to represent the content information of each character. The difference vectors of the character samples are obtained by subtracting the standard templates from the original feature vectors, and they replace the original vectors in writer verification. By removing a large amount of content information while retaining the style information, the difference vectors improve the verification accuracy of the semi-text-independent method. On a handwriting database of 30 writers, when the query and reference handwritings each consist of 30 distinct characters, the average equal error rate (EER) of writer verification is 9.96%. When the handwritings contain 50 characters, the average EER falls to 6.34%, which is 23.9% lower than the EER without difference vectors.
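The difference-vector idea can be sketched as follows, assuming each character sample is already a fixed-length feature vector and a standard template per character is available (e.g. the mean over many writers). Subtracting the template removes most character-specific content, so handwritings containing different characters become comparable. Averaging the difference vectors and comparing them with Euclidean distance are simplifying assumptions, not the paper's linear framework.

```python
import numpy as np

def style_vector(samples, templates):
    """Mean difference vector of one handwriting.

    samples: list of (char_label, feature_vector); templates: dict char_label -> template vector.
    """
    diffs = [np.asarray(f, float) - templates[c] for c, f in samples]
    return np.mean(diffs, axis=0)

def writer_distance(query, reference, templates):
    """Distance between the style vectors of two handwritings (smaller = more likely the same writer)."""
    return float(np.linalg.norm(style_vector(query, templates) - style_vector(reference, templates)))
```

In this toy setting, a writer's style is an additive offset on top of each character's template, so two handwritings by the same writer have nearly identical style vectors even if they share no characters.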
KEYWORDS: Image segmentation, Optical character recognition, Image restoration, Cameras, 3D image restoration, 3D image processing, 3D modeling, Image enhancement, Detection and tracking algorithms, Fuzzy logic
In camera-based optical character recognition (OCR) applications, warping is a primary problem: warped document images must be restored before they can be recognized by traditional OCR algorithms. This paper presents a novel restoration approach that first estimates the baselines and the vertical character direction based on rough line and character segmentation, then selects several key points and determines their restoration mapping from the estimation results, and finally performs Thin-Plate Spline (TPS) interpolation on the full page image using these key point mappings. The restored document image is expected to have straight baselines and upright characters. The method can restore arbitrary local warping while keeping the result natural and smooth, thereby improving the performance of the OCR application. Experiments on several camera-captured warped document images show the effectiveness of this approach.
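The final interpolation step can be sketched with a standard 2-D thin-plate spline fitted to key-point correspondences. The sketch assumes the restoration mapping is given as source and destination key points (how they are chosen from the estimated baselines is not shown) and uses no regularization, so the mapping passes exactly through the key points. The kernel is U(r) = r² log r plus an affine part.

```python
import numpy as np

def _tps_kernel(r):
    # U(r) = r^2 log r, with the limit U(0) = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def fit_tps(src, dst):
    """Fit a 2-D thin-plate spline mapping src (n, 2) onto dst (n, 2); needs n >= 3 non-collinear points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    K = _tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    # [[K, P], [P^T, 0]] [w; a] = [dst; 0]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([dst, np.zeros((3, 2))])
    params = np.linalg.solve(A, b)
    return src, params[:n], params[n:]

def apply_tps(model, pts):
    """Map points pts (m, 2) through a fitted TPS model."""
    src, w, a = model
    pts = np.asarray(pts, float)
    U = _tps_kernel(np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1))
    return a[0] + pts @ a[1:] + U @ w
```

To warp a whole page, one would typically fit the inverse mapping (restored coordinates to captured coordinates) and sample the captured image at the mapped positions, so that every output pixel receives a value.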
Character recognition and document retrieval remain very interesting research areas, although great progress in performance has been made over the last decades. This paper introduces advanced research topics in character recognition and document analysis, including further research at Tsinghua University on handwritten Chinese character recognition, multilingual character recognition and writer identification. For handwritten Chinese character recognition, a special cascade MQDF classifier is discussed for unconstrained cursive handwriting, and an optimum handwritten strip recognition algorithm is introduced. For writer identification, content-dependent and content-independent algorithms are discussed. For multilingual character recognition, the THOCR multilingual document recognition system, covering Japanese, Korean, Tibetan, Mongolian, Uyghur and Arabic, is introduced.
For space remote sensors with high resolution, large aperture and long focal length, in-orbit automatic focusing is an important application technology. Applying minimum image entropy (MIE) to the real-time autofocusing system of a space optical remote sensor is both novel and of engineering significance. Theoretical analysis, computer simulation of the algorithm and an "Experimental Optical System" experiment have validated MIE as a criterion for the auto-focusing of optical remote sensors. The data indicate that, for a diffraction-limited optical system with an f-ratio of F# = 39, MIE achieves a focus-checking sensitivity better than 0.1 mm.
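One common way to define image entropy for focus measurement treats normalized pixel intensities p_i = I_i / ΣI as a probability distribution. The abstract does not give the exact definition used, so this formulation is an assumption. A defocused image spreads energy more evenly, which raises this entropy, so the focus position with minimum entropy is selected. The circular box blur below is only a stand-in for defocus in the example.

```python
import numpy as np

def image_entropy(img):
    """Entropy of the normalized intensity distribution p_i = I_i / sum(I)."""
    p = np.asarray(img, float).ravel()
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def best_focus(images_by_position):
    """Return the focus position (dict key) whose image has minimum entropy."""
    return min(images_by_position, key=lambda pos: image_entropy(images_by_position[pos]))

def box_blur(img, k):
    """Circular box blur of radius k, used here only to simulate defocus."""
    out = np.zeros_like(img, dtype=float)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out / (2 * k + 1) ** 2
```

A circular box blur averages intensities without changing their sum, and such averaging can never decrease this entropy, which is why the sharpest frame in the example minimizes it.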
In this paper, we present an algorithm for estimating vehicle speed from near-view video. Unlike far-view scenarios, near-view images provide more specific vehicle information, such as body texture and vehicle identifiers, which makes individual vehicle speed estimation practical. The algorithm uses vanishing points to calibrate the camera parameters and a Gaussian Mixture Model (GMM) to detect moving vehicles. After calibration, it transforms image coordinates into real-world coordinates using the simple pinhole model and calculates the vehicle speed in real-world coordinates. Thanks to the vanishing points, the algorithm needs only two pre-measured parameters, the camera height and the distance between the camera and the middle road line; other information, such as camera orientation and focal length, as well as the vehicle speed, can be extracted from the video data.
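The geometry behind the speed computation can be sketched with a flat-road pinhole model. This sketch assumes zero camera roll, a known focal length `f` in pixels (the paper recovers it from the video, which is not shown), a camera height `h` in metres, and image rows measured from the principal point, positive downward. The camera tilt follows from the row of the road-direction vanishing point, since that point lies f·tan(tilt) above the principal point. The lateral offset to the middle road line is omitted, so only distance along the road is computed; all names are illustrative.

```python
import math

def tilt_from_vanishing_point(y_vp, f):
    """Camera tilt (radians, downward positive) from the image row of the road-direction VP."""
    return math.atan(-y_vp / f)

def ground_distance(y, f, h, tilt):
    """Distance along the road to a ground point imaged at row y (pixels below the principal point)."""
    return h / math.tan(tilt + math.atan(y / f))

def speed_kmh(y1, y2, dt, f, h, tilt):
    """Average speed of a vehicle ground point seen at rows y1 then y2, dt seconds apart."""
    d1, d2 = ground_distance(y1, f, h, tilt), ground_distance(y2, f, h, tilt)
    return abs(d2 - d1) / dt * 3.6  # m/s to km/h
```

For example, with `f = 1000`, `h = 6` and a 0.2 rad tilt, a point that moves from 20 m to 30 m in 0.5 s yields 72 km/h.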
Object tracking is an essential problem in video and image processing. Although tracking algorithms that work on gray-scale video are convenient in practical applications, they are more difficult to develop than those using color features, since less information is available. Little research has been dedicated to tracking objects using edge information. In this paper, we propose a novel video tracking algorithm based on edge information for gray-scale videos. The method combines a condensation particle filter with an improved chamfer matching. The improved chamfer matching is rotation invariant and can estimate the shift between an observed image patch and a template through an orientation distance transform. A modified discriminative likelihood measurement that focuses on the difference is adopted; the resulting values are normalized and used as the weights of the particles that predict and track the object. Experimental results show that our modifications to chamfer matching improve its performance in video tracking, and that the algorithm is stable and robust and can effectively handle rotation distortion. Future work may update the template to adapt to significant viewpoint and scale changes in the object's appearance during tracking.
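Plain chamfer matching, without the rotation-invariant orientation distance transform described above, can be sketched as follows. The score of a placement is the mean distance from each shifted template edge point to its nearest observed edge point, which equals looking the points up in a distance transform of the edge map; a brute-force nearest-neighbour search keeps the sketch NumPy-only and suits small patches. The exhaustive shift search is an assumption for illustration; in the tracker, each particle would supply its own candidate position.

```python
import numpy as np

def chamfer_score(edge_points, template_points, shift):
    """Mean distance from shifted template edge points to their nearest observed edge points."""
    shifted = np.asarray(template_points, float) + np.asarray(shift, float)
    edges = np.asarray(edge_points, float)
    d = np.linalg.norm(shifted[:, None, :] - edges[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

def best_shift(edge_map, template_points, search=5):
    """Search integer (row, col) shifts and return the one with the lowest chamfer score."""
    edge_points = np.argwhere(edge_map)  # (row, col) coordinates of edge pixels
    candidates = [(dy, dx) for dy in range(-search, search + 1) for dx in range(-search, search + 1)]
    return min(candidates, key=lambda s: chamfer_score(edge_points, template_points, s))
```

Lower scores mean better alignment; a tracker could turn them into particle weights with, for example, exp(-score / sigma) followed by normalization.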
In this paper, in the context of highway driving assistance, we propose a real-time vehicle detection and tracking algorithm based on traffic scene analysis. We first describe a general traffic scene analysis framework for vehicle detection and tracking based on roadside detection. On that basis, we present a new object detection algorithm that fuses a global classifier with a part-based classifier, and a vehicle detection algorithm that integrates classification confidence with local shadow. The local shadow is obtained by detecting Maximally Stable Extremal Regions (MSER) with a multi-resolution strategy. Finally, we test our algorithm on several video sequences captured on highways and suburban roads. The results show high efficiency and robustness under environment transitions, illumination variation, and changes in vehicle orientation.
Application-relevant text data are very useful in various natural language applications. Using them can significantly improve vocabulary selection and language modeling, which are widely employed in automatic speech recognition, intelligent input methods, etc. In some situations, however, relevant data are hard to collect, and the scarcity of application-relevant training text makes these natural language processing tasks difficult. In this paper, using only a small set of application-specific text, the proposed approach combines unsupervised text clustering and text retrieval techniques to find relevant text in an unorganized large-scale corpus, and thereby adapts the training corpus towards the application area of interest. To evaluate the relevance of the acquired text, and accordingly to validate the effectiveness of our corpus adaptation approach, we train n-gram statistical language models on the retrieved text and test them on the application-specific text. The language models trained on the ranked text bundles yield well-discriminated perplexities on the application-specific text. Preliminary experiments on short message text and an unorganized large corpus demonstrate the performance of the proposed methods.
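The relevance test can be illustrated with a bigram model: train on a candidate text bundle, then measure perplexity on the application-specific text, where lower perplexity suggests a more relevant bundle. The n-gram order and smoothing used in the paper are not stated here, so bigrams with add-one (Laplace) smoothing and whitespace tokenization are assumptions.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        context.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>", "<unk>"}
    return context, bigrams, vocab

def perplexity(model, sentences):
    """Per-token perplexity under add-one smoothed bigram probabilities (assumed smoothing)."""
    context, bigrams, vocab = model
    V = len(vocab)
    log_prob, n = 0.0, 0
    for s in sentences:
        toks = ["<s>"] + [t if t in vocab else "<unk>" for t in s.split()] + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + 1) / (context[prev] + V)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

def rank_bundles(bundles, target_sentences):
    """Rank named text bundles by the perplexity their bigram model assigns to the target text."""
    return sorted(bundles, key=lambda b: perplexity(train_bigram(bundles[b]), target_sentences))
```

For example, with `bundles = {"sms": [...], "news": [...]}` and a few application-specific sentences as the target, `rank_bundles` lists the bundle whose vocabulary and word order best match the target first.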
Post-processing of OCR output is a bottleneck of document image processing systems. Proofreading is necessary because the current recognition rate is not high enough for publishing. The OCR system gives every recognition result a confident or unconfident label. People need only check unconfident characters, since the error rate of confident characters is low enough for publishing. However, the current algorithm marks too many characters as unconfident, so the OCR results need to be optimized. In this paper we propose an algorithm based on pattern matching to decrease the number of unconfident results. If an unconfident character matches a confident character well, its label can be changed to confident. Pattern matching uses the original character images, so it can reduce the problems caused by image normalization and scanning noise. We introduce WXOR, WAN, and four-corner based pattern matching to improve matching, and confidence analysis to reduce errors among similar characters. Experimental results show that our algorithm achieves improvements of 54.18% on the first image set, which contains 102,417 Chinese characters, and 49.85% on the second image set, which contains 53,778 Chinese characters.
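A sketch of a weighted XOR (WXOR) mismatch score in its commonly cited form is shown below. Each differing pixel is weighted by the number of differing pixels in its 3×3 neighborhood, so clustered differences (for example, a missing stroke) cost more than scattered noise. The paper's exact weighting, its WAN and four-corner variants, and its acceptance threshold are not reproduced. Images are assumed to be equal-sized lists of 0/1 rows.

```python
def wxor(a, b):
    """Weighted XOR distance between two equal-sized binary images (lists of 0/1 rows).
    Assumed weighting: each differing pixel counts the differing pixels in its 3x3 window."""
    h, w = len(a), len(a[0])
    diff = [[a[i][j] ^ b[i][j] for j in range(w)] for i in range(h)]
    score = 0
    for i in range(h):
        for j in range(w):
            if diff[i][j]:
                score += sum(diff[y][x]
                             for y in range(max(0, i - 1), min(h, i + 2))
                             for x in range(max(0, j - 1), min(w, j + 2)))
    return score
```

Three isolated differing pixels score 3, while three adjacent ones in a row score 2 + 3 + 2 = 7. This is why a weighted score separates noise from genuine shape differences better than a plain XOR count.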
In general, point correspondence and automatic face structure extraction are challenging problems. Automatically extracting and matching a set of significant feature points across different views of a face, which is needed to recover the individual's 3-D face model, is a very hard machine task. To bypass this problem, our method recovers both the pose and the 3-D face coordinates using a shape-matching morphing model and iterative minimization of a metric based on structure matching. A 3-D radial basis function (RBF) is used to morph a generic face into the specific face structure, and shape context (SC) is used to describe point shape. Based on RBF and SC, a shape distance is used to measure the similarity of two shapes. Experimental results on images of real faces are promising.
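RBF morphing of a generic face can be sketched as scattered-data interpolation of control-point displacements. The Gaussian kernel, the width `sigma`, and the absence of an affine term are assumptions, since the abstract does not specify the kernel. Shape context and the iterative pose search are not shown.

```python
import numpy as np

def rbf_morph(src_ctrl, dst_ctrl, sigma=1.0):
    """Return a function that warps 3-D points so that src_ctrl[i] maps onto dst_ctrl[i].
    Assumption: Gaussian kernel phi(r) = exp(-(r / sigma)^2) interpolating the displacements."""
    src = np.asarray(src_ctrl, dtype=float)
    disp = np.asarray(dst_ctrl, dtype=float) - src

    def phi(a, b):
        r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
        return np.exp(-(r / sigma) ** 2)

    # One weight column per coordinate (x, y, z); the Gaussian kernel matrix is
    # positive definite for distinct control points, so the system is solvable.
    weights = np.linalg.solve(phi(src, src), disp)

    def warp(points):
        p = np.asarray(points, dtype=float)
        return p + phi(p, src) @ weights

    return warp
```

The warp reproduces the control-point targets exactly and leaves points far from every control point essentially unchanged, which suits local adjustment of a generic face mesh.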
The JBIG2 (Joint Bi-level Image Experts Group) standard for bi-level image coding is drafted so that encoders can be designed by individual implementers. In JBIG2, text images are compressed using pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) that compresses bi-level images into the JBIG2 format. Processing text images with OCR yields recognition results for the characters and the confidence of these results. A representative symbol image can then be generated for similar character image blocks using the OCR results, the block sizes, and the mismatches between blocks. This symbol image can replace all the similar image blocks, so a high compression ratio can be achieved. Experimental results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S on Latin character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S on Chinese character images. Our algorithm also produces far fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
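Generating a representative symbol can be sketched as grouping blocks by OCR label and size, then taking a per-pixel majority vote. The grouping key and the vote are assumptions for illustration. The paper also uses confidence and inter-block mismatches, which are not modeled here, and blocks are assumed to be already aligned.

```python
from collections import defaultdict

def group_blocks(items):
    """items: (ocr_label, bitmap) pairs. Assumed grouping key: label plus bitmap height and width."""
    groups = defaultdict(list)
    for label, bmp in items:
        groups[(label, len(bmp), len(bmp[0]))].append(bmp)
    return dict(groups)

def representative_symbol(blocks):
    """Per-pixel majority vote over equal-sized binary bitmaps; ties resolve toward black (1)."""
    n = len(blocks)
    h, w = len(blocks[0]), len(blocks[0][0])
    return [[1 if 2 * sum(b[i][j] for b in blocks) >= n else 0 for j in range(w)]
            for i in range(h)]
```

Each group is then coded once as a symbol-dictionary entry, and every block in the group refers to it. This substitution is where the compression gain, and also the risk of substitution errors, comes from.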
Document authentication decides whether a given document is from a specific individual. In this paper, we propose a new document authentication method in the physical domain (after the document is printed) that works by embedding deformed characters. When an author writes a document to a specific individual or organization, a unique error-correcting code is assigned to serve as the author's Personal Identification Number (PIN), and some characters in the text lines are deformed according to that PIN. In this way, the writer's personal information is embedded in the document. When the document is received, it is first scanned and recognized by an OCR module, and the deformed characters are then detected to recover the PIN, which is used to decide whether the document is original. Document authentication can thus be viewed as a communication problem in which the identity of a document's writer is "transmitted" over a channel. The channel consists of the writer's PIN, the document, and the encoding rule. Experimental results on deformed character detection are very promising, and the feasibility and practicability of the proposed method are verified by a practical system.
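The PIN-embedding idea can be sketched with a Hamming(7,4) code. Each 4-bit nibble of the PIN becomes 7 flags that say which characters to deform, and one missed or false deformation detection per codeword is corrected on decoding. The paper does not say which error-correcting code it uses, so Hamming(7,4) and the function names are assumptions.

```python
# Assumption: Hamming(7,4) with parity bits at positions 1, 2, 4 (1-based).

def hamming74_encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # 1-based position of a single error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def pin_to_deformation_flags(pin_bits):
    """pin_bits length must be a multiple of 4; flag i == 1 means 'deform the i-th selected character'."""
    flags = []
    for k in range(0, len(pin_bits), 4):
        flags += hamming74_encode(pin_bits[k:k + 4])
    return flags

def flags_to_pin(flags):
    """Decode detected deformation flags (7 per nibble) back into PIN bits."""
    bits = []
    for k in range(0, len(flags), 7):
        bits += hamming74_decode(flags[k:k + 7])
    return bits
```

The detector's per-character mistakes act as bit flips on the channel. The code's redundancy is what lets an imperfect deformation detector still recover the PIN.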
A novel Markov random field (MRF) based framework is developed for 3-D object recognition in multiple states. The approach uses densely sampled grids to represent the local information of the input images. MRF models are then created to model the geometric distribution of the object key points. Flexible matching, which seeks an accurate correspondence between the key points of two images, is performed by combining local similarities with geometric relations using the highest confidence first (HCF) method. The similarities between images are then computed for object recognition. The algorithm is evaluated on the COIL-100 object database. The excellent recognition rates achieved in all experiments indicate that the approach is well suited to appearance-based 3-D object recognition. Comparisons with previous methods show that the proposed method is far more robust to object zooming, rotation, occlusion, noise, and viewpoint variation.
Obtaining valid iris texture is the precondition of iris-based personal verification and identification. In addition to a special camera, the illumination environment and user cooperation play an important role in capturing high-quality iris images. However, captured iris images are usually corrupted by partial occlusion (by eyelids and eyelashes) and white spots (from specular reflection). This paper describes an efficient scheme for obtaining valid iris texture. The scheme has two parts: a new algorithm for locating the iris texture and a novel algorithm for segmenting corrupted iris texture. Traditional iris location methods are sensitive to white spots and occlusion, so a new location algorithm is proposed that locates the iris texture rapidly by avoiding the heavy computation required by the Canny detector, the Hough transform and the integrodifferential operator. In addition, a novel segmentation algorithm is proposed in the latter part of the paper to exclude occluded regions from the corrupted annular iris texture. Experiments show that the proposed scheme is efficient and robust for obtaining valid iris texture, even when partial occlusion and white spots are present. Moreover, the average time cost of the scheme is about 150 ms on a 2.8 GHz Pentium IV PC, which satisfies the real-time requirement.
A new sequence-matching-based feature extraction method is proposed in this paper and applied to on-line signature verification. The signature is first extracted as a point sequence in writing order. The sequence is then matched against a model sequence extracted from the model signature, using a modified DTW (dynamic time warping) matching criterion. Based on the matching result, the sequence is divided into a fixed number of segments, and local shape features are extracted from each segment using direction and length information. Experiments show that this feature extraction method is more discriminative than other commonly used methods. In an on-line signature verification system, the method is particularly beneficial for verifying users whose genuine signatures vary widely.
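The matching and segmentation steps can be sketched with standard DTW (Euclidean point cost) in place of the paper's modified criterion. The model sequence is split into equal index ranges, and the alignment path carries those cut points onto the input signature. The equal-index split, the per-segment (direction, length) feature, and the function names are assumptions.

```python
import math

def dtw_path(a, b):
    """Standard DTW between point sequences a and b; returns (total cost, alignment path)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1))
    return D[n][m], path[::-1]

def segment_by_model(path, model_len, k):
    """Split the model into k equal index ranges; map each boundary to the first aligned input index."""
    bounds = [round(s * (model_len - 1) / k) for s in range(k + 1)]
    first = {}
    for i, j in path:
        first.setdefault(j, i)  # the DTW path visits every model index at least once
    return [first[b] for b in bounds]

def segment_features(seq, cuts):
    """Assumed local feature per segment: (direction angle, chord length) from start to end point."""
    feats = []
    for s, e in zip(cuts[:-1], cuts[1:]):
        dx, dy = seq[e][0] - seq[s][0], seq[e][1] - seq[s][1]
        feats.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return feats
```

Because the cut points follow the alignment rather than fixed time steps, the same stroke of the model lands in the same segment even when the writer speeds up or slows down.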
As a cursive script, Arabic text differs greatly from Latin or Chinese text. For example, an Arabic character has up to four written forms, and characters that can be joined are always joined on the baseline. The methods used for Arabic document recognition are therefore special, and character segmentation is the most critical problem. In this paper, a printed Arabic document recognition system is presented, composed of text line segmentation, word segmentation, character segmentation, character recognition and post-processing stages. First, a hybrid top-down and bottom-up method based on connected component classification is proposed to segment Arabic text into lines and words. Characters are then segmented by analyzing the word contour. The baseline position of a given word is estimated first. A function denoting the distance between the contour and the baseline is then analyzed to find all candidate segmentation points. Finally, structural rules are applied to merge over-segmented characters. After character segmentation, both statistical and structural features are used for character recognition. Finally, a lexicon is used to improve the recognition results. Experiments show that the system achieves a recognition accuracy of 97.62%.
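The contour-to-baseline analysis can be sketched on a binary word image. Here the baseline is taken as the row with the largest horizontal projection, and candidate cuts are placed in the middle of column runs where the upper contour lies on the baseline (a connecting stroke). The projection-peak baseline, the tolerance, and the function names are assumptions. The paper's merging rules are not modeled.

```python
def estimate_baseline(img):
    """Assumption: baseline = row with the largest ink count (img is a list of 0/1 rows)."""
    proj = [sum(row) for row in img]
    return max(range(len(img)), key=lambda r: proj[r])

def contour_baseline_distance(img, baseline):
    """Per column: baseline row minus the row of the topmost ink pixel; None for empty columns."""
    h, w = len(img), len(img[0])
    dist = []
    for j in range(w):
        top = next((i for i in range(h) if img[i][j]), None)
        dist.append(None if top is None else baseline - top)
    return dist

def candidate_cuts(dist, tol=0):
    """Middle column of each run where the upper contour is within tol of the baseline."""
    cuts, start = [], None
    for j, d in enumerate(dist + [None]):  # trailing None closes a run at the right edge
        low = d is not None and abs(d) <= tol
        if low and start is None:
            start = j
        elif not low and start is not None:
            cuts.append((start + j - 1) // 2)
            start = None
    return cuts
```

On a word image with two vertical strokes joined by a baseline connection, the only candidate cut falls in the middle of the connection.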
Although about 300 million people worldwide, speaking several different languages, write with Arabic characters, Arabic OCR has not been researched as thoroughly as OCR for other widely used scripts such as Latin or Chinese. In this paper, a new statistical method is developed to recognize machine-printed Arabic characters. First, the entire Arabic character set is pre-classified into 32 subsets according to character form (isolated, final, initial, medial), the special zones a character occupies (defined by the headline and the baseline of a text line), and component information (with or without secondary parts such as diacritical marks and vowel marks). Then 12 types of directional features are extracted from character profiles. After dimension reduction by linear discriminant analysis (LDA), the features are fed to a modified quadratic discriminant function (MQDF), which serves as the final classifier. Finally, similar characters are discriminated before the recognition results are output. With properly selected parameters, encouraging experimental results on test sets demonstrate the validity of the proposed approach.
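A compact MQDF classifier in the common MQDF2 form is shown below. Each class keeps its `k` largest covariance eigenpairs, and the remaining eigenvalues are replaced by a constant `delta`. Setting `delta` to the mean of the discarded eigenvalues is one common choice and an assumption here. The LDA reduction and the similar-character stage are omitted.

```python
import numpy as np

class MQDF:
    """MQDF2: per class, keep k principal eigenpairs; model the minor eigenvalues by a constant delta."""

    def __init__(self, k, delta=None):
        self.k = k          # principal eigenpairs kept per class (must be smaller than the dimension)
        self.delta = delta  # assumption: None -> mean of the discarded eigenvalues
        self.params = {}

    def fit(self, X, y):
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False, bias=True))
            order = np.argsort(vals)[::-1]  # descending eigenvalues
            vals, vecs = vals[order], vecs[:, order]
            lam = np.maximum(vals[:self.k], 1e-6)
            delta = self.delta if self.delta is not None else max(vals[self.k:].mean(), 1e-6)
            self.params[c] = (mu, vecs[:, :self.k], lam, delta)
        return self

    def distance(self, x, c):
        """Smaller is better: principal-subspace term + residual term + log-determinant terms."""
        mu, phi, lam, delta = self.params[c]
        diff = x - mu
        proj = phi.T @ diff
        resid = diff @ diff - proj @ proj
        d, k = diff.size, lam.size
        return (np.sum(proj ** 2 / lam) + resid / delta
                + np.sum(np.log(lam)) + (d - k) * np.log(delta))

    def predict(self, X):
        return np.array([min(self.params, key=lambda c: self.distance(x, c)) for x in X])
```

Truncating the eigen-spectrum keeps the quadratic discriminant stable when classes have few samples relative to the feature dimension, which is the usual motivation for MQDF over a full QDF.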
Style is an important feature of printed or handwritten characters, but it has not been studied as thoroughly as character recognition. In this paper, we try to learn how many typical styles exist in a kind of real-world form image. A hierarchical clustering method has been developed and tested. A cross-recognition error rate constraint is proposed to reduce false combinations during hierarchical clustering, and a cluster selection method is used to delete redundant or unsuitable clusters. The algorithm needs only a similarity measure between patterns. It is tested with a template-matching-based similarity measure, and it can easily be extended to any other feature and distance measure. A detailed comparison of the effect of each step is given in the paper. In total, 16 typical styles are found, and by giving each character in each style a prototype for recognition, an error rate of 0.78% is achieved on the test set.
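Constrained hierarchical clustering that needs only a pairwise similarity can be sketched as follows. Average linkage and the similarity threshold are assumptions. `can_merge` is a hypothetical hook where a constraint such as the paper's cross-recognition error rate would be evaluated.

```python
def agglomerative(items, similarity, threshold, can_merge=lambda a, b: True):
    """Average-linkage agglomerative clustering using only a pairwise similarity.
    Merges the most similar pair whose linkage is >= threshold and that can_merge accepts
    (can_merge is a hypothetical constraint hook)."""
    clusters = [[x] for x in items]

    def link(a, b):
        return sum(similarity(x, y) for x in a for y in b) / (len(a) * len(b))

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = link(clusters[i], clusters[j])
                if s >= threshold and can_merge(clusters[i], clusters[j]) and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
```

A constraint that rejects a merge is what prevents two genuinely different styles from being combined just because their average similarity happens to clear the threshold.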
The novel remote sensor of the Space Solar Telescope (SST) is scheduled for launch in 2008. It is uniquely designed to be the world's first facility capable of observing vector magnetograms in the photosphere and the chromosphere with 0.1" spatial resolution, and soft X-rays with 0.5" resolution. The high spatial resolution makes on-orbit automatic focus (AF) the key technique for capturing images. This paper puts forward a new minimum entropy (ME) criterion for astronomical observation. Furthermore, we have applied this technique to the on-orbit AF of SST. A simulation program calculated the image entropies of different defocus states. The data indicate that the minimum image entropy corresponds to the optimal image plane, that the ME criterion is better suited to low-contrast celestial objects, and that the focusing precision is 0.01 mm (δ' = 0.01 mm).
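The minimum entropy criterion can be illustrated by treating normalized pixel intensities as a probability distribution. A sharply focused image concentrates energy in fewer pixels and therefore has lower entropy. The paper's exact entropy definition is not given in the abstract, so the normalized-intensity form and the function names are assumptions.

```python
import math

def image_entropy(img):
    """Assumed definition: entropy of p = I / sum(I) over the pixels of a nonnegative image."""
    total = sum(sum(row) for row in img)
    if total <= 0:
        raise ValueError("image has no energy")
    h = 0.0
    for row in img:
        for v in row:
            if v > 0:
                p = v / total
                h -= p * math.log(p)
    return h

def best_focus(frames):
    """frames: mapping from focus position to image; return the position with minimum entropy."""
    return min(frames, key=lambda pos: image_entropy(frames[pos]))
```

A single bright pixel has entropy 0, while a uniform 3×3 image reaches the maximum log 9. A focus sweep therefore picks the position whose image is most concentrated.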
Text segmentation plays a crucial role in a text recognition system. A comprehensive method is proposed for Tibetan/English text segmentation. Two algorithms, based respectively on Tibetan inter-syllabic tshegs and on a discriminant function, are presented to perform skew detection before text line separation. A dynamic recursive character segmentation algorithm integrating multi-level information is then developed. Encouraging experimental results on a large-scale set of mixed Tibetan/English text show the validity of the proposed method.
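Tsheg-based skew detection can be sketched as a least-squares line fit through the centroids of the tshegs on one text line. This assumes the centroids have already been located and lie roughly along a line parallel to the text line. The fit and the function name are illustrative assumptions, not the paper's algorithm.

```python
import math

def skew_angle(points):
    """Least-squares fit y = a + b*x through tsheg centroids (x, y); returns the skew in degrees."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        raise ValueError("points must span more than one column")
    return math.degrees(math.atan(sxy / sxx))
```

Rotating the page by the negative of this angle before line separation keeps the horizontal projection profiles of the lines well separated.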
Extracting the logical structure of book documents is significant for the automatic construction of electronic document databases. The tables of contents of a book play an important role in representing the overall logical structure and reference information of the book. In this paper, a new method is proposed to extract the hierarchical logical structure of book documents, together with the reference information, by combining spatial and semantic information from the tables of contents. Experimental results on various book documents demonstrate the effectiveness and robustness of the proposed approach.
Tibetan optical character recognition (OCR) plays a crucial role in Chinese multilingual information processing systems. This paper proposes a new statistical method for multi-font printed Tibetan/English character recognition. A robust Tibetan character recognition kernel is carefully designed. Combined with previous English character recognition techniques, the recognition accuracy on a test set of 206,100 multi-font printed characters reaches 99.67%, which shows the validity of the proposed method.
The digitization of ancient Chinese documents, which record thousands of years of Chinese civilization, presents new challenges to OCR (optical character recognition) research because of the large set of ancient Chinese characters, the variety of font types, and versatile document layout styles. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognizing ancient Chinese documents with regular font types and layout styles. Building on previous work on multilingual OCR in the TH-OCR system, we focus on the design and development of two key technologies: character recognition and page segmentation. Experimental results show that the developed character recognition kernel for 19,635 Chinese characters outperforms our original traditional Chinese recognition kernel, and benchmark tests on printed ancient Chinese books show that the proposed system is effective for regular ancient Chinese documents.
In this paper we propose a general framework for character segmentation in complex multilingual documents, which attempts to combine the traditionally separate segmentation and recognition processes into a cooperative system. The framework contains three basic steps, Dissection, Local Optimization and Global Optimization, which are designed to fuse various properties of the segmentation hypotheses hierarchically into a composite evaluation that decides the final recognition results. Experimental results show that the framework is general enough to be applied to a variety of documents. Finally, a sample system based on this framework that recognizes Chinese, Japanese and Korean documents is described, and its experimental performance is reported.
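The step from dissection to global optimization resembles a best-path search over candidate cut points. The sketch below picks the cut sequence that maximizes the sum of per-segment scores by dynamic programming. `segment_score` is a hypothetical stand-in for the composite evaluation (for example, recognition confidence plus width cues), and `max_span` limits how many dissected pieces one character may merge.

```python
def best_segmentation(cuts, segment_score, max_span=3):
    """cuts: sorted candidate cut positions including both ends.
    segment_score(a, b): hypothetical score of treating [a, b) as one character.
    Returns (best total score, chosen cut positions)."""
    n = len(cuts)
    best = [float("-inf")] * n
    back = [None] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            if best[i] == float("-inf"):
                continue
            s = best[i] + segment_score(cuts[i], cuts[j])
            if s > best[j]:
                best[j], back[j] = s, i
    path, k = [], n - 1
    while k is not None:  # follow back-pointers from the last cut to the first
        path.append(cuts[k])
        k = back[k]
    return best[-1], path[::-1]
```

With a toy score that prefers 20-pixel-wide characters, cuts at 0, 10, 20, 30, 40 resolve to the segmentation 0, 20, 40. Local scores alone cannot make that choice, because each piece depends on how its neighbors are segmented.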
A printed character image contains not only character information but also font information. Font information is essential in layout analysis and reconstruction, and it helps improve the performance of character recognition systems. This paper proposes an algorithm for recognizing the font of a single Chinese character. The aim is to analyze a single Chinese character and identify its font. No a priori knowledge of the character is required for font recognition. Whereas existing methods are all based on a block of text, the new algorithm can recognize the font of a single Chinese character. Stroke property features and stroke distribution features are extracted from the character, and two classifiers are employed to classify different fonts. We combine the two classifiers by logistic regression to obtain the final result. Experiments show that our method recognizes the font of a single Chinese character effectively.
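Combining two classifiers by logistic regression can be sketched as learning weights over their output scores. The binary "is it this font" setup, the plain batch gradient descent, and the learning-rate and epoch values are assumptions. The stroke features and the two base classifiers are not shown.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression; X rows are score vectors, y in {0, 1}."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            for j in range(d):
                gw[j] += (p - yi) * xi[j]
            gb += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def fused_probability(w, b, scores):
    """Probability that the character has the candidate font, given both classifiers' scores."""
    return 1 / (1 + math.exp(-(sum(wj * s for wj, s in zip(w, scores)) + b)))
```

The learned weights express how much each classifier is trusted. This is more flexible than a fixed average when one feature family is more reliable than the other.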
KEYWORDS: Optical character recognition, Image processing, Error analysis, Image segmentation, Image restoration, Intelligence systems, Image classification, Data analysis, Digital libraries, Process control
This paper introduces a newly designed general-purpose Chinese document data capture system, Tsinghua OCR Network Edition (TONE). The system aims to cut the high cost of digitizing large volumes of Chinese paper documents. Our first step was to divide the whole data-entry process into a few single-purpose procedures. Based on these procedures, a production-line-like system configuration was then developed. By design, the management cost is reduced directly by substituting automated task scheduling for traditional manual assignment, and indirectly by adopting a well-designed quality control mechanism. Classification distances, character image positions, and context grammars are synthesized to reject questionable characters. Experiments showed that when 19.91% of the characters are rejected, the residual error rate can be 0.0097% (below one per ten thousand characters), which makes the error-rejecting module practically applicable. Given the cost distribution in data-entry companies, where manual correction accounts for 70% of the total cost, the estimated total cost reduction could exceed 50%.
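The trade-off between rejection rate and residual error rate can be sketched by sweeping a rejection threshold over per-character confidences. A single confidence score per character is a simplification of the paper's combination of classification distances, positions and grammar. The function names are hypothetical.

```python
def rejection_curve(results):
    """results: (confidence, is_correct) per character. For each threshold t (reject if confidence < t),
    return (t, rejection_rate, residual_error_rate among accepted characters)."""
    n = len(results)
    curve = []
    for t in sorted({c for c, _ in results}) + [float("inf")]:
        accepted = [ok for c, ok in results if c >= t]
        rej = 1 - len(accepted) / n
        err = accepted.count(False) / len(accepted) if accepted else 0.0
        curve.append((t, rej, err))
    return curve

def lowest_rejection_for(results, max_error):
    """Smallest rejection rate whose residual error rate does not exceed max_error."""
    return min(rej for _, rej, err in rejection_curve(results) if err <= max_error)
```

Choosing the operating point this way mirrors the reported figure of 19.91% rejected for a 0.0097% residual error rate. The target error rate comes first, and the rejection rate is whatever the confidence measure requires to reach it.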
Handwritten and machine-printed characters are recognized separately in most OCR systems because of their distinct differences. In applications where both kinds of characters occur, it is necessary to determine whether a character is handwritten or printed before feeding it into the proper recognition engine. In this paper, a new method for discriminating between handwritten and machine-printed characters is proposed. Unlike most previous work, the discrimination in this paper is based entirely on a single character. Five kinds of statistical features are extracted from the character image, and feature selection and classification are then performed simultaneously by a learning algorithm based on AdaBoost. Experiments on large data sets demonstrate the effectiveness of the method.
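AdaBoost with decision stumps illustrates how feature selection and classification can happen together. Each round picks the single feature and threshold with the lowest weighted error, so the set of features used by the final model is the selected subset. Decision stumps as weak learners and the toy data are assumptions. The paper's five statistical feature types are not reproduced.

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w; labels are +1/-1."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[f] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log of zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Re-weight: misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * (sign if xi[f] >= thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

def selected_features(model):
    return sorted({f for _, f, _, _ in model})
```

On data where only feature 0 separates the classes, every round selects feature 0, and the noisy feature is never used.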
In this paper, an on-line signature verification system based on direction sequence string matching is proposed. A signature is coded as a direction sequence string, and a modified edit distance is used for string matching. The test signature is compared with 5 reference signatures, and the distance is obtained by averaging the 5 distances. The verification result is given by comparing this distance with a pre-calculated threshold. The experiment yields an equal error rate (EER) of 4.7%.
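The pipeline can be sketched as three steps. Pen movements are quantized into 8 direction symbols, each reference is compared by edit distance, and the 5 distances are averaged and thresholded. The 8-direction quantization and unit-cost Levenshtein distance are assumptions, since the paper's modified edit distance is not specified in the abstract.

```python
import math

def direction_string(points, bins=8):
    """Quantize each pen movement into one of `bins` direction symbols (0 = +x axis, counter-clockwise)."""
    s = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # skip zero-length moves
        ang = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        s.append(round(ang / (2 * math.pi / bins)) % bins)
    return s

def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs (assumed; not the paper's modified version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def verify(test_pts, reference_pts_list, threshold):
    """Average the edit distances to all references and accept if the mean is within threshold."""
    t = direction_string(test_pts)
    dist = sum(edit_distance(t, direction_string(r)) for r in reference_pts_list) / len(reference_pts_list)
    return dist <= threshold, dist
```

Averaging over several references smooths out the natural variation among a user's genuine signatures before the single threshold is applied.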
Different character recognition problems have their own specific characteristics. State-of-the-art OCR technologies use different recognition approaches, each most effective for a particular type of character. How to identify the character type automatically, and then apply the specific recognition engine, has not attracted enough attention from researchers. Most of the limited existing work is based on the whole document image, a block of text or a text line. This paper addresses the problem of identifying a character's type independently of its content, including handwritten/printed Chinese character identification and printed Chinese/English character identification, based on only one character. Using effective features such as run-length histogram features and stroke density histogram features, we obtain very promising results: the identification accuracy exceeds 98% in our experiments.
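A run-length histogram feature can be sketched as the normalized distribution of black-run lengths along rows (or columns) of a binary character image. Such a histogram summarizes stroke widths and lengths independently of which character is shown. The bin edges and the function name are assumptions. The classifier that consumes the features is not shown.

```python
def run_length_histogram(img, bins=(1, 2, 4, 8), horizontal=True):
    """Normalized histogram of black-run lengths in a binary image (list of 0/1 rows).
    A run of length L falls in the last bin whose lower edge is <= L (assumed bin edges)."""
    lines = img if horizontal else [list(col) for col in zip(*img)]
    counts = [0] * len(bins)
    for line in lines:
        run = 0
        for v in list(line) + [0]:  # trailing 0 closes a run at the border
            if v:
                run += 1
            elif run:
                k = max(i for i, edge in enumerate(bins) if edge <= run)
                counts[k] += 1
                run = 0
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Concatenating the horizontal and vertical histograms gives a fixed-length vector for any character size, which is convenient for a content-independent type classifier.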
The difficulties of handwritten numeral recognition mainly result from the intrinsic deformations of handwritten numeral samples. One way to tackle these difficulties is to describe the variations of the feature vectors belonging to one class. The subspace method is a well-known, effective pattern recognition method that realizes this idea. In fact, the subspace method can be embedded into a multivariate linear regression model whose response variables are the feature vector and whose predictor variables are the principal components (PCs) of the feature vector. When the feature vector does not exactly follow a Gaussian distribution, it may be described more accurately in the least mean squares (LMS) sense by nonlinear functions parameterized by the same PCs, which may yield higher recognition performance. In this paper we propose an algorithm based on multivariate polynomial regression to realize this nonlinear extension. We use projection pursuit regression (PPR) to determine the multivariate polynomials, with the polynomial degrees selected by the structural risk minimization (SRM) method. Experimental results show that our approach is an effective pattern recognition method for handwritten numeral recognition.
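The linear subspace method that the paper extends can be sketched as follows. Each class is modeled by its mean and top principal directions, and a sample goes to the class whose affine subspace reconstructs it with the smallest squared residual. This is only the linear baseline; the PPR polynomial extension is not implemented, and the function names are hypothetical.

```python
import numpy as np

def fit_subspaces(X, y, dim):
    """Per class: mean and the top-`dim` principal directions (rows of Vt from an SVD of centered data)."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc - mu, full_matrices=False)
        models[c] = (mu, vt[:dim])
    return models

def residual(x, mu, basis):
    """Squared distance from x to the affine subspace mu + span(basis); basis rows are orthonormal."""
    d = x - mu
    return float(d @ d - np.sum((basis @ d) ** 2))

def classify(models, x):
    return min(models, key=lambda c: residual(x, *models[c]))
```

The residual is exactly the part of the sample that a linear function of the class PCs cannot explain. The paper's nonlinear extension aims to shrink that part for non-Gaussian classes.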
Character recognition in low-quality, low-resolution images is still a challenging problem. In this paper a gray-scale image based character recognition algorithm is proposed, which is especially suited to gray-scale images captured from the real world and to very low-quality character images. We classify the deformations of low-quality, low-resolution character images into two categories: (1) high spatial frequency deformations, derived from blur caused by the point spread function (PSF) of scanners or cameras, from random noise, or from character deformations; and (2) low spatial frequency deformations, derived mainly from large-scale background variations. Traditional recognition methods based on binary images cannot give satisfactory results on such images, because these deformations cause many touching or broken strokes during binarization. In the proposed method, we extract transform features directly from the gray-scale character images, which avoids the shortcomings of the binarization process. Our method differs from existing gray-scale methods in that it avoids the difficult and unstable step of finding character structures in the images. By applying suitable feature selection algorithms, such as linear discriminant analysis (LDA) or principal component analysis (PCA), we can select the low-frequency components that preserve the fundamental shape of characters and discard the high-frequency deformation components. We also develop a gray-level histogram based algorithm using the native integral ratio (NIR) technique to find a threshold that removes the background of character images while preserving the details of the character strokes as much as possible. Experiments show that this method is especially effective for recognizing low-quality, low-resolution images.
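Transform features on a gray-scale image can be illustrated with a 2-D DCT, keeping only a low-frequency block of coefficients as a crude stand-in for the paper's LDA/PCA-based selection. The DCT choice, the fixed k×k block, and the function names are assumptions. NIR background thresholding is not shown.

```python
import math

def dct2(img):
    """Orthonormal 2-D DCT-II of a gray-scale image given as a list of rows."""
    h, w = len(img), len(img[0])

    def scale(k, n):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0.0
            for y in range(h):
                cy = math.cos(math.pi * (2 * y + 1) * u / (2 * h))
                for x in range(w):
                    cx = math.cos(math.pi * (2 * x + 1) * v / (2 * w))
                    s += img[y][x] * cy * cx
            out[u][v] = scale(u, h) * scale(v, w) * s
    return out

def low_freq_features(img, k=4):
    """Assumed feature vector: the top-left k x k (lowest-frequency) DCT coefficients."""
    coeffs = dct2(img)
    return [coeffs[u][v] for u in range(k) for v in range(k)]
```

Low-order coefficients capture the coarse character shape, while blur and pixel noise mostly affect higher-order ones. Discarding the latter is a direct way to act on the high/low-frequency deformation split described above.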
When a syntactic pattern recognition approach is used to recognize handwritten Chinese characters, the strokes must first be extracted from the characters. In 1987 Gu and Wang presented a stroke extraction algorithm named the straight line sequence approximation (SLSA) algorithm. SLSA does not need a thinning process, so it is very fast. However, in crossing regions between strokes, SLSA often extracts false strokes and causes stroke splitting. The improved SLSA algorithm described in this paper aims to solve this problem.