The paper presents an algorithm for document image recognition that is robust to projective distortions. The algorithm is based on a similarity metric learned with a Siamese architecture. The idea behind training Siamese networks is to build a function that maps an image into a space where a distance function corresponding to a predefined metric approximates the similarity between objects in the initial space. During learning, the loss function minimizes the distance between pairs of objects from the same class and maximizes it between objects from different classes. A convolutional network is used to map the initial space to the target one. This network makes it possible to construct a feature vector in the target space for each class. Objects are classified by applying the mapping function and finding the nearest feature vector. The proposed algorithm achieved recognition quality comparable to that of a classifying convolutional network on MIDV-500 [1], an open dataset of document images. Another important advantage of the method, also shown in the paper, is the possibility of one-shot learning.
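The training objective and the nearest-vector classification described above can be illustrated with a minimal sketch. The abstract does not name the exact loss, so a margin-based contrastive loss is assumed here as one common choice for Siamese training. The function names, the `margin` parameter, and the example class labels are hypothetical, and the embeddings are taken as already produced by the convolutional network.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Assumed loss: pull same-class pairs together and push
    different-class pairs apart until they are at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def classify(embedding, class_vectors):
    """Return the label whose stored feature vector is nearest to `embedding`.

    `class_vectors` maps a label to one feature vector per class, so a new
    class can be added from a single example (one-shot) without retraining.
    """
    return min(class_vectors, key=lambda label: euclidean(embedding, class_vectors[label]))


# Hypothetical 2-D embeddings, for illustration only.
class_vectors = {"passport": [1.0, 0.0], "id_card": [0.0, 1.0]}
predicted = classify([0.9, 0.1], class_vectors)
```

Storing one vector per class is what makes the one-shot setting possible in this sketch: adding a document type only requires inserting its embedding into `class_vectors`.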
KEYWORDS: Video, Optical character recognition, Reliability, Data modeling, Error analysis, Image segmentation, Digital video recorders, Image classification, Analytical research, Cameras
This paper addresses the problem of combining the classification results of multiple observations of one object. The task can be regarded as a particular case of decision-making that combines experts' votes with calculated weights. The accuracy of various methods of combining classification results is investigated for different models of input data, using frame-by-frame character recognition in a video stream as an example. Experiments show that when the input data contain no irrelevant observations (here, irrelevant means affected by character localization and segmentation errors), the strategy of choosing the single most competent expert has an advantage. When irrelevant samples are present in the input data, combining several of the most competent experts by the multiplication rule or by voting has the advantage.
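The three combination strategies compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each frame is treated as an expert given as a `(weight, probs)` pair, where `weight` is a hypothetical competence score (the abstract does not say how weights are calculated) and `probs` maps each character class to its estimated probability. All function names and the parameter `k` are assumptions.

```python
from collections import Counter


def top_experts(observations, k):
    """Keep the k observations with the highest competence weight."""
    return sorted(observations, key=lambda o: o[0], reverse=True)[:k]


def single_best(observations):
    """Strategy 1: trust only the single most competent expert."""
    _, probs = max(observations, key=lambda o: o[0])
    return max(probs, key=probs.get)


def product_rule(observations, k):
    """Strategy 2: multiply class probabilities over the k best experts."""
    selected = top_experts(observations, k)
    scores = {label: 1.0 for label in selected[0][1]}
    for _, probs in selected:
        for label in scores:
            scores[label] *= probs[label]
    return max(scores, key=scores.get)


def majority_vote(observations, k):
    """Strategy 3: each of the k best experts votes for its top class."""
    votes = Counter(max(probs, key=probs.get) for _, probs in top_experts(observations, k))
    return votes.most_common(1)[0][0]


# Hypothetical frames: the highest-weighted one is an "irrelevant" observation
# (for example, a badly segmented character) that disagrees with the rest.
frames = [
    (0.9, {"A": 0.2, "B": 0.8}),
    (0.8, {"A": 0.9, "B": 0.1}),
    (0.7, {"A": 0.8, "B": 0.2}),
]
```

On these made-up frames, `single_best` follows the misleading top-weighted frame, while `product_rule` and `majority_vote` with `k=3` are outvoted toward the class supported by the other two frames, which mirrors the qualitative behavior the abstract reports for data containing irrelevant samples.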