13 April 2018 Training a whole-book LSTM-based recognizer with an optimal training set
Mohammad Reza Soheili, Mohammad Reza Yousefi, Ehsanollah Kabir, Didier Stricker
Proceedings Volume 10696, Tenth International Conference on Machine Vision (ICMV 2017); 1069610 (2018) https://doi.org/10.1117/12.2309615
Event: Tenth International Conference on Machine Vision, 2017, Vienna, Austria
Abstract
Despite recent progress in OCR technologies, whole-book recognition is still a challenging task, particularly for old and historical books, where unknown font faces and the low quality of paper and print add to the difficulty. Consequently, pre-trained recognizers and generic methods usually do not perform up to the required standards, and their performance typically degrades on larger-scale recognition tasks such as a whole book. Methods with reportedly low error rates turn out to require a great deal of manual correction. Generally, such methods do not make effective use of concepts such as redundancy in whole-book recognition. In this work, we propose to train Long Short-Term Memory (LSTM) networks on a minimal training set obtained from the book to be recognized. We show that by clustering all the sub-words in the book and using the sub-word cluster centers as the training set for the LSTM network, we can train models that outperform any identical network trained on randomly selected pages of the book. In our experiments, we also show that although the sub-word cluster centers are equivalent to only about 8 pages of text for a 101-page book, an LSTM network trained on such a set performs competitively with an identical network trained on a set of 60 randomly selected pages of the book.
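The selection step described in the abstract (cluster the sub-words, keep one representative per cluster as training data) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or features the authors use, so everything below is an assumption made for illustration: sub-word images are taken to be already converted to fixed-length feature vectors, clusters are formed by a simple greedy "leader" pass with a distance threshold, and the cluster center is approximated by the medoid. The names `leader_cluster`, `medoid`, and `select_training_set` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(samples: List[Vector], threshold: float) -> List[List[int]]:
    """Greedy leader clustering (an assumed stand-in, not the paper's method).

    Each sample joins the first cluster whose leader (its first member)
    lies within `threshold`; otherwise it starts a new cluster.
    Returns clusters as lists of sample indices.
    """
    clusters: List[List[int]] = []
    for i, sample in enumerate(samples):
        for members in clusters:
            if euclidean(sample, samples[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def medoid(samples: List[Vector], members: List[int]) -> int:
    """Index of the member with the smallest total distance to the others."""
    return min(
        members,
        key=lambda i: sum(euclidean(samples[i], samples[j]) for j in members),
    )


def select_training_set(samples: List[Vector], threshold: float) -> List[int]:
    """One representative index per cluster; these would be labeled and used for training."""
    return [medoid(samples, members) for members in leader_cluster(samples, threshold)]


if __name__ == "__main__":
    # Toy 2-D "features": three near-duplicate sub-words, two more, and one singleton.
    features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
    print(select_training_set(features, threshold=1.0))
```

In this toy input, six samples collapse to three representatives, which is the kind of redundancy the abstract refers to: repeated sub-words in a book need to be labeled only once.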
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Mohammad Reza Soheili, Mohammad Reza Yousefi, Ehsanollah Kabir, and Didier Stricker "Training a whole-book LSTM-based recognizer with an optimal training set", Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 1069610 (13 April 2018); https://doi.org/10.1117/12.2309615
KEYWORDS
Optical character recognition

Neural networks

Computer engineering

Artificial intelligence

Document image analysis
