We present a methodology for modeling the statistics of image features and associated text in large datasets. The models used also serve to cluster the images, as images are modeled as being produced by sampling from a limited number of combinations of mixing components. Furthermore, because our approach models the joint occurrence of image features and associated text, it can be used to predict the occurrence of either, based on observations or queries. This supports an attractive approach to image search as well as novel applications such as suggesting illustrations for blocks of text (auto-illustrate) and generating words for images outside the training set (auto-annotate). In this paper we illustrate the approach on 10,000 images of work from the Fine Arts Museum of San Francisco. The images include line drawings, paintings, and pictures of sculpture and ceramics. Many of the images have associated free text whose nature varies greatly, from physical description to interpretation and mood. We incorporate statistical natural language processing in order to deal with free text. We use WordNet to provide semantic grouping information and to help disambiguate word senses, as well as to emphasize the hierarchical nature of semantic relationships.
Text detection in video images is a challenging research problem because of the poor spatial resolution and the complex backgrounds, which may contain a variety of colors. This paper presents a multistage predictive coding scheme, referred to as Multistage Pulse Code Modulation (MPCM), which can be used to effectively detect text in color video frames. It converts a video image to a coded image in which each pixel is encoded by a priority code ranging from 7 down to 0. A priority code of 7 retains the most significant information, while a priority code of 0 represents the least significant information, which can be dropped without much loss. Using the global mean of the coded image as a threshold value, a set of potential text regions can be detected in each video frame. A series of spatial filters is then applied to eliminate regions that are unlikely to contain text. As a final step, we eliminate those potential text regions for which Optical Character Recognition (OCR) produces no results. An extensive set of experiments demonstrates that our proposed MPCM-based text detection technique is effective in detecting text in a wide variety of video images.
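The detection stage that follows the MPCM coding lends itself to a compact illustration. The sketch below is not the authors' implementation: it assumes the priority-coded image is already available as a NumPy array, thresholds it by its global mean, and substitutes a single hypothetical minimum-area filter for the paper's full series of spatial filters and its OCR-based verification step.

```python
import numpy as np
from scipy import ndimage

def candidate_text_regions(coded, min_area=30):
    """Threshold a priority-coded image by its global mean and return
    bounding boxes of connected regions that may contain text.

    `coded` is assumed to be a 2-D array of priority codes (0..7); the
    MPCM coding step itself is not reproduced here, and `min_area` is
    an illustrative stand-in for the paper's spatial filters.
    """
    mask = coded > coded.mean()              # global-mean threshold
    labels, _ = ndimage.label(mask)          # connected components
    boxes = []
    for sl in ndimage.find_objects(labels):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h * w >= min_area:                # crude size-based filter
            boxes.append((sl[0].start, sl[1].start, h, w))
    return boxes
```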
Many document analysis and OCR systems depend on precise identification of page rotation, as well as the reliable identification of text lines. This paper presents a new algorithm to address both problems. It uses a branch-and-bound approach to globally optimal line finding and simultaneously models the baseline and the descender line under a Gaussian error/robust least square model. Results of applying the algorithm to documents in the University of Washington Database 2 are presented.
Image segmentation is an important component of any document image analysis system. While many segmentation algorithms exist in the literature, very few i) allow users to specify the physical style, and ii) incorporate user-specified style information into the algorithm's objective function that is to be minimized. We describe a segmentation algorithm that models a document's physical structure as a hierarchy in which each node describes a region of the document using a stochastic regular grammar. The exact form of the hierarchy and of the stochastic language is specified by the user, while the probabilities associated with the transitions are estimated from ground-truth data. We demonstrate the segmentation algorithm on images of bilingual dictionaries.
Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation and cross-language information retrieval. In this work we describe a system for extracting term lexicons from printed copies of bilingual dictionaries. We describe our approach to page and definition segmentation and entry parsing. We have used the approach to parse a number of dictionaries, and we demonstrate retrieval results using a French-English dictionary to generate a translation lexicon and a corpus of English queries applied to French documents to evaluate cross-language IR.
Different character recognition problems have their own specific characteristics, and state-of-the-art OCR technologies apply different recognition approaches, each most effective for a particular type of character. The problem of automatically identifying the character type and then invoking a type-specific recognition engine has received little attention from researchers, and most of the limited existing work operates on a whole document image, a block of text, or a text line. This paper addresses character type identification independent of content, including handwritten/printed Chinese character identification and printed Chinese/English character identification, based on only one character. Exploiting effective features such as run-length histogram features and stroke density histogram features, we obtain very promising results: the identification accuracy is higher than 98% in our experiments.
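As an illustration of the kind of feature the abstract names, the sketch below computes a horizontal run-length histogram for a single binary character image. It is one plausible reading of "run-length histogram features"; the paper's exact feature definitions (and its stroke density histogram features) may differ.

```python
import numpy as np

def runlength_histogram(char_img, max_run=16):
    """Histogram of horizontal black-run lengths for one binary character.

    `char_img` is a 2-D boolean array (True = ink). The bin count
    `max_run` is an illustrative choice, not taken from the paper.
    """
    hist = np.zeros(max_run, dtype=float)
    for row in char_img:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                hist[min(run, max_run) - 1] += 1
                run = 0
        if run:                               # run reaching the right edge
            hist[min(run, max_run) - 1] += 1
    return hist / max(hist.sum(), 1.0)        # normalize to a distribution
```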
We have developed a method that allows Japanese document images to be retrieved more accurately by using OCR character candidate information and a conventional plain text search engine. In this method, the document image is first recognized by normal OCR to produce text. Keyword areas are then estimated from the normal-OCR-produced text through morphological analysis. A lattice of candidate-character codes is extracted from these areas, and then character strings are extracted from the lattice using a word-matching method in noun areas and a K-th DP-matching method in undefined word areas. Finally, these extracted character strings are added to the normal-OCR-produced text to improve document retrieval accuracy when using a conventional plain text search engine. Experimental results from searches of 49 OHP sheet images revealed that our method has a high recall rate of 98.2%, compared to 90.3% with a conventional method using only normal-OCR-produced text, while requiring about the same processing time as normal OCR.
In this paper a new algorithm for recognizing handwritten Hindi digits is proposed. The algorithm uses the topological characteristics of the given digits, combined with their statistical properties, to extract a set of features that can be used for digit classification. 10,000 handwritten digits are used in the experiments: 1100 digits for training and another 5500 unseen digits for testing. The recognition rate reached 97.56%, with a substitution rate of 1.822% and a rejection rate of 0.618%.
The difficulties of handwritten numeral recognition mainly result from the intrinsic deformations of the handwritten numeral samples. One way to tackle these difficulties is to describe the variations of the feature vectors belonging to one class. The subspace method is a well-known and effective pattern recognition method that realizes this idea. In fact, the subspace method can be embedded into a multivariate linear regression model whose response variables are the feature vector and whose predictor variables are the principal components (PCs) of the feature vector. When the feature vector is not exactly Gaussian, it is possible to describe it more accurately, in the least-mean-squares (LMS) sense, by nonlinear functions parameterized by the same PCs, which may result in higher recognition performance. In this paper we propose an algorithm based on multivariate polynomial regression to realize this nonlinear extension. We use projection pursuit regression (PPR) to determine the multivariate polynomials, in which the polynomial degrees are selected by the structural risk minimization (SRM) method. Experimental results show that our approach is an effective pattern recognition method for the problem of handwritten numeral recognition.
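For readers unfamiliar with the subspace method, the sketch below shows only the linear baseline that the paper extends: a per-class PCA subspace fitted from training samples, with classification by smallest reconstruction residual. The polynomial PPR/SRM extension described in the abstract is not reproduced here, and the subspace dimension is an arbitrary assumption.

```python
import numpy as np

def fit_subspaces(features, labels, dim=10):
    """Per-class PCA subspaces (the linear baseline the paper extends)."""
    subspaces = {}
    for c in np.unique(labels):
        X = features[labels == c]
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
        subspaces[c] = (X.mean(axis=0), Vt[:dim])
    return subspaces

def classify(x, subspaces):
    """Assign x to the class whose subspace reconstructs it best (LMS sense)."""
    best, best_err = None, np.inf
    for c, (mean, V) in subspaces.items():
        r = x - mean
        err = np.sum((r - V.T @ (V @ r)) ** 2)   # residual after projection
        if err < best_err:
            best, best_err = c, err
    return best
```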
We achieve content-based image retrieval by using the shape information contained in an image. High-order image features that emulate the feature detection process of the human eye are used to represent both the input image and the template images stored in a database. Template matching is then applied between the input image and each template image to obtain the retrieval result. The matching process performs a kind of pseudo-elastic matching between the feature sets of the input image and each template image. This elastic matching, together with the high-order features, provides an excellent way to measure the dissimilarity, namely the spatial topology distance, between images. The method has been tested on the Columbia Object Image Library database, and preliminary experiments suggest promising results for our approach.
Even seemingly simple drawings, diagrams, and sketches are hard for computer programs to interpret, because these inputs can be highly variable in several respects. This variability corrupts the expected mapping between a prior model of a configuration and an instance of it in the scene. We propose a scheme for representing ambiguity explicitly, within a subgraph matching framework, that limits its impact on the computational and program complexity of matching. First, ambiguous alternative structures in the input are explicitly represented by coupled subgraphs of the data graph, using a class of segmentation post-processing operations termed graph elaboration. Second, the matching process enforces mutual exclusion constraints among these coupled alternatives, and preferences or rankings associated with them enable better matches to be found early on by a constrained optimization process. We describe several elaboration processes, and extend a straightforward constraint-based subgraph matching scheme to elaborated data graphs. The discussion focuses on the domain of human stick figures in diverse poses.
The reading of technical drawings is a complex task for automatic document processing. In this paper we present a system for reading textual descriptions from technical drawings, which provides capabilities for converting paper-based documents into an electronic archiving database system. The proposed system consists of four major processing elements: form learning, form localization, optical character recognition (OCR), and result verification. The algorithms of each element are dedicated to solve the practical problems in reading technical drawing documents. Among them, form localization and OCR are the key processes for automation. Experimental results have shown the feasibility and efficiency of our approaches.
Electronic documents have gained wide acceptance due to the ease of editing and sharing of information. However, paper documents are still widely used in many environments. Moving into a paperless and distributed office has become a major goal for document image research. A new approach for form document representation is presented. This approach allows for electronic document sharing over the World Wide Web (WWW) using Extensible Markup Language (XML) technologies. Each document is mapped into three different views: an XML view to represent the preprinted and filled-in data, an XSL (Extensible Stylesheet Language) view to represent the structure of the document, and a DTD (Document Type Definition) view to represent the document grammar and field constraints. The XML and XSL views are generated from a document template, either automatically using image processing techniques, or semi-automatically with minimal user interaction. The DTD representation may be fixed for general documents or may be generated semi-automatically by mining a number of filled-in document examples. Document templates need to be entered once to create the proposed representation. Afterwards, documents may be displayed, updated, or shared over the web. The merits of this approach are demonstrated using a number of examples of widely used forms.
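A minimal sketch of what the XML view might look like is given below; the tag and field names are hypothetical stand-ins, since the actual schema would be derived from the document template, and the XSL and DTD views are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical form and field names; the real tags would come from the template.
form = ET.Element("form", name="expense-claim")
for field_id, preprinted, value in [
        ("employee", "Employee name:", "J. Smith"),
        ("amount",   "Total amount:",  "120.00"),
]:
    field = ET.SubElement(form, "field", id=field_id)
    ET.SubElement(field, "preprinted").text = preprinted   # preprinted label
    ET.SubElement(field, "filled").text = value            # filled-in data

print(ET.tostring(form, encoding="unicode"))
```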
A new approach for form document representation using the maximal grid of its frameset is presented. Using image processing techniques, a scanned form is transformed into a frameset composed of a number of cells. The maximal grid is the grid that encompasses all the horizontal and vertical lines in the form and can be easily generated from the cell coordinates. The number of cells from the original frameset, included in each of the cells created by the maximal grid, is then calculated. Those numbers are added for each row and column generating an array representation for the frameset. A novel algorithm for similarity matching of document framesets based on their maximal grid representations is introduced. The algorithm is robust to image noise and to line breaks, which makes it applicable to poor quality scanned documents. The matching algorithm renders the similarity between two forms as a value between 0 and 1. Thus, it may be used to rank the forms in a database according to their similarity to a query form. Several experiments were performed in order to demonstrate the accuracy and the efficiency of the proposed approach.
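The sketch below is one plausible reading of the maximal-grid signature: it intersects all horizontal and vertical cell edges to form the grid, counts the frameset cells covering each grid cell, and sums the counts per row and per column. The paper's exact counting rule and its similarity-matching algorithm may differ from this illustration.

```python
import numpy as np

def maximal_grid_signature(cells):
    """Row/column sums of cell counts over the maximal grid.

    `cells` is a list of (x0, y0, x1, y1) frameset cells. This is an
    illustrative reading of the construction, not the paper's algorithm.
    """
    xs = sorted({x for x0, _, x1, _ in cells for x in (x0, x1)})
    ys = sorted({y for _, y0, _, y1 in cells for y in (y0, y1)})
    counts = np.zeros((len(ys) - 1, len(xs) - 1), dtype=int)
    for x0, y0, x1, y1 in cells:
        for i in range(len(ys) - 1):
            for j in range(len(xs) - 1):
                # grid cell (i, j) lies inside this frameset cell
                if x0 <= xs[j] and xs[j + 1] <= x1 and y0 <= ys[i] and ys[i + 1] <= y1:
                    counts[i, j] += 1
    return counts.sum(axis=1), counts.sum(axis=0)   # row sums, column sums
```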
A study was undertaken to determine the power of handwriting to distinguish between individuals. Handwriting samples of one thousand five hundred individuals, representative of the US population with respect to gender, age, ethnic groups, etc., were obtained. Differences in handwriting were analyzed using computer algorithms for extracting features from scanned images of handwriting. Attributes characteristic of the handwriting were obtained, e.g., line separation, slant, character shapes, etc. These attributes, which are a subset of the attributes used by expert document examiners, were used to quantitatively establish individuality by using machine learning approaches. Using global attributes of handwriting and very few characters in the writing, the ability to determine the writer with a high degree of confidence was established. The work is a step towards providing scientific support for admitting handwriting evidence in court. The mathematical approach and the resulting software also have the promise of aiding the expert document examiner.
After scanning, a document is typically blurred, and some noise is introduced. Therefore, the enhancement process for a scanned document requires a denoising and a deblurring step. Typically, these steps are performed using techniques originating in the Fourier domain. It has been shown in many image processing applications, such as compression and denoising, that wavelet-domain processing outperforms Fourier-domain processing. One main reason for the success of wavelets is that wavelets adapt automatically to smooth and non-smooth parts of an image, due to the link between wavelets and sophisticated smoothness spaces, the Besov spaces. Recently, smoothing and sharpening of an image, interpreted as increasing and decreasing the smoothness of an image, have been derived using Besov space properties. The goal of this paper is to use wavelet-based denoising and sharpening in Besov spaces, in combination with the characterization of lines and halftone patterns in the wavelet domain, to build a complete wavelet-based enhancement system. It is shown that the characteristics of a scanned document and the enhancement steps necessary for a digital copier application are well suited to being modeled in terms of wavelet bases and Besov spaces. The modeling leads to a very simple algorithmic implementation of a technique that qualitatively outperforms traditional Fourier-based techniques.
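As a point of reference, a generic wavelet-domain denoiser is sketched below using the PyWavelets package. It applies plain soft thresholding to the detail coefficients, which is much cruder than the Besov-space smoothing and sharpening operators the paper builds; the wavelet choice, decomposition level, and noise estimate are arbitrary assumptions.

```python
import numpy as np
import pywt

def wavelet_denoise(img, wavelet="db2", level=3, sigma=5.0):
    """Soft-threshold detail coefficients (a standard wavelet denoiser).

    Only a generic wavelet-domain sketch; not the paper's Besov-space
    enhancement operators. `sigma` is an assumed noise level.
    """
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=level)
    thr = sigma * np.sqrt(2.0 * np.log(img.size))        # universal threshold
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(d, thr, mode="soft") for d in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(denoised, wavelet)
```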
In our previous work on writer identification, a database of handwriting samples (written in English) of over one thousand individuals was created, and two types of computer-generated features of sample handwriting were extracted: macro and micro features. Using these features, writer identification experiments were performed: given that a document is written by one of n writers, the task is to determine the writer. With n = 2, we correctly determined the writer with 99% accuracy using micro features from only 10 characters of the writing; with n = 1000, the accuracy dropped to 80%. To obtain higher performance, we propose a combination of macro and micro level features. First, macro level features are used in a filtering model: the computer program is presented with multiple handwriting samples from a large number (1000) of writers, and the question posed is: which of the samples are consistent with a test sample? As a result of using the filtering model, a reduced set of documents (100) is obtained and presented to the final identification model, which uses the micro level features. We improved our writer identification system from 80% to 87.5% by the proposed filtering-combination technique when n = 1000.
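The two-stage combination can be pictured with the sketch below. The Euclidean distances and the shortlist size of 100 are stand-ins for the paper's filtering and identification models; only the filter-then-identify structure is taken from the abstract.

```python
import numpy as np

def identify_writer(test_macro, test_micro, macro_db, micro_db, shortlist=100):
    """Two-stage identification: macro-feature filtering, then micro matching.

    `macro_db` and `micro_db` map writer ids to feature vectors; plain
    Euclidean distance is an illustrative stand-in for the paper's models.
    """
    # Stage 1: keep the `shortlist` writers whose macro features are closest.
    macro_dist = {w: np.linalg.norm(test_macro - v) for w, v in macro_db.items()}
    candidates = sorted(macro_dist, key=macro_dist.get)[:shortlist]
    # Stage 2: final decision on micro features within the shortlist.
    return min(candidates, key=lambda w: np.linalg.norm(test_micro - micro_db[w]))
```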
The JPEG 2000 standard enables a variety of approaches suitable to document images. The JPM file format defined in Part 6 specifies a method for layered image compression of compound images and supports multiple compression types within a single page. This allows JBIG2, JPEG 2000 and other compressed data types to be applied to different regions of a single image. The JPEG 2000 Part 1 codestream itself also offers a variety of options of possible interest for document images. Low bit-per-pixel representations of textual regions at moderate spatial resolutions can be appropriate for screen renditions of text regions. These may be coded as lossless, visually lossless or lossy data. Document images can also be palettized and compressed losslessly by the palettized image method of JPEG 2000. A bitonal representation of a document page may be compressed without any wavelet transform, using only the entropy coder portion of the JPEG 2000 specification, or it may be coded in a lossy manner. This study compares the visual quality and compressed sizes of some of these alternative approaches.
Two major degradations, edge displacement and corner erosion, change the appearance of bilevel images. The displacement of an edge determines stroke width, and the erosion of a corner affects crispness. These degradations are functions of the system parameters: the point spread function (PSF) width and functional form, and the binarization threshold. Changing each of these parameters will affect an image differently. A given amount of edge displacement or of erosion of black or white corners can be caused by several combinations of the PSF width and the binarization threshold, but any pair of these degradations is unique to a single PSF width and binarization threshold for a given PSF functional form. Knowledge of all three degradation amounts therefore provides enough information to determine the PSF functional form from the bilevel image. The effect of each degradation on characters will be shown. Also, the uniqueness of the degradation triple {dw, db, dc} and the effect of selecting an incorrect PSF functional form will be shown, first in relation to the PSF width and binarization threshold estimates, then in how this is visible in sample characters.
This paper explores the problem of incorporating linguistic constraints into document image decoding, a communication theory approach to document recognition. Probabilistic character n-grams (n = 2 to 5) are used in a two-pass strategy where the decoder first uses a very weak language model to generate a lattice of candidate output strings. These are then re-scored in the second pass using the full language model. Experimental results based on both synthesized and scanned data show that this approach is capable of improving the error rate by a factor of two to ten depending on the quality of the data and the details of the language model used.
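A minimal sketch of the second-pass re-scoring is given below, with a character bigram model standing in for the full n-gram language model and a flat list of candidate strings standing in for the decoder lattice; the interpolation weight is an assumption.

```python
import math

def rescore(candidates, bigram_logprob, lam=1.0):
    """Re-rank first-pass candidate strings with a character n-gram model.

    `candidates` is a list of (string, decoder_logscore) pairs and
    `bigram_logprob` maps character pairs to log probabilities; both are
    placeholders for the decoder lattice and the full language model.
    """
    def lm_score(s):
        # unseen bigrams get a small floor probability
        return sum(bigram_logprob.get((a, b), math.log(1e-6))
                   for a, b in zip(s, s[1:]))
    return max(candidates, key=lambda c: c[1] + lam * lm_score(c[0]))
```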
This paper presents the limits of character recognition engines (commercial OCRs) and how to go beyond these limits to achieve industrial goals in terms of document capture and coding performance. The recent integration of these OCRs into several industrial capture chains suggests that it is possible to reach, electronically, the same performance obtained by human typists. After a global description of the problem and an exposition of the OCR limits, the paper focuses on the methodology used and details the steps proposed for improving individual performance. The first step is the individual evaluation of the OCRs, done by comparing each OCR result with a ground truth, which highlights the engine's defects and catalogues its main errors on the documents processed. The second step increases these individual performances by combining each OCR with others; we chose to combine only two OCRs judged very efficient and complementary on the same class of documents. The residual errors are treated in the last step, which proposes a list of heuristics that resolve the remaining OCR defects in the limit cases. To validate our approach, we present in the second part of the paper a practical experiment aimed at reaching industrial performance. The approach has been tested in the framework of an industrial application for automatic document capture, targeting the required error rate, imposed for one specific document class, of 1 error per 10,000 characters.
We describe a technique for modeling the character recognition accuracy of an OCR system -- treated as a black box -- on a particular page of printed text based on an examination only of the output top-choice character classifications and, for each, a confidence score such as is supplied by many commercial OCR systems. Latent conditional independence (LCI) models perform better on this task, in our experience, than naive uniform thresholding methods. Given a sufficiently large and representative dataset of OCR (errorful) output and manually proofed (correct) text, we can automatically infer LCI models that exhibit a useful degree of reliability. A collaboration between a PARC research group and a Xerox legacy conversion service bureau has demonstrated that such models can significantly improve the productivity of human proofing staff by triaging -- that is, selecting to bypass manual inspection -- pages whose estimated OCR accuracy exceeds a threshold chosen to ensure that a customer-specified per-page accuracy target will be met with sufficient confidence. We report experimental results on over 1400 pages. Our triage software tools are running in production and will be applied to more than 5 million pages of multi-lingual text.
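The triage step itself is simple once a per-page accuracy estimate exists. In the sketch below, `accuracy_model` is a placeholder for the LCI model's per-page estimate and the threshold value is illustrative; in practice it would be calibrated on validation data against the customer's per-page accuracy target.

```python
def triage(pages, accuracy_model, threshold=0.995):
    """Split pages into auto-accepted and to-be-proofed sets.

    `accuracy_model(page)` stands in for the LCI model's estimated OCR
    accuracy for that page; `threshold` is an assumed calibration value.
    """
    auto, manual = [], []
    for page in pages:
        (auto if accuracy_model(page) >= threshold else manual).append(page)
    return auto, manual
```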
This paper addresses the system's aspects of OCR solutions in the context of digital content re-mastering. It analyzes the unique requirements and challenges to implement a reliable OCR system in a high-volume and unattended environment. A new reliability metric is proposed and a practical solution based on the combination of multiple commercial OCR engines is introduced. Experimental results show that the combination system is both much more accurate and more reliable when compared with individual engines, thus it can fully satisfy the need of digital content re-mastering applications.
Text quality can significantly affect the results of text detection and recognition in digital video. In this paper we address the problem of estimating text quality. The quality of text that appears in video is often much lower than that in document images, and can be degraded by factors such as low resolution, background variation, uneven lighting, motion of the text and camera, and, in the case of scene text, projection from 3D. Features based on text resolution, background noise, contrast, illumination and texture are selected to describe the text quality, normalized, and fed into a trained RBF network to estimate the text quality. The performance obtained with different training schemes is compared.
There has been increasing interest in document capture with digital cameras, since they are often more convenient to use than conventional devices such as flatbed scanners. Unlike flatbed scanners, cameras can acquire document images with arbitrary perspectives. Without correction, perspective distortions are unappealing to human readers. They also make subsequent image analysis slower, more complicated and less reliable. The novel contribution of this paper is to view perspective estimation as a generalization of the well-studied skew estimation problem. Rather than estimating one angle of rotation we must determine four angles describing the perspective. In our method, separate estimates are made for angles describing lines that are parallel and perpendicular to text lines. Each of these estimates is based on a twice-iterated projection profile computation. We give a probabilistic argument for the method and describe an efficient implementation. Our results illustrate its primary benefits: it is robust and accurate. The method is efficient compared with the time required to warp the image to correct for perspective.
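The underlying estimation idea can be sketched as below: rotate the binarized image over a range of candidate angles and keep the angle whose horizontal projection profile is sharpest (here measured by variance). This covers only the classical single-angle skew case; the paper generalizes it to the four angles of a perspective transform and uses a twice-iterated profile computation.

```python
import numpy as np
from scipy import ndimage

def estimate_skew(binary_img, angles=np.arange(-5, 5.25, 0.25)):
    """Pick the rotation whose horizontal projection profile is sharpest.

    Illustrative single-angle skew estimation only; the candidate angle
    range and step are assumptions, not taken from the paper.
    """
    def sharpness(angle):
        rotated = ndimage.rotate(binary_img.astype(float), angle,
                                 reshape=False, order=1)
        profile = rotated.sum(axis=1)          # row sums = projection profile
        return np.var(profile)
    return max(angles, key=sharpness)
```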
This paper discusses the performance of a system for extracting bibliographic fields from scanned pages in biomedical journals to populate MEDLINE, the flagship database of the National Library of Medicine (NLM), which is heavily used worldwide. This system consists of automated processes to extract the article title, author names, affiliations and abstract, and manual workstations for the entry of other required fields such as pagination, grant support information, databank accession numbers and others needed for a completed bibliographic record in MEDLINE. Labor and time data are given for (1) a wholly manual keyboarding process to create the records, (2) an OCR-based system that requires all fields except the abstract to be manually input, and (3) a more automated system that relies on document image analysis and understanding techniques for the extraction of several fields. It is shown that this last, most automated, approach requires less than 25% of the labor effort of the first, manual, process.