KEYWORDS: Video, Machine learning, Image segmentation, Detection and tracking algorithms, Skin, Multimedia, Optical character recognition, Cameras, Data modeling, Semantic video
Video segmentation and indexing are important steps in multimedia document understanding and information
retrieval. This paper presents a novel machine-learning-based approach for automatically structuring and indexing
lecture videos. Indexing video content supports both topic indexing and semantic querying of
multimedia documents. The proposed approach extracts features from video images and then uses
these features to construct a model that labels video frames. Using this model, we are able to segment and index
videos with an accuracy of 95% on our test collection.
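
The step from per-frame labels to segments and an index can be illustrated with a minimal sketch. It assumes that a segment is a maximal run of consecutive frames sharing one label, which the abstract does not state; the function names and the label values `"slide"` and `"speaker"` are hypothetical, and the labeling model itself is not shown.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is inclusive.

    Assumption: a segment is a maximal run of identical consecutive labels.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length - 1))
        start += length
    return segments


def build_index(segments):
    """Map each label to the list of (start, end) frame ranges where it occurs."""
    index = {}
    for label, start, end in segments:
        index.setdefault(label, []).append((start, end))
    return index


# Hypothetical frame labels, standing in for the output of a trained model.
frame_labels = ["slide", "slide", "speaker", "slide"]
segments = frames_to_segments(frame_labels)
label_index = build_index(segments)
```

Looking up a label in `label_index` then returns the frame ranges to jump to, which is one simple way a label-based index could support querying.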
Document layout analysis is of fundamental importance for document image understanding and information retrieval.
It requires identifying the blocks extracted from a document image by means of feature extraction and block
classification. In this paper, we focus on classifying the extracted blocks into five classes: text (machine
printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classification
of these blocks. We present a comparative evaluation of three ensemble-based classification algorithms (boosting,
bagging, and combined model trees) alongside other known learning algorithms. Experimental results
are reported for a set of 36503 zones extracted from 416 document images randomly selected
from the tobacco legacy document collection. The results verify the robustness and effectiveness of
the proposed feature set in comparison with the commonly used Ocropus recognition features. When used in
conjunction with the Ocropus feature set, the proposed features further improve the performance of the block classification system,
yielding a classification accuracy of 99.21%.
Layout analysis is a crucial process for document image understanding and information retrieval. Document
layout analysis depends on page segmentation and block classification. This paper describes an algorithm for
extracting blocks from document images and a boosting-based method for classifying those blocks as machine-printed
text or not. The feature vector fed into the boosting classifier consists of a four-direction run-length
histogram and connected-component features in both background and foreground. Combining these
features through a boosting classifier, we obtain an accuracy of 99.5% on our test collection.
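
A four-direction run-length histogram can be sketched as follows. This is an illustration under stated assumptions, not the paper's feature definition: the four directions are taken to be horizontal, vertical, and the two diagonals; the block is a binary image given as a list of rows; run lengths above `max_run` are clipped into the last bin. All names and the binning scheme are hypothetical.

```python
DIRECTIONS = ("horizontal", "vertical", "diagonal", "antidiagonal")


def run_lengths(line, value=1):
    """Return the lengths of maximal runs of `value` in a 1-D sequence."""
    runs, count = [], 0
    for pixel in line:
        if pixel == value:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def scan_lines(image, direction):
    """Split a 2-D binary image (list of rows) into scan lines for one direction."""
    rows, cols = len(image), len(image[0])
    if direction == "horizontal":
        return [list(row) for row in image]
    if direction == "vertical":
        return [[image[r][c] for r in range(rows)] for c in range(cols)]
    if direction not in ("diagonal", "antidiagonal"):
        raise ValueError(f"unknown direction: {direction}")
    # Pixels on one diagonal share r - c; on one antidiagonal they share r + c.
    groups = {}
    for r in range(rows):
        for c in range(cols):
            key = r - c if direction == "diagonal" else r + c
            groups.setdefault(key, []).append(image[r][c])
    return list(groups.values())


def run_length_histogram(image, value=1, max_run=8):
    """Concatenate one run-length histogram per direction.

    Bin i counts runs of length i + 1; runs longer than `max_run`
    fall into the last bin (an assumed binning choice).
    """
    features = []
    for direction in DIRECTIONS:
        hist = [0] * max_run
        for line in scan_lines(image, direction):
            for length in run_lengths(line, value):
                hist[min(length, max_run) - 1] += 1
        features.extend(hist)
    return features


# Tiny hypothetical block: 1 = foreground, 0 = background.
block = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
foreground = run_length_histogram(block, value=1, max_run=3)
background = run_length_histogram(block, value=0, max_run=3)
```

Calling the function once with `value=1` and once with `value=0` gives separate foreground and background vectors, which could be concatenated with other features before being passed to a classifier.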