Reducing noise in image query processing is no doubt one of the key elements in achieving high retrieval effectiveness. However, existing techniques cannot eliminate noise from similarity matching, since they capture the features of either the entire image or pre-perceived objects at database build time. In this paper we address this outstanding issue by proposing a similarity model for noise-free queries. In our approach, users formulate their queries by specifying objects of interest, and image similarity is based only on these relevant objects. We discuss how our approach handles translation and scaling in matching, and how its space overhead can be minimized. Our experiments show that this approach, with 1/16 the storage overhead, outperforms techniques for rectangular queries and a related technique by a significant margin.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
In this paper, we propose a new similarity measure for color images: the angular distance between cumulative histograms. The measure is compared to a popular previous measure, the cumulative L1 distance, on the RGB and HSV color spaces. We show that our method produces better results than the cumulative L1 distance measure. Moreover, to increase accuracy, we introduce a weighting method: weights are applied to the RGB similarities D_R, D_G, and D_B according to a hue histogram of the query image. The weighting method increases accuracy and yields perceptually more relevant results.
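The core measure above can be sketched as follows; this is a minimal illustration, assuming 256-level channels and a 16-bin histogram (the paper's exact binning and normalization are not specified):

```python
import numpy as np

def cumulative_histogram(channel, bins=16):
    """Normalized cumulative histogram of one color channel (values 0..255)."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    cum = np.cumsum(hist).astype(float)
    return cum / cum[-1]  # last bin is 1 after normalization

def angular_distance(c1, c2):
    """Angle between two cumulative histograms viewed as vectors."""
    cos = np.dot(c1, c2) / (np.linalg.norm(c1) * np.linalg.norm(c2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# identical channels -> near-zero angular distance; very different -> large
a = np.random.default_rng(0).integers(0, 256, size=(32, 32))
b = np.full((32, 32), 255)
d_same = angular_distance(cumulative_histogram(a), cumulative_histogram(a))
d_diff = angular_distance(cumulative_histogram(a), cumulative_histogram(b))
```

The same distance would be computed per channel (R, G, B) and combined with the hue-derived weights described above.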
We present an image spot query technique as an alternative to content-based image retrieval based on similarity over feature vectors. Image spots are selective parts of a query image designated by users as highly relevant to the desired answer set. Compared to traditional approaches, our technique allows users to search image databases for local characteristics rather than global features. When a user query is presented to our search engine, the engine does not impose any policy of its own on the answer set; it performs an exact match of the query terms against the database. Higher-level semantic tasks, such as weighting the relevance of query terms, are left to the user, who performs them while refining the query to reach the desired answer set. Given the hundreds of feature terms involved in query spots, refinement algorithms are to be encapsulated in separate applications that act as intermediaries between our search engine and the users.
This paper presents an image retrieval method based on region shape similarity. In our approach, we first segment images into primitive regions and then combine some of the primitive regions to generate meaningful composite shapes, which are used as semantic units of the images during the similarity assessment process. We employ three global shape features and a set of normalized Fourier descriptors to characterize each meaningful shape. All these features are invariant under similarity transformations. Finally, we measure the similarity between two images by finding the most similar pair of shapes in the two images. Our approach has demonstrated good performance in our retrieval experiments on clipart images.
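The normalized Fourier descriptors mentioned above can be sketched as follows (a minimal version; the paper's exact normalization and its three global shape features are not specified here). Representing contour points as complex numbers, dropping the DC term removes translation, dividing by |F(1)| removes scale, and keeping magnitudes removes rotation and starting point:

```python
import numpy as np

def fourier_descriptors(contour, n=8):
    """contour: closed boundary sampled as complex points x + iy.
    Returns n coefficient magnitudes normalized by |F(1)|."""
    F = np.fft.fft(contour)
    mags = np.abs(F)
    return mags[2:n + 2] / mags[1]  # skip F(0) (translation) and F(1) (scale)

# a scaled and rotated copy of a shape yields identical descriptors
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
shape = np.cos(t) + 1j * np.sin(t) + 0.3 * np.cos(3 * t)
copy = 2.5 * np.exp(1j * 0.7) * shape
```

Because the FFT is linear, a similarity transformation scales all coefficients by the same complex factor, which the normalization cancels.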
With the advance of multimedia technologies and the explosive expansion of the World Wide Web, the volume of image and video data is increasing rapidly, and efficient and effective multimedia data retrieval techniques are needed. In this paper, we propose an approach to content-based image retrieval based on feature points. The feature points extracted from the multiresolution representations of the query image and a database image are first matched to determine the matching pairs. Then, the matching pairs are classified into groups. Finally, two similarity measurements based on different similarity requirements are proposed to compute the similarity degree. We perform a series of experiments to study the characteristics of this approach and compare it with the region-based approach on similar-shot sequence retrieval. The comparison shows the superiority of our approach.
We introduce a new method for color image indexing and content-based retrieval. An image is divided into small sub-images, the visual appearance of which is characterized by a colored pattern appearance model. The statistics of the local visual appearance are then computed as measures of the global visual appearance of the image. The visual appearance of the small sub-images is modeled by their spatial pattern, color direction, and local energy strength. To encode the local visual appearance, an approach based on vector quantization (VQ) is introduced, and the distributions of the VQ code indices are then used to index and retrieve the images. The new method not only achieves effective image indexing and retrieval; it can also be used for image compression. With this method, indexing and retrieval can be performed easily and conveniently in the compressed domain without any decoding operations.
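The indexing step can be sketched as follows; this is a minimal illustration with a hypothetical hand-made codebook (the paper's appearance model and codebook training are not shown):

```python
import numpy as np

def vq_signature(blocks, codebook):
    """Assign each sub-image feature vector to its nearest codeword and
    return the normalized distribution of code indices as the image's
    global signature."""
    dists = np.linalg.norm(blocks[:, None, :] - codebook[None, :, :], axis=2)
    idx = dists.argmin(axis=1)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # assumed codewords
blocks = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [4.8, 5.2]])
sig = vq_signature(blocks, codebook)
```

Two images can then be compared by any histogram distance over their signatures, which is what makes retrieval possible directly in the compressed (index) domain.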
The problem of content-based image searching has received considerable attention in the last few years. Thousands of images are now available on the Internet, and many important applications require searching of images in domains such as e-commerce, medical imaging, weather prediction, and satellite imagery. Yet content-based image querying is still largely unestablished as a mainstream field and is not widely used by search engines. We believe that two of the major hurdles to broader acceptance are poor retrieval quality and poor usability.
This paper investigates the problem of high-level querying of multimedia data by imposing arbitrary domain-specific constraints among multimedia objects. We argue that the current structured query model and the query-by-content model are insufficient for many important applications, and we propose an alternative query framework that unifies and extends the two previous models. The proposed framework is based on the query-by-concept paradigm, where the query is expressed simply in terms of concepts, regardless of the complexity of the underlying multimedia search engines. The query-by-concept paradigm was previously illustrated by the CAMEL system. The present paper builds upon and extends that work by adding arbitrary constraints and multiple levels of hierarchy to the concept representation model. We consider queries simply as descriptions of a virtual data set, which allows us to use the same unifying concept representation for query specification as well as for data annotation. We also identify some key issues and challenges presented by the new framework and outline possible approaches for overcoming them. In particular, we study the problems of concept representation, extraction, refinement, storage, and matching.
This paper proposes a new contour-based shape matching approach capable of recognizing partially occluded objects in images. There are two major steps in the recognition process for partially occluded objects: 1) feature extraction, and 2) similarity matching. The features for a candidate image region are derived from its contour, for better accuracy in curvature estimation. To determine the similarity between a candidate region and an object template, we compute the Hausdorff distance from the contour of the candidate image region to that of the template, where the region contour has been mapped into the template domain. Once a match is established, the information is retained using a hierarchical content description scheme, enabling expedient object-based image retrieval at a later time.
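The contour comparison relies on the directed Hausdorff distance, which can be computed as follows (a brute-force sketch over point sets; the paper's mapping of the region contour into the template domain is omitted):

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance h(A, B): the largest distance from a
    point of A to its nearest neighbor in B."""
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(pairwise.min(axis=1).max())

# the region has one extra point, 1 unit away from the template's closest point
region = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
template = np.array([[0.0, 0.0], [1.0, 0.0]])
h = directed_hausdorff(region, template)
```

Measuring the distance from the region contour to the template (rather than symmetrically) is what tolerates occlusion: missing parts of the object penalize only the unmatched direction.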
As an effective solution to content-based image retrieval problems, relevance feedback has received much research effort over the past few years. In this paper, we propose a new relevance feedback approach with progressive learning capability. It is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies. It can utilize previous users' feedback information to help the current query. Experimental results show that our algorithm achieves high accuracy and effectiveness on real-world image collections.
This paper addresses the issues involved in designing a classifier for multimedia indexing, representative of a domain of tasks involving a high-dimensional feature space, large dissimilarity between features in range and variation, and a need for strong inference mechanisms. We consider decision trees, Bayesian networks, neural networks, and support vector approaches. The Modified Bayesian Network (MBN), as designed by us, offers significant advantages over the other approaches. The application of Bayesian networks has generally been restricted to domains with discrete variable values, or to domains with continuous variable values that approximate a Gaussian distribution. The MBN, however, can form a sound representation of non-Gaussian, multimodal continuous distributions, as is the case with the feature space in multimedia indexing. This is accomplished by intelligent partitioning and data clique association. The structure of the MBN and its performance on real video are also presented in the paper.
An original image retrieval framework is proposed and developed. Unlike popular retrieval systems, we break features into feature elements (FEs), which have meaningful visual sense, instead of combining features to obtain semantic meanings. These feature elements are evaluated according to the subjective perception of human beings. As a result, three classes of feature elements are obtained: important FEs, extended FEs, and trivial FEs. Each class of feature elements is organized into an FE data set, and the retrieval process becomes a search for feature elements in the corresponding sets. An interactive function is also built in: with association feedback, the feature elements associated with both the user's interest and the given retrieval result are detected and analyzed, so the system can grasp the user's target more accurately. Even if the user switches to another retrieval interest, the system can trace it through the associated feature elements. As the whole approach is based on instinctive perception, good retrieval performance is achieved.
The worldwide research efforts in the area of image and video retrieval have so far concentrated on increasing the efficiency and reliability of extracting the elements of image and video semantics, and thus on improving search and retrieval performance at the cognitive level of content abstraction. At this abstraction level, the user searches for 'factual' or 'objective' content, such as an image showing a panorama of San Francisco, an outdoor or indoor image, a broadcast news report on a given topic, a movie dialog between actors A and B, or the parts of a basketball game showing fast breaks, steals, and scores. These efforts, however, do not address retrieval applications at the so-called affective level of content abstraction, where the 'ground truth' is not strictly defined. Such applications are, for instance, those where the subjectivity of the user plays the major role, e.g., retrieving all images that the user 'likes most', and those based on 'recognizing emotions' in audiovisual data. Typical examples are searching for all images that 'radiate happiness', identifying all 'sad' movie fragments, and looking for 'romantic landscapes', 'sentimental' movie segments, 'movie highlights', or the 'most exciting' moments of a sports event. This paper discusses the needs and possibilities for widening the current scope of research in image and video search and retrieval in order to enable applications at the affective level of content abstraction.
In this paper, we examine emerging frontiers in the evolution of content-based retrieval systems that rely on an intelligent infrastructure. Here, we refer to intelligence as the capability of a system to build and maintain situational or world models, utilize dynamic knowledge representation, exploit context, and leverage advanced reasoning and learning capabilities. We argue that these elements are essential to producing effective systems for retrieving audio-visual content at semantic levels matching those of human perception and cognition. We review relevant research on the understanding of human intelligence and the construction of intelligent systems in the fields of cognitive psychology, artificial intelligence, semiotics, and computer vision. We also discuss how some of the principal ideas from these fields lead to new opportunities and capabilities for content-based retrieval systems. Finally, we describe some of our efforts in these directions. In particular, we present MediaNet, a multimedia knowledge representation framework, and some MPEG-7 description tools that facilitate and enable intelligent content-based retrieval.
Distributed resource discovery is an essential step in information retrieval and the provision of information services; it is usually used to determine the location of a repository that holds relevant information or data. The most fundamental challenge is the potential lack of semantic interoperability among these repositories. In this paper, we propose an algorithm to enable distributed resource discovery. In the proposed method, distributed repositories achieve pairwise semantic interoperability through the exchange of both examples and classifiers. For each repository, the local classifier is used to classify the examples sent by the remote repository, and the classifier from the remote repository is used to classify the examples from the local repository. The correspondence between the class labels of the two repositories can then be established by examining the classification results.
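The label-correspondence step can be sketched as follows, with classifiers reduced to plain callables (a simplified stand-in for the paper's repositories and exchange protocol):

```python
from collections import Counter

def label_correspondence(remote_examples, remote_labels, local_classifier):
    """For each remote class label, find the local class that the local
    classifier most often assigns to that class's exchanged examples."""
    votes = {}
    for example, remote_label in zip(remote_examples, remote_labels):
        votes.setdefault(remote_label, Counter())[local_classifier(example)] += 1
    # majority vote per remote class establishes the correspondence
    return {r: counts.most_common(1)[0][0] for r, counts in votes.items()}

# hypothetical example: the remote repository labels numbers 'E'/'O',
# the local classifier produces 'even'/'odd'
examples = [2, 4, 7, 3, 6]
labels = ['E', 'E', 'O', 'O', 'E']
mapping = label_correspondence(examples, labels,
                               lambda x: 'even' if x % 2 == 0 else 'odd')
```

Running the same procedure in the other direction (local examples through the remote classifier) lets the two repositories cross-check the inferred correspondence.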
This paper studies the relation between images and text in image databases. An analysis of this relation results in the definition of three distinct query modalities: (1) the linguistic scenario, in which images are part of a whole that includes a self-contained linguistic discourse, and their meaning derives from their interaction with that discourse (a typical case is images on the World Wide Web); (2) the closed-world scenario, in which images are defined in a limited domain and their meaning is anchored by conventions and norms in that domain; and (3) the user scenario, in which the linguistic discourse is provided by the user, as in highly interactive systems with relevance feedback. This paper deals with image databases of the first type. It shows how the relation between images and text can be inferred and exploited for search. The paper develops a similarity model in which the similarity between two images is given by both their visual similarity and the similarity of the attached words. Both the visual and the textual similarity can be manipulated by the user through the two windows of the interface.
Humans tend to use high-level semantic concepts when querying and browsing multimedia databases; there is thus a need for systems that extract these concepts and make annotations available for the multimedia data. The system presented in this paper satisfies this need by automatically generating semantic concepts for images from their low-level visual features. The proposed system is built in two stages. First, an adaptation of k-means clustering using a non-Euclidean similarity metric is applied to discover the natural patterns of the data in the low-level feature space; the cluster prototype is designed to summarize the cluster in a manner suited to quick human comprehension of its components. Second, statistics measuring the variation within each cluster are used to derive a set of mappings between the most significant low-level features and the most frequent keywords of the corresponding cluster. The derived rules can then be used to capture the semantic content of, and index, new untagged images added to the image database. Attaching semantic concepts to images also gives the system the advantage of handling queries expressed in terms of keywords, thus reducing the semantic gap between the user's conceptualization of a query and the query actually specified to the system. While the suggested scheme works with any kind of low-level features, our implementation and description of the system are centered on the use of image color information. Experiments on a 2100-image database are presented to show the efficacy of the proposed system.
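The clustering stage can be sketched as follows; cosine similarity stands in for the paper's unspecified non-Euclidean metric, and a deterministic farthest-point initialization is used for reproducibility:

```python
import numpy as np

def kmeans_cosine(X, k=2, iters=20):
    """k-means variant that assigns points to the center with the highest
    cosine similarity (a stand-in for the paper's non-Euclidean metric)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # farthest-point init: start at the first point, then repeatedly add
    # the point least similar to the centers chosen so far
    centers = [Xn[0]]
    while len(centers) < k:
        sims = np.max(Xn @ np.array(centers).T, axis=1)
        centers.append(Xn[sims.argmin()])
    centers = np.array(centers)
    labels = np.zeros(len(Xn), dtype=int)
    for _ in range(iters):
        labels = (Xn @ centers.T).argmax(axis=1)      # most-similar center
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centers[j] = c / np.linalg.norm(c)    # renormalize center
    return labels

# two bundles of feature directions: near the x-axis and near the y-axis
rng = np.random.default_rng(1)
A = np.column_stack([np.ones(20), 0.05 * rng.standard_normal(20)])
B = np.column_stack([0.05 * rng.standard_normal(20), np.ones(20)])
labels = kmeans_cosine(np.vstack([A, B]), k=2)
```

In the full system, each resulting cluster would then be summarized by a prototype and linked to the keywords most frequent among its member images.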
In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, able to identify characters in TV shows by fusing audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. In the second, face recognition and tracking are used to supplement speaker identification results. To evaluate the system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with this information database, we evaluated the performance of each analysis channel and of their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, yields the best overall identification performance: it greatly outperforms either analysis channel alone in both precision and recall and is applicable to more general applications, such as, in our case, identifying persons in TV programs.
A necessary capability for content-based retrieval is support for the query-by-example paradigm. In the past, there have been several attempts to use low-level features for video retrieval; none of these approaches, however, uses the multimedia information content of the video. We present an algorithm for matching multimodal patterns for content-based video retrieval. The novel ability of our approach to use the information content of multiple media, coupled with a strong emphasis on temporal similarity, differentiates it from the state of the art in content-based retrieval. At the core of the pattern matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. By coupling audio with video, this algorithm can be applied to grouping shots based on audio-visual similarity, which is much more effective for constructing scenes from shots than using visual content alone.
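The dynamic programming core can be illustrated with a classic alignment recurrence over two feature sequences (a generic sketch; the paper's actual cost function and multimodal features are not specified):

```python
def dp_match_cost(s, t, dist=lambda a, b: abs(a - b)):
    """Minimum cumulative cost of aligning sequences s and t, allowing one
    element to match several consecutive elements of the other sequence
    (the dynamic-time-warping recurrence)."""
    INF = float("inf")
    D = [[INF] * (len(t) + 1) for _ in range(len(s) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            step = dist(s[i - 1], t[j - 1])
            # insertion, deletion, or diagonal match
            D[i][j] = step + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(s)][len(t)]
```

The recurrence tolerates differences in pacing between the query clip and candidate clips, which is what gives the temporal emphasis described above; a multimodal version would sum per-frame costs over the audio and visual channels.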
Many different kinds of features have been used as the basis for shape retrieval from image databases. This paper investigates the relative effectiveness of several types of global shape feature, both singly and in combination. The features compared include well-established descriptors such as Fourier coefficients and moment invariants, as well as recently proposed measures of triangularity and ellipticity. Experiments were conducted within the framework of the ARTISAN shape retrieval system, and retrieval effectiveness was assessed on a database of over 10,000 images, using 24 queries and associated ground truth supplied by the UK Patent Office. Our experiments revealed only minor differences in retrieval effectiveness between the measures, suggesting that a wide variety of shape feature combinations can provide adequate discriminating power for effective shape retrieval in multi-component image collections such as trademark registries. Marked differences between measures were observed for some individual queries, suggesting that there could be considerable scope for improving retrieval effectiveness by providing users with a better framework for searching multi-dimensional feature space.
In this paper, a video data model is proposed to represent the content of video data. In the proposed model, the trajectory and other properties of objects are recorded. From the trajectory, motion events such as the 'high speed' of an object or the 'increasing distance' between objects can be derived automatically. A query language named V-SQL, based on the video data model, is also proposed to let users describe the content of the desired video clips. A graphical user interface has been implemented for easier query specification.
Automatic shot boundary detection has been an active research area for nearly a decade and has led to high-performance detection algorithms for hard cuts, fades, and wipes. Reliable dissolve detection, however, is still an unsolved problem. In this paper, we present the first robust and reliable dissolve detection system. A detection rate of 69 percent was achieved while reducing the false alarm rate to an acceptable level of 68 percent on a test video set for which, so far, the best reported detection and false alarm rates had been 57 percent and 185 percent, respectively. In addition, the temporal extent of the dissolves is estimated by a multi-resolution detection approach. The three core ideas of our novel approach are, firstly, the creation of a dissolve synthesizer capable of creating, in principle, an infinite number of dissolve examples of any duration from a database of raw video footage; secondly, two new features for capturing the characteristics of dissolves; and thirdly, the exploitation of machine learning ideas for reliable object detection, such as the bootstrap method for improving the set of non-dissolve examples and detection at multiple resolutions, as well as the use of machine learning algorithms such as neural networks, support vector machines, and linear vector quantizers.
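The dissolve synthesizer amounts to cross-fading two pieces of footage; a minimal sketch (real synthesis would draw full frames from the raw-footage database):

```python
import numpy as np

def synthesize_dissolve(frame_a, frame_b, duration):
    """Generate `duration` frames that linearly cross-fade frame_a into
    frame_b, imitating a dissolve transition of that length."""
    frames = []
    for t in range(duration):
        alpha = t / (duration - 1)  # 0.0 at the start, 1.0 at the end
        frames.append((1 - alpha) * frame_a + alpha * frame_b)
    return frames

a = np.zeros((4, 4))            # stand-in for a frame of clip A
b = np.full((4, 4), 255.0)      # stand-in for a frame of clip B
clip = synthesize_dissolve(a, b, duration=10)
```

Generated dissolves of arbitrary duration give the learning algorithms an effectively unlimited supply of positive training examples.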
We present a novel technique for detecting the presence of a wipe transition in video sequences and automatically identifying its type. Our scheme focuses on analyzing the characteristics of the underlying special edit effects and estimates actual transitions by polynomial data interpolation. In particular, a B-spline polynomial curve fitting technique is used to measure the 'goodness' of fit and thereby determine the presence of gradual transitions. Our approach is able to recover the original transition behavior of an edit effect even if it has been distorted by various post-processing stages. Our wipe transition detector has been tested on various real video sequences to evaluate the performance of the proposed algorithms.
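The goodness-of-fit idea can be illustrated with ordinary polynomial least squares standing in for the paper's B-spline fitting: a frame-difference signal that follows a smooth transition curve fits well (low residual), while an incoherent signal does not:

```python
import numpy as np

def fit_residual(signal, degree=2):
    """Root-mean-square residual of a least-squares polynomial fit;
    a low residual indicates a smooth, transition-like curve."""
    x = np.arange(len(signal))
    coeffs = np.polyfit(x, signal, degree)
    fitted = np.polyval(coeffs, x)
    return float(np.sqrt(np.mean((signal - fitted) ** 2)))

ramp = np.linspace(0.0, 1.0, 30)              # ideal wipe progression
noise = np.random.default_rng(7).random(30)   # no coherent transition
```

Thresholding the residual over a sliding window is one simple way such a measure could flag candidate gradual transitions.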
This paper describes a video summarization and semantics editing tool suited to content-based video indexing and retrieval with appropriate human operator assistance. The whole system has been designed with a clear focus on the extraction and exploitation of the motion information inherent in the dynamic video scene. The dominant motion information is used explicitly for shot boundary detection, camera motion characterization, description of visual content variations, and key frame extraction. Various contributions have been made to ensure that the system works robustly with complex scenes and across different media types. A window-based graphical user interface has been designed to make interactive analysis and editing of semantic events and episodes very easy where appropriate.
We study the problem of temporally partitioning a video sequence into blobs - segments consisting of similar frames - and of choosing frames within the blobs that well represent their content. In general, blobs give a finer subdivision of the video than shots. This problem is relevant for various applications, e.g., indexing and retrieval in video databases, or compression of video at very low bitrates. We present an efficient algorithm for this problem based on a suitable frame distance measure reflecting similarity among the frames. Experimental results are presented.
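The partitioning scheme described above can be sketched as follows, under assumptions the abstract does not spell out: frames are modeled as plain feature vectors (e.g. coarse color histograms), the frame distance is L1, a new blob starts whenever consecutive frames differ by more than a threshold, and the representative frame is the blob member closest to the blob mean. All names and the threshold are illustrative, not the paper's actual algorithm.

```python
# Hedged sketch of blob partitioning and representative-frame choice.
# Frames are feature vectors; distance measure and threshold are assumptions.

def frame_distance(a, b):
    """L1 distance between two frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def partition_into_blobs(frames, threshold):
    """Start a new blob whenever a frame differs too much from the
    previous one; return blobs as lists of frame indices."""
    blobs, current = [], [0]
    for i in range(1, len(frames)):
        if frame_distance(frames[i], frames[i - 1]) > threshold:
            blobs.append(current)
            current = []
        current.append(i)
    blobs.append(current)
    return blobs

def representative_frame(frames, blob):
    """Index of the blob member closest to the blob's mean vector."""
    n = len(blob)
    mean = [sum(frames[i][d] for i in blob) / n
            for d in range(len(frames[0]))]
    return min(blob, key=lambda i: frame_distance(frames[i], mean))
```

With a threshold of 5, two visually distinct groups of frames are split into two blobs, and the middle frame of a blob tends to be selected as its representative.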
For more efficient organizing, browsing, and retrieving of digital video, it is important to extract video structure information at both the scene and shot levels. This paper presents an effective approach to video scene segmentation based on probabilistic model merging. In the proposed method, we regard the shots in a video sequence as hidden state variables and use probabilistic clustering to obtain the best clustering performance. The experimental results show that our method produces reasonable clustering results based on the visual content. A project named HomeVideo is introduced to show the application of the proposed method to personal video material management.
Semantic filtering of multimedia content is a challenging problem. The gap that exists between low-level media features and high-level semantics of multimedia is difficult to bridge. We propose a flexible probabilistic graphical framework to bridge this gap to some extent and perform automatic detection of semantic concepts. Using probabilistic multimedia objects and a network of such objects, we support semantic filtering. By discovering the relationships that exist between semantic concepts, we show how detection performance can be improved. We show that concepts which may not be directly observed in terms of media features can be inferred based on their relation to those that are already detected. Heterogeneous features can also be fused in the multinet. We demonstrate this by inferring the concept outdoor based on the five detected multijects sky, snow, rocks, water and forestry and a frame-level global-features-based outdoor detector.
A hierarchical semantic tree for sports video analysis incorporating mixed media cues from video, audio and caption text is proposed in this research. It allows queries from users at different granularities of semantic meaning. A set of classification functions, which associate the low-level features with high-level video semantic meanings for various applications, is learned by supervised learning algorithms at each node. Experimental results show that our proposed scheme and classification system are effective and promising.
A framework for video content classification using a knowledge-based approach is herein proposed. This approach is motivated by the fact that videos are rich in semantic content, which can best be interpreted and analyzed by human experts. We demonstrate the concept by implementing a prototype video classification system using the rule-based programming language CLIPS 6.05. Knowledge for video classification is encoded as a set of rules in the rule base. The left-hand sides of rules contain high-level and low-level features, while the right-hand sides of rules contain intermediate results or conclusions. Our current implementation includes features computed from motion, color, and text extracted from video frames. Our current rule set allows us to classify input video into one of five classes: news, weather reporting, commercial, basketball and football. We use MYCIN's inexact reasoning method for combining evidence and for handling the uncertainties in the features and in the classification results. We obtained good results in a preliminary experiment, which demonstrates the validity of the proposed approach.
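MYCIN's evidence-combination rule, referenced above, is well documented and easy to state: two certainty factors in [-1, 1] supporting the same conclusion are merged so that agreeing evidence reinforces without ever exceeding 1 in magnitude. A minimal implementation of the standard rule (not the paper's specific rule base):

```python
# MYCIN's standard rule for combining two certainty factors (CFs).
# CFs lie in [-1, 1]; same-sign evidence reinforces, mixed-sign
# evidence is averaged with a damping denominator.

def combine_cf(cf1, cf2):
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))
```

For example, two positive pieces of evidence with CFs 0.6 and 0.5 combine to 0.8, stronger than either alone but still below certainty.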
In this paper we present a system for automated analysis, classification and indexing of broadcast news programs. The system first analyses the visual and the speech stream of an input news program in order to obtain an initial partitioning of the program into so-called report segments. The analysis of the visual stream provides the boundaries of the report segments lying at the beginning and the end of each anchorperson shot. This analysis step is performed by applying an existing technique for anchorperson shot detection. The analysis of the speech stream gives the boundaries of the report segments lying in the middle of each silent interval. Then, the transcribed speech of each of the report segments is matched with the content of a large pre-specified textual topic database. This database covers a large number of topics and can be updated by the user at any time. For each topic a vast number of keywords is given, each of which is also assigned a weight that indicates the importance of a keyword for a certain topic. The result of the matching procedure is a list of probable topics per report segment, where for each topic on the list a likelihood is computed based on the number of relevant keywords found in the segment and on the weights of those keywords. The list of topics per segment is then shortened by separating the most probable from the least probable topics based on their likelihood. Finally, the likelihood values of the most probable topics are used in the last system module to merge related neighboring segments into reports. The most probable topics serving as the basis for the segment-merging procedure are at the same time the retrieval indexes for the reports and are used for classifying together all reports in the database that cover one and the same topic.
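The keyword-matching step above computes, per segment and topic, a likelihood from the matched keywords and their weights. The abstract does not give the exact formula, so the following is a hypothetical sketch using a normalized weighted sum; all topic names, keywords and weights are invented for illustration.

```python
# Hypothetical sketch of segment-to-topic matching: each topic is a
# dict of keyword -> weight; a segment's likelihood for a topic is the
# matched weight divided by the topic's total weight. The paper's
# actual scoring formula is not specified in the abstract.

def topic_likelihood(transcript_words, topic_keywords):
    words = set(transcript_words)
    matched = sum(w for kw, w in topic_keywords.items() if kw in words)
    total = sum(topic_keywords.values())
    return matched / total if total else 0.0

def rank_topics(transcript_words, topics):
    """Return (topic, likelihood) pairs, most probable first."""
    scores = {name: topic_likelihood(transcript_words, kws)
              for name, kws in topics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A downstream module could then keep only topics above a likelihood cutoff and merge neighboring segments sharing a top topic, mirroring the report-building step.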
This paper describes the organizational and playback features of Fischlar, a digital video library that allows users to record, browse and watch television programs on-line. Programs that can be watched and recorded are organized by personal recommendations, genre classifications, name and other attributes for access by general television users. Motivations and interactions of users with on-line television libraries are outlined; these are supported by personalized library access, categorized programs, and a combined player/browser with content viewing history and content marks. The combined player/browser supports a user who watches a program on different occasions in a non-sequential order.
In this paper, we propose a highlight generation method using contextual information and perception. The proposed method consists of three steps. In the first step, a long video is segmented into shots, each generated by an uninterrupted camera operation. In the second step, the contextual information is computed from the video shots. We divide the contextual information into local and global contextual information. We represent the local contextual information of a shot with foreground information, shot activity, and background information. The global contextual information of a shot is represented by the shot's interaction and coherency with other shots. Based on the contextual information, the story unit boundaries are detected. For each story unit, we determine meaningful shot candidates by computing shot length, shot activity, contrast value, and foreground object size. Finally, from the candidates, the meaningful shots are selected by applying a perceptual grouping rule inversely. By concatenating the selected shots, video highlights are generated.
The last decade witnessed the development of strong compression techniques for audio-visual data and the growth of worldwide communication of information. These innovations gave birth to unforeseen requirements such as storing this information and indexing and retrieving it for subsequent use. Standardization of the compression of multimedia content is rapidly accepted as a need for encoding/decoding audio-visual data regardless of machine and environment. However, standardization for indexing these materials remains an open problem, which prevents browsing audio-visual data regardless of machine and environment. The MPEG-7 standardization group aims to create a standard syntax for access to multimedia content. This paper puts forward a next step: the extraction of user preferences and their matching with MPEG-7 coded media content for quick and smart browsing.
We present a psycho-visual and analytical framework for automatic measurement of motion activity in video sequences. We construct a test set of video segments by carefully selecting video segments from the MPEG-7 video test set. We construct a ground truth based on subjective tests with naive subjects. We find that the subjects agree reasonably on the motion activity of video segments, which makes the ground truth reliable. We present a set of automatically extractable, known and novel, descriptors of motion activity based on different hypotheses about the subjective perception of motion activity. We show that all the descriptors perform well against the ground truth. We find that the MPEG-7 motion activity descriptor, based on the variance of motion vector magnitudes, is one of the best in overall performance over the test set.
This paper concerns the extraction methodology and usage of MPEG-7 metadata for video transcoding. The idea of the MPEG-7 descriptors presented in this paper is to give, by means of MPEG-7 metadata, transcoding hints to a transcoder regarding motion and encoding difficulty. These transcoding hints can be used at the transcoder (1) to preserve the visual quality in terms of PSNR, (2) to modify the GOP structure for efficient storage and retrieval for fast video browsing, while (3) reducing the overall computational complexity significantly.
We present a generic model to describe image and video content by a combination of semantic entities and low level features for semantically meaningful and fast retrieval. The proposed model includes semantic entities such as Object, Event and Actors to express relations between the first two. The use of Actors entity increases the efficiency of certain types of search, while the use of semantic and linguistic roles increases the expression capability of the model. The model also contains links to high-level media segments such as actions and interactions, and low level media segments such as elementary motion and reaction units, as well as low-level features such as motion parameters and trajectories. Based on this model, we propose image and video retrieval combining semantic and low-level information. The retrieval performance of our system is tested by using query-by-annotation, query-by-example, query-by-sketch, and a combination of them.
Many experiments have shown that images, graphics and iconographic resources, thanks to their explanatory power, can be considered a fundamental component of a man-machine interface. The goal of our approach is to make use of this outstanding property of images in order to provide a digital library with information discovery capabilities. This paper presents the new MicroNOMAD tool for information discovery in multimedia databases. The core model of the tool is based on an extension of the Kohonen SOM model. Its main characteristic is to provide a user both with emergent analyses of a database's content and with querying and browsing guidelines through the use of an advanced topographic interface model.
The need to retrieve visual information from large image and video collections is shared by many application domains. This paper describes the main features of Quicklook, a system that combines in a single framework the alphanumeric relational query, the content-based image query exploiting automatically computed low-level image features, and the textual similarity query exploiting any text attached to image database items.
We have the goal of developing computer algorithms for indexing a collection of digitized x-ray images for biomedical features important to researchers in the fields of osteoarthritis and vertebral morphometry. This indexing requires the segmentation of the image contents, identification of relevant anatomy in the segmented images, and classification of the identified anatomy into categories by which the image contents may be indexed. An example of the indexing detail that we have as a goal is, 'disc space narrowing at vertebra location C5-6'. This is a work in progress, with much current activity still in the segmentation step. We approach this segmentation as a hierarchical procedure with the distinctive regions of the image, including the general spine region, first being segmented at a gross level of detail, followed by a finer level segmentation of the spine region into individual vertebrae. In this paper we report on work done toward the gross level segmentation and also describe the image-level characteristics of one of the features targeted for the final indexing step.
This paper proposes an approach to object-based image retrieval for images containing multiple and partially occluded objects. In this approach, contours of objects are used to distinguish different classes of objects in images. We decompose all the contours in an image into segments and compute features from the segments. The C4.5 decision-tree learning algorithm is used to classify each segment in the images. Each image is represented in a k-dimensional space, where k is the number of classes of objects in all the images. Each dimension represents information about one of the classes. Euclidean distance between images in the k-dimensional space is adopted to compute similarities between images based on probabilities of segment classes. Experimental results show that this approach is effective.
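The k-dimensional representation above can be sketched concretely, under the assumption (not stated in the abstract) that per-segment class probabilities are simply averaged into one image descriptor; the aggregation the paper actually uses may differ.

```python
# Hedged sketch: each image becomes a k-dimensional vector of
# per-class evidence (here, the mean of its segments' class
# probability vectors), and images are compared by Euclidean
# distance in that space. Aggregation by mean is an assumption.

import math

def image_vector(segment_probs, k):
    """Aggregate per-segment class probabilities into one
    k-dimensional image descriptor (mean over segments)."""
    v = [0.0] * k
    for probs in segment_probs:
        for c in range(k):
            v[c] += probs[c]
    n = len(segment_probs)
    return [x / n for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

An image whose segments split evenly between two classes then sits midway between images dominated by either class, which is the behavior the retrieval step relies on.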
This paper presents a new approach to palmprint retrieval for personal identification. Three key issues in image retrieval are considered - feature selection, similarity measures and dynamic search for the best matching of the sample in the image database. We propose a texture-based method for palmprint feature representation. The concept of texture energy is introduced to define a palmprint's global and local features, which are characterized by high convergence of inner-palm similarities and good dispersion of inter-palm discrimination. The search is carried out in a layered fashion: first, global features are used to guide the fast selection of a small set of similar candidates from the database, and then local features are used to decide the final output within the candidate set. The experimental results demonstrate the effectiveness and accuracy of the proposed method.
Content-based image retrieval has become one of the most active research areas in the past few years. Most research attention has focused on indexing techniques based on global feature distributions. However, these global distributions have limited discriminating power because they are unable to capture local image information. Applying global Gabor texture features greatly improves retrieval accuracy, but they are computationally complex. In this paper, we present a wavelet-based salient point extraction algorithm. We show that extracting the color and texture information at the locations given by these points provides significantly improved results in terms of retrieval accuracy, computational complexity and storage space of feature vectors as compared to the global feature approaches.
In this paper, content-based image retrieval from a hierarchically organized database (HCBIR) is proposed. Images in the database are categorized into different classes based on human perception. The characteristics of each class are represented by the prototypes extracted from images in the class using the unsupervised optimal fuzzy clustering algorithm. Based on the proposed image-class matching distance, a modification of the Earth Mover's Distance, the relevant class of the query image can be selected. The rank of candidate images is determined in descending order of similarity, and the class with the largest number of high-ranking images is then selected. The search domain is narrowed down and the retrieval efficiency is improved greatly. A comparison is made between the HCBIR approach and a nonhierarchical CBIR approach. We conclude that the HCBIR approach is closer to the process of human vision and more efficient.
In current content-based image retrieval systems, it is generally accepted that obtaining high-level image features is key to improving querying. Among the related techniques, relevance feedback has become a hot research topic because it exploits information from the user to refine the querying results. In practice, many methods have been proposed to achieve the goal of relevance feedback. In this paper, a new scheme for relevance feedback is proposed. Unlike previous methods, our scheme provides a self-adaptive operation. First, based on multi-level image content analysis, the images marked relevant by the user can be automatically analyzed at different levels, and the query can be modified according to the analysis results. Second, to make it more convenient for the user, the relevance feedback procedure can be conducted with or without memory. To test the performance of the proposed method, a practical semantic-based image retrieval system has been established, and the querying results obtained by our self-adaptive relevance feedback are given.
Due to the increasing demand for and supply of the technology, the next generation of image file formats will more likely store and retrieve images based on their semantic content. Thus, an image should be segmented into 'meaningful' regions, each of which corresponds to an object and/or the background. In this study, we propose a scheme for multi-level image segmentation based on a simple descriptor, called 'the closest color in the same neighborhood'. The proposed scheme generates a stack of images without using any segmentation threshold. The stack of images is hierarchically ordered in a uniformity tree. The uniformity tree is then associated with a semantic tree, which is built by the user for content-based representation. The experiments indicate superior results for retrieving images which consist of a few objects and a background.
Color features are reviewed and their effectiveness assessed in the application framework of key-frame clustering for abstracting unconstrained video. Existing color spaces and associated quantization schemes are first studied. Description of global color distribution by means of histograms is then detailed. In our work, twelve combinations of color space and quantization were selected, together with twelve histogram metrics. Their respective effectiveness with respect to picture similarity measurement was evaluated through a query-by-example scenario. For that purpose, a set of still-picture databases was built by extracting key-frames from several video clips, including news, documentaries, sports and cartoons. Classical retrieval performance evaluation criteria were adapted to the specificity of our testing methodology.
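The abstract does not enumerate its twelve histogram metrics, but two metrics commonly included in such comparisons can serve as a minimal sketch of what is being evaluated: L1 (city-block) distance and histogram intersection over normalized color histograms. These two are illustrative choices, not necessarily the paper's set.

```python
# Two common color-histogram comparison measures over histograms
# normalized to sum to 1. Both operate bin-by-bin; the choice of
# color space and quantization determines what each bin means.

def l1_distance(h1, h2):
    """City-block distance; 0 for identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def intersection(h1, h2):
    """Similarity in [0, 1]; 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

For normalized histograms the two are related: intersection equals 1 minus half the L1 distance, so ranking by either yields the same query-by-example ordering.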
In this article, research in the field of content-based image retrieval in large databases, applied to the paleontology image database of the Universite de Bourgogne, Dijon, France, called 'TRANS TYFIPAL', is proposed. Our indexing method is based on multiresolution decomposition of database images using wavelets. For each kind of paleontology image we try to find a characteristic image representing it. This model image is computed using a classification algorithm on the space of parameters extracted from the wavelet transform of each image. Then a search tree is built to offer users a graphic interface for retrieving images, so that users navigate through this tree to find an image similar to that of their request. Our contribution in the field is the building of the model and of the search tree to make user access easier and faster. This paper ends with a conclusion on initial results and a description of future work to enhance our indexing and retrieval method.
The color histogram is widely used in image retrieval due to its simplicity and fast operation speed. Since a color histogram describes only the global color distribution in an image, it is not robust to large changes in appearance and shape caused by viewing position, camera zoom, etc. To overcome this problem, we propose a method using color edge information. Even when changes in appearance and shape occur, the pair of colors on a color edge does not change. So we use the global distribution of pairs of colors at color edge pixels to cope with large appearance changes. In the proposed method, color edge detection based on the vector angle is performed to classify the pixels of an image into smooth and edge pixels. For edge pixels, the global distribution of pairs of colors around the edge is represented by 36 non-uniform colors. For smooth pixels, a 2D chromaticity histogram is computed and compressed by DCT. The joint histogram of the compressed 2D chromaticity histogram and the global distribution of pairs of colors is very robust to large appearance changes.
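The vector-angle edge test mentioned above can be sketched as follows: a pixel is labeled an edge pixel when the angle between its RGB vector and a neighbor's exceeds a threshold. The angle is insensitive to pure intensity changes, which only scale the color vector. The threshold value and function names here are illustrative assumptions, not the paper's parameters.

```python
# Hedged sketch of vector-angle color edge detection. Colors are RGB
# triples treated as 3-vectors; the angle between them ignores
# brightness scaling. The 0.2 rad threshold is an assumption.

import math

def vector_angle(c1, c2):
    """Angle in radians between two color vectors."""
    dot = sum(a * b for a, b in zip(c1, c2))
    n1 = math.sqrt(sum(a * a for a in c1))
    n2 = math.sqrt(sum(b * b for b in c2))
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.acos(cos)

def is_edge_pixel(c1, c2, threshold=0.2):
    """True when neighboring colors differ enough in hue direction."""
    return vector_angle(c1, c2) > threshold
```

A dark red next to a bright red yields an angle near zero (no edge), while red next to blue yields a right angle and is flagged as an edge.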
Most content-based image retrieval techniques are not able to eliminate noise from similarity matching since they capture the features of the entire image area or pre-perceived objects at database build time. Recent approaches address this outstanding issue by allowing users to arbitrarily exclude noise in formulating their queries. This capability has resulted in high retrieval effectiveness for a wide range of queries. However, implementing these techniques for large image collections presents a great challenge since we cannot assume any shape for queries defined by users. In this paper, we propose an efficient indexing/retrieval technique for arbitrarily shaped queries which is able to eliminate a majority of unqualified images. Moreover, we improve the retrieval process with a filtering phase to prune out additional false matches before the detailed similarity measure is carried out. We have implemented the proposed technique in our image retrieval system for a large image collection. Our experimental results show that our technique handles image matching very well and is on average 70 times faster than straightforward sequential scanning.
A video consists of a number of shots, each with a boundary type such as cut, fade, dissolve, or wipe. Many previous approaches can find cut boundaries without difficulty; however, most of them often produce false alarms for videos with large camera and object motion. We propose a shot boundary detection method combining Bayesian and structural information. In the Bayesian approach, a probability distribution function models each transition type, e.g., normal, abrupt, or gradual transition, and also models shot length; but inseparability between those distributions causes unwanted results and degrades precision. In this paper, we demonstrate that the shape of the filtered frame difference, called the structural information, provides an important cue to distinguish fade and dissolve effects from cut effects and from gradual changes caused by camera and object motion. The proposed method has been tested on several golf video segments and shows good performance in detecting fade and dissolve effects as well as cuts.
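As a rough illustration of how the shape of a frame-difference curve carries structural information, a cut shows up as an isolated spike while a fade or dissolve shows up as a broad plateau. The sketch below captures only that intuition; the thresholds and the `classify_transition` helper are simplifications assumed for the example, not the paper's Bayesian formulation.

```python
def classify_transition(diffs, i, spike_ratio=3.0, min_level=2.0, window=3):
    # Classify the local shape of a filtered frame-difference curve at
    # index i: an isolated spike suggests a cut, a broad plateau suggests
    # a gradual transition (fade/dissolve), anything else is normal.
    lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
    neighbours = [diffs[j] for j in range(lo, hi) if j != i]
    mean_n = sum(neighbours) / len(neighbours)
    if diffs[i] > spike_ratio * max(mean_n, 1e-9):
        return "cut"  # value towers over its neighbourhood
    if diffs[i] >= min_level and all(d > 0.5 * diffs[i] for d in neighbours):
        return "gradual"  # neighbourhood stays uniformly elevated
    return "normal"
```

Large object or camera motion tends to raise the whole neighbourhood rather than one frame, which is exactly why shape, not magnitude alone, is the useful cue.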
We describe a technique for video summarization that uses motion descriptors computed in the compressed domain to speed up conventional color-based video summarization techniques. The basic hypothesis of the work is that the intensity of motion activity of a video segment is a direct indication of its 'summarizability,' and we present experimental verification of this hypothesis. We are thus able to quickly identify easy-to-summarize segments of a video sequence, since they have a low intensity of motion activity. Moreover, compressed-domain extraction of motion activity intensity is much simpler than the color-based calculations. We can easily summarize these segments by simply choosing a key frame at random from each low-activity segment, and then apply conventional color-based summarization techniques to the remaining segments. We thus speed up color-based summarization by reducing the number of segments on which the computationally more expensive color-based computation is needed.
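The split between low-activity and high-activity segments can be sketched as follows. The activity measure (mean motion-vector magnitude), the threshold, and the function names are assumptions for the example; the paper's descriptor may be defined differently.

```python
import random

def motion_activity(motion_vectors):
    # Intensity of motion activity for a segment: mean magnitude of its
    # macroblock motion vectors, available directly in the compressed domain.
    if not motion_vectors:
        return 0.0
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in motion_vectors) / len(motion_vectors)

def summarize(segments, threshold=2.0, rng=None):
    # segments: list of (frame_ids, motion_vectors). Low-activity segments
    # are summarized by a randomly chosen key frame; the rest are handed
    # to the more expensive color-based summarizer.
    rng = rng or random.Random(0)
    key_frames, needs_color_pass = [], []
    for frames, mvs in segments:
        if motion_activity(mvs) < threshold:
            key_frames.append(rng.choice(frames))
        else:
            needs_color_pass.append(frames)
    return key_frames, needs_color_pass
```

The speedup comes from the second return value: only those segments still require color-based computation.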
This paper proposes an integrated system for supporting content-based video retrieval and browsing over networks. An automatic semantic video object extraction technique is developed for providing a more compact video representation. The video images are first partitioned into a set of homogeneous regions with accurate boundaries by integrating the results of color edge detection and region growing procedures. The object seeds, which are the intuitive and representative parts of the semantic objects, are detected from these homogeneous image regions. The semantic objects are then generated by a seeded region aggregation or a human interaction procedure. These semantic objects are tracked along the time axis to exploit their temporal correspondences among frames. Given the semantic video objects represented by a set of visual features, a seeded semantic video content clustering technique is developed for providing more effective video indexing, retrieval and browsing.
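The region growing step can be sketched for a grayscale grid; the actual system works on color images with edge-detection results folded in, so the value type, the 4-neighbour rule, and the fixed threshold here are simplifying assumptions.

```python
def region_grow(image, seed, threshold=10):
    # Seeded region growing on a 2D grid of intensity values: starting
    # from the seed pixel, absorb 4-neighbours whose value stays within
    # the threshold of the running region mean.
    h, w = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]
    frontier = [seed]
    while frontier:
        y, x = frontier.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                mean = total / len(region)
                if abs(image[ny][nx] - mean) <= threshold:
                    region.add((ny, nx))
                    total += image[ny][nx]
                    frontier.append((ny, nx))
    return region
```

Running the grower from several object seeds and merging the resulting regions is one way the described seeded aggregation could proceed.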
A model-based anchorperson detection algorithm is discussed for automatic news video indexing. Anchorperson patterns are modeled with respect to their color and shape features, and the detection problem is decomposed into a color-based face region detection problem and a model-based head-and-shoulder shape detection problem. Experimental results are reported on actual NBC Nightly News footage with high detection rates.
Our goal is to enable queries about the motion of objects in a video sequence. Tracking objects in video is a difficult task, involving signal analysis, estimation, and often semantic information particular to the targets. That is not our focus; rather, we assume that tracking is done, and turn to the task of representing the motion for query. The position of an object over time results in a motion trajectory, i.e., a sequence of locations. We propose a novel representation of trajectories: we use the path and speed curves as the motion representation. The path curve records the position of the object, while the speed curve records the magnitude of its velocity. This separates positional information from temporal information, since position may be more important in specifying a trajectory than its actual velocity; velocity can still be recovered from our representation. We derive a local geometric description of the curves that is invariant under scaling and rigid motion. We adopt a warping method in matching so that it is robust to variation in feature vectors. We show that R-trees can be used to index the multidimensional features so that search is efficient and scalable to a large database.
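The decomposition of a trajectory into a path curve and a speed curve can be sketched directly from the description above; the `(t, x, y)` sample format and the function name are assumptions for the example.

```python
import math

def path_and_speed(trajectory):
    # Split a tracked trajectory of (t, x, y) samples into a path curve
    # (positions only) and a speed curve (magnitude of the velocity
    # between consecutive samples). Position and timing are thus
    # separated, and velocity remains recoverable from the pair.
    path = [(x, y) for _, x, y in trajectory]
    speed = [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
             for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:])]
    return path, speed
```

A query can then weight the two curves independently, e.g. matching mostly on the path when the exact speed of the target does not matter.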
Recently, the huge amount of video data available in digital form has allowed users more ubiquitous access to visual information than ever before. To efficiently manage such a huge amount of video data, we need tools such as video summarization and search. In this paper, we propose a novel scheme allowing both scalable hierarchical video summaries and efficient retrieval by introducing a notion of fidelity. The notion of fidelity in the tree-structured key frame hierarchy describes how well the key frames at one level are represented by their parent key frame, relative to the other children of that parent. The experimental results demonstrate the feasibility of our scheme.
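One plausible reading of the fidelity notion is sketched below: a parent key frame represents its children well when its worst parent-to-child feature distance is small. This is an illustration under that assumption, not the paper's actual definition; both function names are hypothetical.

```python
def fidelity(parent, children, distance):
    # A possible fidelity measure for one node of the key frame tree:
    # the largest feature distance from the parent key frame to any of
    # its children. Smaller values mean the parent is a more faithful
    # representative of the level below it.
    return max(distance(parent, c) for c in children)

def l1(a, b):
    # L1 distance between two feature vectors (e.g. color histograms).
    return sum(abs(x - y) for x, y in zip(a, b))
```

A summary at a given fidelity budget could then keep descending the tree until every retained node's fidelity value falls below the budget.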
A content-based video retrieval system is one of the important design issues of multimedia, depending mainly on visual and spatio-temporal characteristics. Until now, however, well-defined models for video retrieval have remained at a rudimentary stage. We propose a unified video retrieval model that simulates human perception. Given an arbitrary video, and considering the factors present in human visual perception, we can find similar videos in a large repository within a time limit. This measurement simulates the rules of human judgement, so it can come close to the real need. Furthermore, by integrating relevance feedback, the results can be adjusted according to the user's preference. This learning strategy emphasizes the aspects the user cares about and embodies them in the next iteration of similarity computation. In this way, retrieval results can be greatly optimized.
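The feedback loop described above is commonly realized as a weighted similarity whose per-feature weights are boosted by user feedback and re-normalized each iteration. The sketch below illustrates that general pattern; the update rule and all names are assumptions, not the paper's model.

```python
def update_weights(weights, feedback):
    # Boost the weight of each feature in proportion to the user's
    # relevance feedback for it, then re-normalise so the weights sum
    # to 1. Features the user cares about dominate the next iteration.
    raw = {f: w * (1 + feedback.get(f, 0)) for f, w in weights.items()}
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()}

def weighted_similarity(feature_sims, weights):
    # Overall similarity as a weighted sum of per-feature similarities.
    return sum(weights[f] * feature_sims[f] for f in weights)
```

After feedback favoring color, for example, two videos with identical motion but different color distributions separate more strongly in the ranking.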
In this paper, we describe a framework for analyzing programs belonging to different TV program genres using Hidden Markov Models and pseudo-semantic features derived from video shots. Clustering using Gaussian mixture models is used to determine the order of the models. Results of initial genre classification experiments using two simple features derived from video shots are given.
As disk space and transfer speeds increase, the bandwidth between a server and its disks has become critical for video-on-demand (VOD) services. Our VOD server consists of several hosts sharing data on disks through a ring-based network. Data sharing provided by the spatial-reuse ring network between servers and disks not only increases utilization towards full bandwidth but also improves the availability of videos. Striping and replication methods are introduced to improve the efficiency of our VOD server system as well as the availability of videos. We consider two kinds of resources of a VOD server system. Given a representative access profile, we propose an algorithm that finds an initial configuration, placing the videos on the disks in the system. If any copy of a video cannot be placed due to lack of resources, more servers/disks are added. When all videos have been placed on the disks by our algorithm, the final configuration is determined, together with an indicator of how tolerant it is of fluctuations in video demand. Although the underlying placement problem is NP-hard, our algorithm generates the final configuration in O(M log M) time at best, where M is the number of movies.
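A greedy placement in the described O(M log M) regime can be sketched as follows: sort movies by expected demand (the O(M log M) step) and assign each to the disk with the most remaining bandwidth. This is a plausible simplification, not the paper's algorithm; it models only one resource and ignores striping and replication.

```python
def place_videos(movies, disks):
    # movies: {name: expected_bandwidth_demand}; disks: list of per-disk
    # bandwidth capacities. Greedy sketch: handle the most demanded
    # movies first and always pick the least-loaded disk. Returns
    # {name: disk_index}, or None when capacity runs out (the signal, in
    # the scheme above, to add more servers/disks and retry).
    free = list(disks)
    placement = {}
    for name, demand in sorted(movies.items(), key=lambda kv: -kv[1]):
        best = max(range(len(free)), key=lambda i: free[i])
        if free[best] < demand:
            return None
        free[best] -= demand
        placement[name] = best
    return placement
```

The leftover capacity per disk after placement is one natural source for the demand-fluctuation tolerance indicator the abstract mentions.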
By introducing features from the tree-structured wavelet transform, a novel texture image retrieval method is proposed in this paper. The method produces eigenfeatures at different scales precisely by decomposing the texture adaptively at multiple scales and in multiple directions under an energy rule. Based on these image eigenvalues, we also propose a modified algorithm, named Principal Eigenvalue Analysis (PEA), which effectively reduces the dimensionality of the eigenfeatures. Owing to the hierarchical structure provided by this method, user-oriented applications can carry out different kinds of retrieval according to users' requirements, i.e., coarse-to-fine retrieval. Experimental results indicate that the modified texture retrieval method has strong practical merit, as it improves retrieval accuracy and speeds up retrieval processing.
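The energy rule driving the adaptive decomposition can be sketched in one dimension with a Haar filter: a subband is split further only when it carries enough energy. The paper operates on 2D textures with a different filter bank, so the 1D Haar version, the thresholds, and the names below are simplifications assumed for the example.

```python
def haar(v):
    # One Haar analysis step: low-pass (pairwise averages) and
    # high-pass (pairwise differences) subbands at half resolution.
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high

def energy(v):
    return sum(x * x for x in v) / len(v)

def tree_decompose(signal, max_depth=3, min_energy=0.01, path=""):
    # Tree-structured (wavelet-packet style) decomposition: a subband
    # is decomposed further only while its energy passes the threshold,
    # i.e. the "energy rule". Leaves form the texture feature vector,
    # keyed by their low/high path in the tree.
    if max_depth == 0 or energy(signal) < min_energy or len(signal) < 2:
        return {path or "root": energy(signal)}
    low, high = haar(signal)
    features = {}
    features.update(tree_decompose(low, max_depth - 1, min_energy, path + "L"))
    features.update(tree_decompose(high, max_depth - 1, min_energy, path + "H"))
    return features
```

Because low-energy subbands stop early, the feature vector stays compact, which is the same motivation behind the PEA dimensionality reduction step.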
Today, consumers face an ever-increasing amount of television programming; the problem, however, is that the content of video programs is opaque. The existing video watching options for consumers are to watch the whole video, fast-forward to try to find the relevant portion, or use electronic program guides to get additional information. In this paper we present a summarization system for processing incoming video, extracting and analyzing closed-caption text, determining the boundaries of program segments as well as commercial breaks, and extracting a program summary from a complete broadcast to enable video transparency. The system consists of a transcript extractor, program type classifier, cue extractor, knowledge database, temporal database, inference engine, and summarizer. The main topics discussed are video summaries, video categorization, and retrieval tools.
The recognition of objects that appear in a video sequence is an essential aspect of any video content analysis system. We present an approach that classifies a segmented video object based on its appearance in successive video frames. The classification is performed by matching curvature features of the contours of these object views against a database containing preprocessed views of prototypical objects, using a modified curvature scale space technique. By integrating the results of a number of successive frames and by using the modified curvature scale space technique as an efficient representation of object contours, our approach enables robust, tolerant, and rapid classification of video objects.