It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures requires a robust image retrieval system. This application is of high interest to Intel processor architects: it is highly compute intensive and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many Content Based Image Retrieval Systems [1]. For example, a personal image database has a lot of pictures of people, but of a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places such as home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and the benchmarks need to differ from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for benchmarking such systems, as specified in [2]:
- data sets
- query tasks
- ground truth
- evaluation measures
- benchmarking events.
This paper proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
VIMA has experienced an increasing demand for Content-based Image Retrieval (CBIR) systems since late 2004. In this paper, we report the search, filtering, and annotation systems that we have developed and deployed, and the user models of these systems. The objective of this paper is to give researchers and developers in the area of image retrieval guidelines for measuring the performance of their algorithms and systems in a way that is consonant with the requirements of the users. We also enumerate the technical challenges of building CBIR systems and outline our solutions to these challenges.
Image retrieval is a human-centered task: images are created by people and are ultimately accessed and used by people for human-related activities. In designing image retrieval systems and algorithms, or measuring their performance, it is therefore imperative to consider the conditions that surround both the indexing of image content and the retrieval. This includes examining the different levels of interpretation for retrieval, possible search strategies, and image uses. Furthermore, we must consider different levels of similarity and the role of human factors such as culture, memory, and personal context. This paper takes a human-centered perspective in outlining levels of description, types of users, search strategies, image uses, and human factors that affect the construction and evaluation of automatic content-based retrieval systems, such as human memory, context, and subjectivity.
Many image retrieval systems, and the evaluation methodologies of these systems, make use of either visual or textual information only. Only a few combine textual and visual features for retrieval and evaluation. If text is used, it often relies upon having a standardised and complete annotation schema for the entire collection. This, in combination with high-level semantic queries, makes visual/textual combinations almost useless, as the information need can often be solved using just textual features. In reality, many collections do have some form of annotation, but this is often heterogeneous and incomplete. Web-based image repositories such as FlickR even allow collective, as well as multilingual, annotation of multimedia objects. This article describes an image retrieval evaluation campaign called ImageCLEF. Unlike previous evaluations, we offer a range of realistic tasks and image collections in which combining text and visual features is likely to obtain the best results. In particular, we offer a medical retrieval task which models exactly the situation of heterogeneous annotation by combining four collections with annotations of varying quality, structure, extent and language. Two collections have an annotation per case and not per image, which is normal in the medical domain, making it difficult to relate parts of the accompanying text to corresponding images. This is also typical of image retrieval from the web, in which adjacent text does not always describe an image. The ImageCLEF benchmark shows the need for realistic and standardised datasets, search tasks and ground truths for visual information retrieval evaluation.
Constructing a benchmark for content-based image retrieval (CBIR) applications is an important task because researchers in this area highly depend on experiments to compare different systems. Image collection, concept annotation and performance evaluation are the three main issues that should be considered carefully. Based on our previous work and experiments on both Corel image collection and TRECVID dataset, we present some basic principles of constructing a benchmark for CBIR applications. According to our experience in the collaborative annotation of TRECVID 2005 data, we propose a hierarchical concept annotation strategy to produce ground truth for the CBIR benchmark image collection. To address the conflicts among collaborative annotations from multiple annotators, we present a fuzzy annotation method, in which a membership function is defined to indicate the probability that an image contains a given concept. Evaluation criteria corresponding to the fuzzy annotation method are also presented so as to give a more reasonable evaluation of performance for different CBIR applications.
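The fuzzy annotation idea can be illustrated with a minimal sketch. The abstract does not give the membership function or the matching evaluation criteria, so everything here is an assumption: `membership` is taken to be the fraction of annotators who marked a concept as present, and `fuzzy_precision` and `fuzzy_recall` are hypothetical soft analogues of precision and recall in which each image counts by its membership value rather than as a hit or a miss.

```python
from typing import Dict, List


def membership(votes: List[bool]) -> float:
    """Assumed membership function: fraction of annotators voting 'present'."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def fuzzy_precision(returned: List[str], mu: Dict[str, float]) -> float:
    """Mean membership of the returned images (soft analogue of precision)."""
    if not returned:
        return 0.0
    return sum(mu.get(img, 0.0) for img in returned) / len(returned)


def fuzzy_recall(returned: List[str], mu: Dict[str, float]) -> float:
    """Membership mass retrieved divided by the total membership mass."""
    total = sum(mu.values())
    if total == 0:
        return 0.0
    return sum(mu.get(img, 0.0) for img in set(returned)) / total


# Hypothetical votes from four annotators for one concept.
votes = {
    "a": [True, True, True, True],
    "b": [True, True, False, False],
    "c": [False, False, False, False],
    "d": [True, False, False, False],
}
mu = {img: membership(v) for img, v in votes.items()}
print(fuzzy_precision(["a", "b"], mu), fuzzy_recall(["a", "b"], mu))
```

With this choice, an image on which annotators disagree contributes partially to both measures instead of forcing an arbitrary binary decision.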
TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
One of the issues in Web page design is the selection of appropriate combinations of background and foreground colors to display textual information. Colors have to be selected in order to guarantee legibility for different devices, viewing conditions and, more important, for all the users, including those with deficient color vision. In this paper we present a tool to select background and foreground colors for the display of textual information. The tool is based on the Munsell Book of Colors; it allows the browsing of the atlas and indicates plausible colors based on a set of legibility rules, which have been defined experimentally.
The availability of large audio collections calls for ways to efficiently access and explore them by providing an effective overview of their contents at the interface level. In this paper we present an innovative strategy exploiting color to visualize the content of a database of audio records, part of a website dedicated to ethnographic information in a region of Italy.
In this paper, we explore leveraging industry-standard media formats to effectively deliver interactive, 3D scientific visualization to a remote viewer. Our work is motivated by the need for remote visualization of time-varying, 3D data produced by scientific simulations or experiments while taking several practical factors into account, including: maximizing ease of use from the user's perspective, maximizing reuse of image frames, and taking advantage of existing software infrastructure wherever possible. Visualization or graphics applications first generate images at some number of view orientations for 3D scenes and temporal locations for time-varying scenes. We then encode the resulting imagery into one of two industry-standard formats: QuickTime VR Object Movies or a combination of HTML and JavaScript code implementing the client-side navigator. Using an industry-standard QuickTime player or web browser, remote users may freely navigate through the pre-rendered images of time-varying, 3D visualization output. Since the only inputs consist of image data, a viewpoint and time stamps, our approach is generally applicable to all visualization and graphics rendering applications capable of generating image files in an ordered fashion. Our design is a form of latency-tolerant remote visualization infrastructure where processing time for visualization, rendering and content delivery is effectively decoupled from interactive exploration. Our approach trades off increased interactivity, reduced load and effective reuse of coherent frames between multiple users (from the server's perspective) at the expense of unconstrained exploration. This paper presents the system architecture along with an analysis and discussion of its strengths and limitations.
In our effort to contribute to the closing of the "semantic gap" between images and their semantic description, we are building a large-scale ontology of images of objects. This visual catalog will contain a large number of images of objects, structured in a hierarchical catalog, allowing image processing researchers to derive signatures for wide classes of objects. We are building this ontology using images found on the web. We describe in this article our initial approach for finding coherent sets of object images. We first perform two semantic filtering steps: the first involves deciding which words correspond to objects and using these words to access databases which index text found associated with an image (e.g. Google Image search) to find a set of candidate images; the second semantic filtering step involves using face recognition technology to remove images of people from the candidate set (we have found that often requests for objects return images of people). After these two steps, we have a cleaner set of candidate images for each object. We then index and cluster the remaining images using our system VIKA (VIsual KAtaloguer) to find coherent sets of objects.
In scientific communities images play a dominant role in conveying a message and, more importantly, serve as a tool for experimental output. The meaning of these images develops from the annotation that is provided by the researchers. Annotation can be accomplished in a number of ways. In this paper we describe graphical and textual annotations that are developed from ontologies. The Internet has provided the research community with a medium for the exchange of images. Images are, however, not straightforwardly suitable for exchange. Knowledge about what is depicted in the image, as well as specific image content, is important for image understanding. This holds in particular for scientific images that are the result of experimentation. For the purpose of image exchange, that is, query-based search, image retrieval mechanisms based on pixel content as well as semantics are developed. In the field of experimental imaging, new paradigms will have to be developed so that a search query results in correct image collections.
The choice of a colour space is of great importance for many computer vision algorithms (e.g. edge detection and object recognition). It induces the equivalence classes to the actual algorithms. Since there are many colour spaces available, the problem is how to automatically select the weighting to integrate the colour spaces in order to produce the best result for a particular task. In this paper we propose a method to learn these weights, while exploiting the non-perfect correlation between colour spaces of features through the principle of diversification. As a result an optimal trade-off is achieved between repeatability and distinctiveness. The resulting weighting scheme will ensure maximal feature discrimination.
The method is experimentally verified for three feature detection tasks: Skin colour detection, edge detection and corner detection. In all three tasks the method achieved an optimal trade-off between (colour) invariance (repeatability) and discriminative power (distinctiveness).
This paper describes a new approach to the automatic detection of human faces and places depicted in photographs taken on cameraphones. Cameraphones offer a unique opportunity to pursue new approaches to media analysis and management: namely to combine the analysis of automatically gathered contextual metadata with media content analysis to fundamentally improve image content recognition and retrieval. Current approaches to content-based image analysis are not sufficient to enable retrieval of cameraphone photos by high-level semantic concepts, such as who is in the photo or what the photo is actually depicting. In this paper, new methods for determining image similarity are combined with analysis of automatically acquired contextual metadata to substantially improve the performance of face and place recognition algorithms. For faces, we apply Sparse-Factor Analysis (SFA) to both the automatically captured contextual metadata and the results of PCA (Principal Components Analysis) of the photo content to achieve a 60% face recognition accuracy of people depicted in our database of photos, which is 40% better than media analysis alone. For location, grouping visually similar photos using a model of Cognitive Visual Attention (CVA) in conjunction with contextual metadata analysis yields a significant improvement over color histogram and CVA methods alone. We achieve an improvement in location retrieval precision from 30% precision for color histogram and CVA image analysis, to 55% precision using contextual metadata alone, to 67% precision achieved by combining contextual metadata with CVA image analysis. The combination of context and content analysis produces results that can indicate the faces and places depicted in cameraphone photos significantly better than image analysis or context analysis alone. We believe these results indicate the possibilities of a new context-aware paradigm for image analysis.
The segmentation of skin regions in color images is a preliminary step in several applications. Many different methods for discriminating between skin and non-skin pixels are available in the literature. The simplest, and often applied, methods build what is called an "explicit skin cluster" classifier which expressly defines the boundaries of the skin cluster in certain color spaces. These binary methods are very popular as they are easy to implement and do not require a training phase. The main difficulty in achieving high skin recognition rates, and producing the smallest possible number of false positive pixels, is that of defining accurate cluster boundaries through simple, often heuristically chosen, decision rules. In this study we apply a genetic algorithm to determine the boundaries of the skin clusters in multiple color spaces. To quantify the performance of these skin detection methods, we use recall and precision scores. A good classifier should provide both high recall and high precision, but generally, as recall increases, precision decreases. Consequently, we adopt a weighted mean of precision and recall as the fitness function of the genetic algorithm. Keeping in mind that different applications may have sharply different requirements, the weighting coefficients can be chosen to favor either high recall or high precision, or to satisfy a reasonable tradeoff between the two, depending on application demands. To train the genetic algorithm (GA) and test the performance of the classifiers applying the GA suggested boundaries, we use the large and heterogeneous Compaq skin database.
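As a rough illustration of the fitness function described above, the sketch below scores one candidate set of cluster boundaries by a weighted mean of precision and recall. Several details are assumptions and not taken from the paper: the weighted mean is taken to be arithmetic with a single weight `w`, the explicit skin cluster is modelled as an axis-aligned box in a three-channel color space, and the pixel values and bounds are made up for the example. The genetic algorithm itself is not shown.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def in_box(pixel: Pixel, lower: Sequence[int], upper: Sequence[int]) -> bool:
    """Explicit-cluster rule: every channel must lie within its bounds."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fitness(predicted: List[bool], actual: List[bool], w: float = 0.5) -> float:
    """Assumed form: w * precision + (1 - w) * recall, with 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    precision, recall = precision_recall(predicted, actual)
    return w * precision + (1.0 - w) * recall


# Made-up pixels with skin / non-skin labels, and one candidate box.
pixels = [(200, 120, 100), (50, 60, 70), (220, 150, 130), (90, 200, 90)]
labels = [True, False, True, False]
lower, upper = (150, 80, 60), (255, 140, 160)

predicted = [in_box(p, lower, upper) for p in pixels]
print(fitness(predicted, labels, w=0.5))
```

Raising `w` toward 1 rewards boundaries that produce few false positives, while lowering it toward 0 rewards boundaries that miss few skin pixels, which mirrors the application-dependent tradeoff the abstract describes.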
In this paper we describe a proposal for multimedia and e-learning content description based on standards interoperability within a digital library environment integrated in a virtual campus. In any virtual e-learning environment, a complex scenario which usually includes a digital library or, at least, a repository of learning resources, different levels of description are needed for all the elements: learning resources, multimedia content, activities, roles, etc. These elements can be described using library, e-learning and multimedia standards, depending on the specific needs of each particular scenario of use, but this might lead to an undesirable duplication of metadata, and to inefficient content queries and maintenance. Furthermore, there is a lack of semantic descriptions, so that all these contents become merely digital objects in the digital library, without exploiting all the possibilities of an e-learning virtual environment. Due to its flexibility and completeness, we propose to use the MPEG-7 standard for describing all the learning resources in the digital library, combined with the use of an ontology for a formal description of the learning process. The equivalences of the Dublin Core, LOM and MPEG-7 standards are outlined, and the requirements of a proposal for an MPEG-7 based representation for all the contents in the digital library and the virtual classroom are described. The intellectual property policies for content sharing both within and among organizations are also addressed. With such a proposal, it would be possible to build complex multimedia courses from a repository of learning objects using the digital library as the core repository.
The goal of this paper is to investigate the selection of the kernel for a Web-based AIRS. Using the Kernel Perceptron learning method, several kernels having polynomial and Gaussian Radial Basis Function (RBF) like forms (6 polynomials and 6 RBFs) are applied to general images represented by color histograms in RGB and HSV color spaces. Experimental results on these collections show that performance varies significantly between different kernel types and that choosing an appropriate kernel is important.
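To make the kernel choice concrete, here is a minimal sketch of a kernel perceptron over color histograms with interchangeable kernels. The abstract does not list the twelve kernels it compares, so the textbook forms used here, `(x·y + c)^d` and `exp(-||x - y||² / (2σ²))`, are assumptions, as are the parameter values and the toy three-bin histograms.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def polynomial_kernel(degree: int, c: float = 1.0) -> Kernel:
    """Standard polynomial kernel (x . y + c) ** degree."""
    def k(x: Vector, y: Vector) -> float:
        return (sum(a * b for a, b in zip(x, y)) + c) ** degree
    return k


def rbf_kernel(sigma: float) -> Kernel:
    """Standard Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    def k(x: Vector, y: Vector) -> float:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def score(x: Vector, xs: List[Vector], ys: List[int],
          alpha: List[int], kernel: Kernel) -> float:
    """Dual-form decision value: sum_j alpha_j * y_j * k(x_j, x)."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, xs, ys))


def train(xs: List[Vector], ys: List[int], kernel: Kernel,
          epochs: int = 10) -> List[int]:
    """Kernel perceptron: bump alpha_i each time example i is misclassified."""
    alpha = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(x, xs, ys, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(x: Vector, xs: List[Vector], ys: List[int],
            alpha: List[int], kernel: Kernel) -> int:
    return 1 if score(x, xs, ys, alpha, kernel) > 0 else -1


# Toy 3-bin histograms: label +1 is dominated by bin 0, label -1 by bin 2.
xs = [(0.8, 0.1, 0.1), (0.7, 0.2, 0.1), (0.1, 0.1, 0.8), (0.2, 0.1, 0.7)]
ys = [1, 1, -1, -1]
kernel = rbf_kernel(sigma=0.5)
alpha = train(xs, ys, kernel)
print(predict((0.75, 0.15, 0.1), xs, ys, alpha, kernel))
```

Swapping `rbf_kernel(sigma=0.5)` for `polynomial_kernel(degree=2)` changes only the similarity measure, which is the kind of comparison the abstract reports on.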
Many evaluation techniques for content based image retrieval are based on the availability of a ground truth, that is, on a "correct" categorization of images so that, say, if the query image is of category A, only the returned images in category A will be considered as "hits." Based on such a ground truth, standard information retrieval measures such as precision and recall are given and used to evaluate and compare retrieval algorithms. Accordingly, the assemblers of benchmarking databases go to a certain length to have their images categorized. The assumption of the existence of a ground truth is, in many respects, naive. It is well known that the categorization of the images depends on the a priori (from the point of view of such categorization) subdivision of the semantic field in which the images are placed (a trivial observation: a plant subdivision for a botanist is very different from that for a layperson). Even within a given semantic field, however, categorization by human subjects is subject to uncertainty, and it makes little statistical sense to consider the categorization given by one person as the unassailable ground truth. In this paper I propose two evaluation techniques that apply to the case in which the ground truth is subject to uncertainty. In this case, obviously, measures such as precision and recall will be subject to uncertainty as well. The paper will explore the relation between the uncertainty in the ground truth and that in the most commonly used evaluation measures, so that the measurements done on a given system can preserve statistical significance.
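The effect described above can be seen in a small sketch. It does not implement either of the two techniques the paper proposes, which the abstract does not specify; it only shows, under made-up annotations, how the precision of one fixed result list moves when each annotator's categorization is taken in turn as the ground truth. The annotator names and image identifiers are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple


def precision(returned: List[str], relevant: Set[str]) -> float:
    """Fraction of returned images that the given ground truth marks as hits."""
    if not returned:
        return 0.0
    return sum(1 for img in returned if img in relevant) / len(returned)


def precision_across_annotators(
    returned: List[str], annotations: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Mean and population standard deviation of precision over annotators."""
    scores = [precision(returned, relevant) for relevant in annotations.values()]
    return mean(scores), pstdev(scores)


# Two hypothetical annotators disagree on which images belong to category A.
returned = ["i1", "i2", "i3", "i4"]
annotations = {
    "annotator_1": {"i1", "i2", "i3", "i4"},
    "annotator_2": {"i1", "i2"},
}
avg, spread = precision_across_annotators(returned, annotations)
print(avg, spread)
```

A nonzero spread means that a difference between two systems smaller than that spread cannot be attributed to the systems alone.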
As found in the literature, most Internet-based prototype Content-Based Image Retrieval (CBIR) systems focus on stock photo collections and do not address challenges of large specialized image collections and topics such as medical information retrieval by image content. Even fewer have medically validated data to evaluate retrieval quality in terms of precision and relevance. To date, our research has reported over 75% relevant spine X-ray image retrieval tested on 888 validated vertebral shapes from 207 images using our prototype CBIR system operating within our local network. As a next step, we have designed and developed an Internet-based medical validation tool and a CBIR retrieval tool in MATLAB and JAVA that can remotely connect to our database. The retrieval tool supports hybrid text and image queries and also provides partial shape annotation for pathology-specific querying. These tools are initially developed for domain experts, such as radiologists and educators, to identify design issues for improved workflow. This article describes the tools and design considerations in their development.
A corporate information system needs to be as accessible as library content, which implies organizing the content in a logical structure, categorizing it, and using the categories to add metadata to the information. Content Management Systems (CMS) are an emerging kind of software component that manages content, usually making extensive use of web technologies, and whose main goals are to allow easy creation, publishing and retrieval of content to fit business needs. The focus of this paper is to describe how we integrated the "map" metaphor into a CMS. Maps are symbolic information and rely on the use of a graphic sign language. A characteristic feature of maps is that their design has traditionally been constrained by the need to create one model of reality for a variety of purposes. The map's primary role as a communication medium involves the application of processes such as selection, classification, displacement, symbolization and graphic exaggeration. A model of the infrastructure is presented, and the current prototype of the model is briefly discussed together with the currently deployed environment for cultural heritage information dissemination.
In this paper we present an automatic enhanced video display and navigation capability for networked streaming video and networked video playlists. Our proposed method uses Synchronized Multimedia Integration Language (SMIL) as the presentation language and Real Time Streaming Protocol (RTSP) as the network remote control protocol to automatically generate an "enhanced video strip" display for easy navigation. We propose and describe two approaches: a smart client approach and a smart server approach. We also describe a prototype system implementation of our proposed approach.
This paper presents a technique to display real-time 3-D images captured by web cameras on the stereoscopic display of a personal computer (PC) using screen pixel access. Images captured by two side-by-side web cameras are sent through the Internet to a PC and displayed in two conventional viewers for moving images. These processes are carried out independently for the two cameras. The image data displayed in the viewers are in the video memory of the PC. Our method uses this video-memory data: the two web-camera images are read from the video memory, composed into a 3-D image, and then written back to the video memory. A 3-D image can be seen if the PC being used has a 3-D display. We developed an experimental system to evaluate the feasibility of this technique. The web cameras captured images of up to 640 × 480 pixels, compressed them with motion JPEG, and then sent them over a LAN. Using the experimental system, we found that, over a broadband network such as ADSL, the 3-D image had almost the same quality as a conventional TV image.
Vector graphics are gaining importance within the World Wide Web community because they allow users to create images that are easy to manage, modify, and understand. Two formats play a leading role among the languages for vector graphics: SVG and VML. Achieving complete interoperability between these two languages means giving users full support for vector images across implementations, operating systems, and media. In this paper we describe VectorConverter, a tool that allows easy, automatic, and reasonably good conversion between two vector graphic formats, SVG and VML, and one raster format, GIF. The tool produces good translations between languages with very different functionality and expressivity by applying translation rules, approximation, and heuristics. A high-level discussion of implementation details, open issues, and future developments of VectorConverter is provided as well.
This paper presents a technique to convert surfaces obtained through a Data Dependent Triangulation into Bézier curves stored in the Scalable Vector Graphics (SVG) file format. The method starts from a Data Dependent Triangulation and traces a map of the boundaries present in the triangulation, using the characteristics of the triangles. The estimated barycenters are then connected, and the resulting polylines are finally converted into curves. After the curves have been estimated and closed, the final representation is obtained by sorting the surfaces in decreasing order. The proposed technique has been compared with other raster-to-vector conversions in terms of perceptual quality.
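The last two steps (polylines to curves, then ordering the closed surfaces) can be sketched as follows. The abstract does not say how the curves are fitted or what quantity the surfaces are sorted by, so this sketch swaps in two assumptions: the standard Catmull-Rom to cubic Bézier conversion for smoothing, and sorting by polygon area so that larger surfaces are drawn first. All function names are hypothetical.

```python
def polygon_area(pts):
    """Unsigned area of a closed polygon (shoelace formula)."""
    n = len(pts)
    twice = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                for i in range(n))
    return abs(twice) / 2


def closed_polyline_to_path(pts):
    """SVG path data: one cubic Bezier segment per polyline edge.

    Control points follow the Catmull-Rom to Bezier conversion, so the
    curve passes through every original vertex.
    """
    n = len(pts)
    parts = [f"M {pts[0][0]:g} {pts[0][1]:g}"]
    for i in range(n):
        p0, p1 = pts[i - 1], pts[i]
        p2, p3 = pts[(i + 1) % n], pts[(i + 2) % n]
        c1 = (p1[0] + (p2[0] - p0[0]) / 6, p1[1] + (p2[1] - p0[1]) / 6)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6, p2[1] - (p3[1] - p1[1]) / 6)
        parts.append(f"C {c1[0]:g} {c1[1]:g}, {c2[0]:g} {c2[1]:g}, "
                     f"{p2[0]:g} {p2[1]:g}")
    parts.append("Z")
    return " ".join(parts)


def to_svg(surfaces):
    """Emit larger surfaces first so smaller ones are painted on top."""
    ordered = sorted(surfaces, key=lambda s: polygon_area(s["points"]),
                     reverse=True)
    paths = []
    for s in ordered:
        d = closed_polyline_to_path(s["points"])
        paths.append(f'<path d="{d}" fill="{s["fill"]}"/>')
    return '<svg xmlns="http://www.w3.org/2000/svg">' + "".join(paths) + "</svg>"


surfaces = [
    {"points": [(0, 0), (1, 0), (0, 1)], "fill": "blue"},         # small
    {"points": [(0, 0), (2, 0), (2, 2), (0, 2)], "fill": "red"},  # large
]
svg = to_svg(surfaces)
```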
We describe a system that automatically tracks moving objects in a scene and subjectively characterizes the object trajectories for storage and retrieval. A multi-target color-histogram particle filter combined with best-hypothesis data association is the foundation of our trajectory acquisition algorithm. To improve computational performance, we use quasi-Monte-Carlo methods to reduce the number of particles required by each filter. The tracking system operates in real time to produce a stream of XML documents that contain the object trajectories. To characterize trajectories subjectively, we form a set of shape templates that describe basic maneuvers (e.g., gentle turn right, hard turn left, straight line). Procrustes shape analysis provides a scale- and rotation-invariant mechanism to identify occurrences of these maneuvers within a trajectory. To add spatial information to our trajectory representation, we partition the two-dimensional space under surveillance into a set of mutually exclusive regions. A temporal sequence of region-to-region transitions gives a spatial representation of the trajectory. The shape and position descriptions combine to form a compact, high-level representation of a trajectory. We provide similarity measures for the shape, position, and combined shape and position representations.
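The two descriptions can be illustrated with a minimal sketch. For the shape part it uses the full Procrustes distance for 2-D point sets written with complex numbers, which is unchanged by translation, scaling, and rotation but not by reflection, so left and right turns stay distinct. For the position part it assumes a uniform grid as the set of mutually exclusive regions. The template coordinates, the grid, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math


def preshape(points):
    """Center a 2-D point list and scale it to unit size (as complex numbers)."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]


def procrustes_distance(a, b):
    """Full Procrustes distance between two point lists of equal length."""
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


# Hypothetical maneuver templates, four samples each.
TEMPLATES = {
    "straight line": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "turn right": [(0, 0), (1, 0), (2, -0.5), (3, -1.5)],
    "turn left": [(0, 0), (1, 0), (2, 0.5), (3, 1.5)],
}


def classify(segment):
    """Name of the template closest to the trajectory segment."""
    return min(TEMPLATES, key=lambda name: procrustes_distance(segment, TEMPLATES[name]))


def region_sequence(points, cell=1.0):
    """Sequence of grid cells visited, with consecutive repeats removed."""
    seq = []
    for x, y in points:
        region = (int(x // cell), int(y // cell))
        if not seq or seq[-1] != region:
            seq.append(region)
    return seq


def transform(points, scale, angle_deg, dx, dy):
    """Scale, rotate, and shift a point list (used to test invariance)."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]


observed = transform(TEMPLATES["turn right"], scale=3.0, angle_deg=40, dx=5, dy=-2)
label = classify(observed)
```

A combined representation could pair each maneuver label with the regions in which it occurred; how the paper weights shape against position in its similarity measures is not stated in the abstract.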
In this paper, we propose a method for archiving broadcasts on TV terminals, including set-top boxes (STBs) and personal video recorders (PVRs). Our goal is to effectively cluster and retrieve semantic video scenes obtained by real-time filtering of broadcast content, for re-use or transmission. For TV terminals, we generate new video archiving formats that combine broadcast media resources with the related metadata and auxiliary media data. In addition, we implement an archiving system to decode and retrieve the media resources and the metadata within the format. The experiment shows that the proposed format makes it possible to retrieve or browse media data and metadata on the TV terminal effectively, and that it could be compatible with a portable device.
In this paper we present AVIR (Audio & Video Information Retrieval), a project of CNR (Italian National Research Council) - ITC to develop tools that support an information system for distance e-learning. AVIR has been designed to store, index, and classify audio and video lessons and make them available to students and other interested users. The core of AVIR is an SDR (Spoken Document Retrieval) system, which automatically transcribes the spoken documents into text and indexes them using purpose-built dictionaries. When using the system online, users can search for documents by date, professor, or lesson title, or by selecting one or more specific words. The results are then presented to the user; for video lessons, a preview of the first frames is shown. Moreover, the slides of the lessons and associated papers can be retrieved.
Our research showed that a high degree of life-stress has a negative mental health effect that may interrupt regular exercise. We used an internet-based, remotely conducted, face-to-face preventive counseling program using video monitors to reduce the sources of life-stress that interrupt regular exercise, and evaluated the preventive effects of the program in elderly people. NTSC video signals were converted to IP and facial images were transmitted to a PC display using the dedicated optical network lines of JGN2. Participants were 22 elderly people in Hokkaido, Japan, who regularly played table tennis. A survey was conducted before the intervention in August 2003. IT remote counseling was conducted on two occasions for one hour on each occasion. A post-intervention survey was conducted in February 2004 and a follow-up survey was conducted in March 2005. Network quality was satisfactory, with little data loss and high display quality. Results indicated that after the intervention self-esteem increased significantly, trait anxiety decreased significantly, cognition of emotional support by people other than family members tended to increase, and sources of stress tended to decrease. Follow-up results indicated that cognition of emotional support by family increased significantly and interpersonal dependency decreased significantly compared to before the intervention. These results suggest that face-to-face IT remote counseling using video monitors is useful for keeping elderly people from feeling anxious and for giving them the confidence to continue exercising regularly. Moreover, it has a stress management effect.
In MPEG-4, 3D mesh coding (3DMC) achieves a 40:1 to 50:1 compression ratio over 3-D meshes (in VRML IndexedFaceSet representation) without noticeable visual degradation. This substantial gain does not come for free: it changes the vertex and face permutation order of the original 3-D mesh model. This change may cause serious problems for animation, editing operations, and special effects, where the original permutation order information is critical not only to the mesh representation but also to the related tools. To fix this problem, the vertex and face permutation order information must be transmitted in addition, which increases the bitstream size. In this paper, we propose a novel vertex and face permutation order compression algorithm that addresses the order change caused by 3DMC encoding with a minimal increase in side information. The proposed coding method is based on an adaptive probability model, which allocates a codeword one bit shorter to each vertex and face permutation order in every distinguishable unit as encoding proceeds. In addition to the adaptive probability model, we further increase the coding efficiency by representing and encoding each vertex and face permutation order per connected component (CC). Simulation results demonstrate that the proposed algorithm can encode the vertex and face permutation order losslessly while saving up to 12% of the bits compared with the logarithmic representation based on the fixed probability model.
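One way to see why an adaptive model can beat a fixed one: a fixed-length code spends ceil(log2 n) bits on each of the n order indices, while an encoder that tracks which indices are still unused only has to choose among the remaining ones, so codewords shrink as encoding proceeds. The sketch below illustrates that idea with a simple index-among-remaining code. It is an interpretation for illustration only, not the paper's algorithm; it ignores the per-connected-component grouping, and all function names are hypothetical.

```python
from math import ceil, log2


def bits_for(n_choices):
    """Bits of a fixed-length code that distinguishes n_choices values."""
    return ceil(log2(n_choices)) if n_choices > 1 else 0


def fixed_cost(n):
    """Total bits when every index uses ceil(log2 n) bits."""
    return n * bits_for(n)


def adaptive_cost(n):
    """Total bits when the k-th index is coded among the n - k remaining ones."""
    return sum(bits_for(n - k) for k in range(n))


def encode(perm):
    """Code each value as (position among unused values, codeword length)."""
    remaining = sorted(perm)
    codes = []
    for value in perm:
        idx = remaining.index(value)  # O(n) lookup, fine for a sketch
        codes.append((idx, bits_for(len(remaining))))
        remaining.pop(idx)
    return codes


def decode(codes):
    """Invert encode() for a permutation of 0 .. n-1."""
    remaining = list(range(len(codes)))
    return [remaining.pop(idx) for idx, _ in codes]


order = [2, 0, 3, 1]
codes = encode(order)
restored = decode(codes)
```

For 8 indices this toy scheme needs 17 bits instead of 24; the saving it yields depends on n and is not meant to reproduce the 12% figure reported in the abstract.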
This paper presents FaceLab, an innovative, open environment created to evaluate the performance of face recognition strategies. It simplifies, through an easy-to-use graphical interface, the basic steps involved in testing procedures such as data organization and preprocessing, definition and management of training and test sets, definition and execution of recognition strategies and automatic computation of performance measures. The user can extend the environment to include new algorithms, allowing the definition of innovative recognition strategies. The performance of these strategies can be automatically evaluated and compared by the tool, which computes several performance measures for both identity verification and identification scenarios.
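The abstract does not list which performance measures FaceLab computes. As an illustration of the two scenarios, the sketch below computes measures that are commonly used for them: false accept and false reject rates at a threshold for verification, and the rank-1 recognition rate for identification. The function names and the score data are made up for this example.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """False accept rate and false reject rate at a similarity threshold.

    A comparison is accepted when its score is >= threshold.
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def rank1_rate(score_rows, true_ids):
    """Fraction of probes whose best-scoring gallery identity is correct.

    score_rows[i] maps each gallery identity to its similarity with probe i.
    """
    hits = sum(max(row, key=row.get) == true_id
               for row, true_id in zip(score_rows, true_ids))
    return hits / len(true_ids)


# Verification: scores of same-person and different-person comparisons.
far, frr = far_frr(genuine_scores=[0.9, 0.8, 0.4],
                   impostor_scores=[0.3, 0.6, 0.1, 0.2],
                   threshold=0.5)

# Identification: two probes against a gallery of identities "a" and "b".
rate = rank1_rate([{"a": 0.9, "b": 0.2}, {"a": 0.6, "b": 0.5}], ["a", "b"])
```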