We assume that the goal of content-based image retrieval is to find images that are both semantically and visually relevant to users, based on image descriptors. These descriptors are often provided by an example image (the query-by-example paradigm). In this work we develop a very simple method for evaluating such systems based on large collections of images with associated text. Examples of such collections include the Corel image collection, annotated museum collections, news photos with captions, and web images with associated text derived by heuristic reasoning about the structure of typical web pages (as used by Google). The advantage of such data is that it is plentiful, and the method we propose can be applied automatically to hundreds of thousands of queries. However, it is critical that such a method be verified against human usage, and to do this we evaluate over 6000 query/result pairs. Our results strongly suggest that, at least for the Corel image collection, the automated measure is a good proxy for human evaluation. Importantly, our human evaluation data can be reused to evaluate any content-based image retrieval system and/or to verify additional proxy measures.
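As a minimal sketch of the proxy idea, the relevance of a retrieved image to a query image can be scored by the word overlap of their associated texts. The function names and the Jaccard measure here are illustrative assumptions, not the paper's exact measure:

```python
def annotation_overlap(query_words, result_words):
    """Jaccard overlap between the word sets attached to two images."""
    q, r = set(query_words), set(result_words)
    if not q and not r:
        return 0.0
    return len(q & r) / len(q | r)

def proxy_score(query_words, ranked_results):
    """Mean annotation overlap over a ranked result list -- a stand-in
    for asking a human whether each retrieved image is relevant."""
    return sum(annotation_overlap(query_words, r)
               for r in ranked_results) / len(ranked_results)
```

Averaged over many queries, such a score can then be compared against human judgments of the same query/result pairs.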
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Content-based image retrieval (CBIR), as it is known today, faces a number of challenges. Briefly summarized, the main ones are, first, to bridge the semantic gap between high-level concepts and low-level features using feedback, and second, to deliver performance under adverse conditions. High-dimensional spaces, together with a demanding machine-learning task, make the right way of indexing an important issue.
When indexing multimedia data, most groups opt for extracting high-dimensional feature vectors from the data, followed by dimensionality reduction such as PCA (Principal Components Analysis) or LSI (Latent Semantic Indexing). The resulting vectors are indexed using spatial indexing structures such as kd-trees or R-trees.
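A minimal sketch of that pipeline, assuming NumPy and SciPy (PCA via SVD, then a kd-tree over the reduced vectors; the dimensions and the random data are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))        # 500 images, 64-D feature vectors

# PCA: project onto the top principal components of the centered data
mean = feats.mean(axis=0)
_, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
reduced = (feats - mean) @ vt[:8].T       # keep 8 dimensions

tree = cKDTree(reduced)                   # spatial index over reduced vectors
query = (feats[0] - mean) @ vt[:8].T      # query with image 0's own features
dist, idx = tree.query(query, k=3)        # 3 nearest neighbours
```

Querying with an indexed image's own vector should return that image first, at (numerically) zero distance.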
Other projects, such as MARS and Viper, propose adapting text-indexing techniques, notably the inverted file. The Viper system is the most direct adaptation of text-retrieval techniques to quantized feature vectors. However, while the Viper query engine provides decent performance, impressive user-feedback behavior, easy integration of long-term learning algorithms, and support for potentially infinite feature vectors, there has been no comparison of vector-based and inverted-file-based methods under similar conditions.
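The inverted-file idea can be sketched as follows: each quantized feature component becomes a "visual term", and a postings list maps terms back to images. This is a toy scheme; Viper's actual quantization is more elaborate:

```python
from collections import defaultdict

def quantize(vec, step=0.25):
    """Map each nonzero feature component to a discrete term (dim, level)."""
    return {(d, int(v // step)) for d, v in enumerate(vec) if v > 0}

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of image ids

    def add(self, image_id, vec):
        for term in quantize(vec):
            self.postings[term].add(image_id)

    def query(self, vec):
        """Rank images by the number of terms shared with the query."""
        scores = defaultdict(int)
        for term in quantize(vec):
            for img in self.postings.get(term, ()):
                scores[img] += 1
        return sorted(scores, key=scores.get, reverse=True)
```

Only images sharing at least one term with the query are touched, which is what makes inverted files attractive for sparse, high-dimensional features.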
In this publication, we compare a CBIR query engine that uses inverted files (Bothrops, a rewrite of the Viper query engine on top of a relational database) with a CBIR query engine based on LSD (Local Split Decision) trees for spatial indexing, using the same feature sets.
The Benchathlon initiative works on providing a set of images and ground truth for simulating image queries by example and the corresponding user feedback. When performing the Benchathlon benchmark on a CBIR system (the System Under Test, SUT), a benchmarking harness connects over the Internet to the SUT and performs a number of queries using an agreed-upon protocol, the Multimedia Retrieval Markup Language (MRML). Using this benchmark one can measure the quality of retrieval as well as the overall (speed) performance of the benchmarked system.
Our benchmarks will draw on the Benchathlon's work to document the retrieval performance of both inverted-file-based and LSD-tree-based techniques. In addition to these results, we will present statistics that can be obtained only from inside the system under test: the number of complex mathematical operations performed and the amount of data that must be read from disk while processing a query.
In this paper, we present "PanoramaSeek", a new video streaming system suitable for Internet video retrieval. The system features intelligent streaming, which enables users to take quick glances at arbitrary portions of a video sequence by using slide bars. The streaming system utilizes multi-level interactive streaming and content-based selection of key frames. Multi-level interactive streaming supports three levels of display on a video window: key frame display, normal playback, and specific frame display. Users can switch between levels at any time. Content-based selection of key frames defines key frames by analyzing the changes occurring in video sequences and selecting appropriate, representative frames. Our evaluation of the system demonstrated that multi-level interactive streaming is effective in providing users with random browsing capability, and that content-based selection of key frames improves the efficiency of video retrieval by reducing the number of frames users have to check.
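The content-based selection of key frames can be sketched with a simple change-detection heuristic: keep the first frame, then keep any frame that differs enough from the last selected key frame. The threshold and the mean-absolute-difference measure are illustrative assumptions, not PanoramaSeek's exact analysis:

```python
def frame_change(a, b):
    """Mean absolute pixel difference between two greyscale frames
    (each frame given as a flat list of pixel values)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_key_frames(frames, threshold=30):
    """Keep frame 0, then every frame whose content differs enough
    from the last selected key frame -- a shot-change heuristic."""
    keys = [0]
    for i in range(1, len(frames)):
        if frame_change(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys
```

Fewer, more representative key frames means fewer frames the user has to check while scrubbing the slide bar.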
In this paper we present a novel digital watermarking technique for still image authentication. A model of the Human Visual System is exploited not only to enhance the visual quality of the watermarked images, but also to enhance the receiver’s probability of extracting the watermark correctly. Our goal is to jointly optimize rate, robustness and visibility, so that detailed features can be embedded, content-preserving manipulations are tolerated and the authenticated image is not distorted too severely by the embedding process. The algorithm performance has been assessed through numerical simulations and comparisons with the current state of the art. Results prove significant improvements in robustness against perceptually lossless modifications (such as mild JPEG and JPEG2000 compression).
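For illustration, a toy additive spread-spectrum watermark (not the HVS-based scheme of the paper) embeds a key-seeded ±1 pattern and detects it by correlating the image with the same pattern:

```python
import random

def embed(pixels, key, strength=2):
    """Add a key-seeded +/-1 pattern, scaled by `strength`, to the image."""
    rnd = random.Random(key)
    pattern = [rnd.choice((-1, 1)) for _ in pixels]
    return [p + strength * w for p, w in zip(pixels, pattern)], pattern

def detect(pixels, key):
    """Correlate the image with the key's pattern; an increase relative
    to the unmarked image indicates the watermark is present."""
    rnd = random.Random(key)
    pattern = [rnd.choice((-1, 1)) for _ in pixels]
    return sum(p * w for p, w in zip(pixels, pattern)) / len(pixels)
```

An HVS model, as used in the paper, would modulate `strength` per pixel so the pattern hides in textured regions while staying detectable.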
A new image compression format is being developed to replace the GIF and TIFF formats for Internet image transmission. Based on the Autosophy information theory, the new format is especially suited to the Internet. Features include much higher compression ratios, improved resistance to the Internet's Quality of Service (QoS) problems, a universal hardware-independent communication format, and optional codebook encryption for secure communications. The all-16-bit data format allows the mixing of other Internet data types (including live video, sound, text, and random bit files) within a universal communication protocol. Encoding takes less than a second per image; hardware chipsets are necessary for real-time encoding, but real-time retrieval can be achieved in software alone. Conventional lossless image compression formats use the dynamically growing tree libraries of the LZW code, which yields minimal compression for small images and creates great sensitivity to transmission errors. The new image compression format, in contrast, uses a fixed, pre-grown hyperspace pattern library, providing much higher compression ratios, increased error resistance, and optional encryption. The same algorithms can provide either lossless compression according to the Shannon theory or visually lossless compression according to the Autosophy theory.
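The LZW behaviour the paragraph contrasts against can be seen in a few lines: the dictionary of substrings grows only as the input is scanned, which is exactly why short inputs compress poorly. This is a textbook sketch; the Autosophy pre-grown library is proprietary and not shown:

```python
def lzw_compress(data):
    """Classic LZW over bytes: emit dictionary codes while growing
    the dictionary with each newly seen substring."""
    codebook = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in codebook:
            w = wc                        # extend the current match
        else:
            out.append(codebook[w])       # emit code for longest match
            codebook[wc] = len(codebook)  # dictionary grows here
            w = bytes([byte])
    if w:
        out.append(codebook[w])
    return out
```

On a very short input the dictionary barely grows, so the output is as long as the input; a fixed pre-grown library avoids that warm-up cost.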
This paper describes a new networked telepresence system that realizes virtual tours into a visualized dynamic real world without significant time delay. Our system is realized in three steps: (1) video-rate omnidirectional image acquisition, (2) transport of the omnidirectional video stream via the Internet, and (3) real-time view-dependent perspective image generation from the omnidirectional video stream. The system is applicable to real-time telepresence even when the observed real world is far from the observation site, because the delay between a change in the user's viewing direction and the change in the displayed image is small and does not depend on the actual distance between the two sites. Moreover, multiple users can look around from a single viewpoint in a visualized dynamic real world, in different directions, at the same time. In experiments, we have shown that the proposed system is useful for Internet telepresence.
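The reason the delay does not depend on viewing direction can be sketched for the horizontal case: the user's pan angle simply selects a slice of columns from the locally received equirectangular panorama, wrapping at 360° (a simplified sketch that ignores tilt and the true perspective reprojection):

```python
def view_window(pano_width, fov_deg, pan_deg):
    """Column indices of a viewing window in an equirectangular
    panorama; the pan angle picks the slice, wrapping at 360 degrees."""
    cols_per_deg = pano_width / 360.0
    half = int(fov_deg * cols_per_deg / 2)
    centre = int((pan_deg % 360) * cols_per_deg)
    return [(centre + c) % pano_width for c in range(-half, half)]
```

Because the whole panorama is already at the client, changing `pan_deg` is a local operation with no round trip to the remote site.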
This paper proposes a model for human motion analysis in video. Its main characteristic is that it adapts automatically to the current resolution, the actual quality of the picture, or the level of precision required by a given application, thanks to its possible decomposition into several hierarchical levels. The model is region-based to address some analysis-processing needs. The top level of the model is defined with only 5 ribbons, which can be cut into sub-ribbons according to a given (or expected) level of detail. The matching process between the model and the current picture consists in comparing the extracted subject shape with a graphical rendering of the model, built on the basis of some computed parameters. The comparison is performed using a chamfer matching algorithm. In our developments, we intend to build a platform for interaction between a dancer and tools synthesizing abstract motion pictures and music, under the conditions of a real-time dialogue between a human and a computer. Consequently, we use this model for motion description rather than motion recognition: no a priori gestures are assumed to be recognized, as no particular application is specifically targeted. The resulting description will follow a Description Scheme compliant with the movement notation called "Labanotation".
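Chamfer matching itself is compact, assuming NumPy/SciPy: a distance transform of the image's edge map gives, at every pixel, the distance to the nearest edge, and the score is the mean of that transform over the model's edge pixels:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(model_edges, image_edges):
    """Mean distance from each model edge pixel to the nearest image
    edge pixel. Both arguments are boolean edge maps of equal shape;
    lower scores mean a better match."""
    # distance_transform_edt measures distance to the nearest zero entry,
    # so invert the image edge map (edges become zeros)
    dist = distance_transform_edt(~image_edges)
    return dist[model_edges].mean()
```

In the model described above, `model_edges` would be the rendered ribbon silhouette and `image_edges` the extracted subject shape.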
The paper presents a web-based application for the preparation and presentation of various two- and three-dimensional cultural showpieces in a virtual environment. Specific task modules built on a common database provide tools for designing spatial models of a real or fully virtual gallery, exhibit management, arrangement of exhibits within the virtual space, and final web presentation using a standard VRML browser and a Java applet. The application serves different kinds of users: gallery owners, artists, and visitors. The use of virtual-reality paradigms for image presentation purposes is also discussed.
This paper describes an Internet-based remote experimental setup of a double-link pendulum mechanism for student experiments at the M.Sc. level. Some first-year experience of using this web-based setup in classes is reported.
In most of the courses given at the division of mechanical engineering systems at Linkoeping Institute of Technology, we provide experimental setups to enhance the teaching of M.Sc. students. Many of these setups involve mechatronic systems: disciplines like fluid power, electronics, and mechanics, as well as software technologies, are used in each experiment. As our campus has recently been split between two cities, some new concepts for distance learning have been studied. The one described here implements remotely controlled mechatronic setups for teaching basic programming of real-time operating systems and analysis of the dynamics of mechanical systems. The students control the regulators for the pendulum through a web interface and receive measurement results and a movie back by email.
The present setup uses a double-link pendulum that is controlled by a DC motor and monitored through both a camera and angular position sensors. All necessary software is hosted on a dual-processor PC running the Red Hat 7.1 distribution, complemented with real-time scheduling using DIAPM-RTAI 1.7. The Internet site is presented to the students using PHP, Apache, and MySQL; all of the software used originates from the open-source domain. The experience of integrating these technologies, the security issues, and the web-camera interface are discussed. One of the important lessons from this project so far is the need for good visual feedback, both in terms of video speed and of resolution. We have noticed that when students make mistakes and want to find the cause of a failure, they want clear, large, high-resolution images to support their beliefs about the cause of the failure. Even if a student does not need a high-resolution image to grasp the mechanics and function of the pendulum, such high-quality images are needed to build confidence in the hardware. It is important to support this when direct hands-on contact with the hardware is taken away. Some of our experience in combining open-source software, real-time scheduling, and measurement hardware in a cost-efficient way is also discussed. The pendulum was available publicly on the Internet but has now been removed due to security issues.
Text, voice, and video images are the most common forms of media content for instant communication on the Internet. Studies have shown that facial expressions convey much richer information than text and voice during a face-to-face conversation. The currently available real-time means of communication (instant text messages, chat programs, and videoconferencing), however, have major drawbacks in terms of exchanging facial expressions. The first two do not involve image transmission, whilst videoconferencing requires a large bandwidth that is not always available, and the transmitted image sequence is neither smooth nor free of delay. The objective of the work presented here is to develop a technique that overcomes these limitations by extracting the facial expressions of speakers and realising real-time communication. To capture the facial expressions, the main characteristics of the image are emphasized: interpolation is performed on previously detected edge points to create geometric shapes such as arcs and lines, the regional dominant colours of the pictures are extracted, and the combined results are converted into Scalable Vector Graphics (SVG) format. The application based on the proposed technique aims to be used alongside chat programs and to run on any platform.
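The final conversion step can be illustrated with a tiny helper that wraps a detected edge polyline in an SVG document. The function name and the fixed stroke are illustrative; the paper's actual output also carries the dominant-colour regions:

```python
def polyline_to_svg(points, width=320, height=240):
    """Wrap an edge polyline (list of (x, y) pairs) in a minimal SVG
    document -- the kind of compact vector payload that can be sent
    over a chat channel instead of video frames."""
    d = "M " + " L ".join(f"{x},{y}" for x, y in points)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">'
            f'<path d="{d}" fill="none" stroke="black"/></svg>')
```

A few hundred bytes of path data per update is what makes this feasible on bandwidths where video conferencing fails.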
The Biomedical Informatics Research Network (BIRN) is a broad project sponsored by the American National Institutes of Health (NIH) to promote the use of modern telecommunications for data exchange and collaboration in brain research.
The project is building a database and network infrastructure in which neuroscientists will post, query, and analyze raw data, processed data, and the results of their analyses.
The project is divided into parts that analyze mouse brain data and human brain data, respectively. In the current phase of the project the data are essentially anatomical, while a future phase will introduce functional data. One important source of raw data, for both the mouse and the human brain, is magnetic resonance imaging (MRI), which provides dense volumetric information about the density of the brain or (in the case of functional MRI) about brain activity. For the mouse brain, these data are supplemented with images of brain slices and other histological measures.
One important technical problem that we face in BIRN is managing these volumetric data, processing them (possibly using tools available only remotely), storing the results of the analyses, and making them available to all the institutions participating in the project.
This paper describes the problems posed by the BIRN project, the importance of image data in its activities, and the challenges involved. We describe the shared environment that we are creating and the facilities for storing, querying, remotely processing, and sharing the image data that constitute the bulk of the brain data scientists are producing.
This paper presents a framework for developing a Web-based distributed image-processing application system that is flexible, convenient, and scalable. The system uses existing Web-based technology and image-processing methodologies to implement this capability in a distributed computing environment that may include powerful machines for processing complex and large images. The system consists of a browser, a server, a service registry and task scheduler, image data storage and management, and knowledge-based image-processing services. A server-side application handles the user's request from the client side: the server host identifies the request and the necessary resources, and schedules the computing resources and image-processing services. Based on instructions from the developer's side (the service provider), knowledge-based on-line assistance helps the client select the right algorithm and set proper parameter values. Developers can modify and upgrade the services at their own site and publish the workable version, its interface, and the required resources to the server. The server enables remote invocation of the algorithm by providing a seamless and efficient linkage mechanism. An application of the segmentation operation using deformable contour methods for complex images is provided as an example.
Using both text and image content features, we develop a hybrid image retrieval system for the World Wide Web. We first use a text-based image meta-search engine to retrieve images from the Web, based on the text on the image host pages, to provide an initial image set. Because of the high speed and low cost of the text-based approach, we can easily retrieve a broad coverage of images with a high recall rate and a relatively low precision. An image-content-based ordering is then performed on the initial image set: all the images are clustered into folders based on content features, and they can be re-ranked by those features according to user feedback. Such a design makes it truly practical to use both text and image content for image retrieval over the Internet. Experimental results confirm the efficiency of the system.
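The content-based re-ordering step can be sketched with a coarse grey-level histogram standing in for the real content features (all names here are illustrative):

```python
def grey_histogram(pixels, bins=4):
    """Coarse, normalised grey-level histogram of a flat pixel list --
    a stand-in for the content features used to re-order results."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def rerank(query_pixels, candidates):
    """Re-order the text-retrieved set (name, pixels) by L1 histogram
    distance to the query image, nearest first."""
    q = grey_histogram(query_pixels)
    def dist(item):
        h = grey_histogram(item[1])
        return sum(abs(a - b) for a, b in zip(q, h))
    return [name for name, _ in sorted(candidates, key=dist)]
```

The cheap text stage supplies recall; this visual stage restores precision by pushing visually similar images to the top.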
The description of visual documents is a fundamental aspect of any efficient information management system, but the process of manually annotating large collections of documents is tedious and far from being perfect. The need for a generic and extensible annotation model therefore arises. In this paper, we present DEVA, an open, generic and expressive multimedia annotation framework. DEVA is an extension of the Dublin Core specification. The model can represent the semantic content of any visual document. It is described in the ontology language DAML+OIL and can easily be extended with external specialized ontologies, adapting the vocabulary to the given application domain.
In parallel, we present the Magritte annotation tool, an early prototype that validates the DEVA features. Magritte allows users to manually annotate image collections. It is designed with a modular and extensible architecture that enables the user to dynamically adapt the user interface to specialized ontologies merged into DEVA.
We propose an automatic image categorization technique for a content-based image filtering and retrieval system. A category-feature database for image categorization is constructed based on human visual perception. Query images are automatically classified into predefined categories through content-based description using MPEG-7: similarity distances for each category are measured using multiple MPEG-7 descriptors. We propose a matching technique for combining these multiple similarity distances that takes into account the categorization performance of each single descriptor on each category. To evaluate the proposed method, we apply it to a large number of query images randomly collected from the Internet and other image databases.
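The combination step can be sketched as a weighted sum in which each descriptor's weight reflects its categorization performance for the category at hand. This is an assumed form; the paper's matching technique may differ in detail:

```python
def combined_distance(distances, weights):
    """Combine per-descriptor similarity distances into one score.

    distances[i] is the distance under descriptor i (e.g. an MPEG-7
    colour or texture descriptor); weights[i] encodes how well that
    descriptor alone categorizes images of the current category."""
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, distances)) / total
```

A descriptor that discriminates a category well (say, colour for "sunsets") then dominates the combined distance for that category.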
This paper motivates and develops an end-to-end methodology for representation and adaptation of arbitrary scalable content in a fully content-non-specific manner. Scalable bit-streams are naturally organized in a symmetric multi-dimensional logical structure, and any adaptation is essentially a downward manipulation of this structure. Higher logical constructs are defined on top of this multi-tier structure to make the model more generally applicable to a variety of bit-streams involving rich media. The resultant composite model is referred to as the Structured Scalable Meta-format (SSM). Apart from the implicit bit-stream constraints that must be satisfied to make a scalable bit-stream SSM-compliant, two other elements need to be formalized to build a complete adaptation and delivery infrastructure based on SSM: a binary or XML description of the structure of the bit-stream resource and of how it is to be manipulated to obtain various adapted versions; and a binary or XML specification of outbound constraints derived from the capabilities and preferences of receiving terminals. By interpreting the descriptor and the constraint specifications, a universal adaptation engine can adapt the content appropriately to suit the specified needs and preferences of recipients, without knowledge of the specifics of the content, its encoding, and/or its encryption. With universal adaptation engines, different adaptation infrastructures are no longer needed for different types of scalable media.
We present an overview of a new paradigm for tackling long-standing computer vision problems. Specifically, our approach is to build statistical models that translate from visual representations (images) to semantic ones (associated text). As providing optimal text for training is difficult at best, we propose working with whatever associated text is available in large quantities. Examples include large image collections with keywords, museum image collections with descriptive text, news photos, and images on the web.
In this paper we discuss how the translation approach can give a handle on difficult questions such as: What counts as an object? Which objects are easy to recognize and which are hard? Which objects are indistinguishable using our features? How can low-level vision processes, such as feature-based segmentation, be integrated with high-level processes such as grouping? We also summarize some of the models proposed for translating from visual information to text, and some of the methods used to evaluate their performance.
Color is widely used for content-based image retrieval. In these applications the color properties of an image are characterized by the probability distribution of the colors in the image. These distributions are very often estimated by histograms, although histograms have many drawbacks compared to other estimators such as kernel density methods.
In this paper we investigate whether kernel density estimators, used instead of histograms, could give better descriptors of color images. We carried out experiments using these descriptors both to estimate the parameters of the underlying color distribution and in content-based image retrieval (CBIR) applications, with the MPEG-7 database of 5466 color images and its 50 standard queries as the benchmark. Noisy images were also generated and fed to the CBIR application to test the robustness of the descriptors against noise. The results of our experiments show that good density estimators are not necessarily good descriptors for CBIR applications: we found that histograms perform better than kernel-based methods when used as descriptors.
In the second part of the paper, optimal values of important parameters in the construction of these descriptors, particularly the smoothing parameters or the bandwidth of the estimators, are discussed. Our experiments show that using an over-smoothed bandwidth gives better retrieval performance.
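The two descriptor families under comparison can be sketched in one dimension, with a fixed bandwidth (the experiments above use colour distributions and the MPEG-7 benchmark, not this toy data):

```python
import math

def histogram(samples, bins=8, lo=0.0, hi=1.0):
    """Normalised histogram: hard assignment of each sample to one bin."""
    h = [0.0] * bins
    for s in samples:
        i = min(int((s - lo) / (hi - lo) * bins), bins - 1)
        h[i] += 1.0 / len(samples)
    return h

def kde(samples, points, bandwidth=0.1):
    """Gaussian kernel density estimate evaluated at the given points."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                       for s in samples) for x in points]
```

The bandwidth plays the role that bin width plays for histograms; the finding above is that over-smoothing it helps retrieval even though it hurts density estimation.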
The Internet world makes increasing use of XML-based technologies. In multimedia data indexing and retrieval, the MPEG-7 standard for Multimedia Description Scheme is specified using XML. The flexibility of XML allows users to define other markup semantics for special contexts, construct data-centric XML documents, exchange standardized data between computer systems, and present data in different applications. In this paper, the Inverted Image Indexing paradigm is presented and modeled using XML Schema.
This paper addresses the image retrieval problem for online image databases. Solutions to this problem may find applications in many areas, including online information analysis, multimedia information retrieval, and Web applications. We investigate this problem by proposing a novel solution, called CLEAR, that incorporates multiple features, including color, texture, and shape, as well as conventional geometric information. Moreover, CLEAR extracts the features from regions of an image rather than the whole image domain, which allows the features to be more descriptive in indexing the objects of an image. To address the "inaccuracy" problem typical of many color-feature-based retrieval systems, fuzzy logic is applied to the traditional color histogram to develop a fuzzy color histogram as the color feature vector. A similarity function is defined on the multiple features through a balanced combination of global and regional similarity measures. To further improve retrieval efficiency, a secondary clustering technique is developed and employed in CLEAR, significantly reducing query processing time without compromising retrieval precision. An implemented prototype of CLEAR demonstrates promising retrieval performance on a test database of 2000 general-purpose color images, as compared with its peer systems in the literature.
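The fuzzy-histogram idea can be sketched in one dimension: instead of incrementing a single hard bin, each value splits its weight between the two nearest bin centres. This is an illustrative form, not necessarily CLEAR's exact membership function:

```python
import math

def fuzzy_histogram(values, bins=8, lo=0.0, hi=256.0):
    """Each value spreads its weight over the two nearest bin centres,
    softening the hard boundaries of an ordinary histogram."""
    hist = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        pos = (v - lo) / width - 0.5      # position in bin-centre units
        i = math.floor(pos)
        frac = pos - i
        if 0 <= i < bins:
            hist[i] += (1 - frac) / len(values)
        if 0 <= i + 1 < bins:
            hist[i + 1] += frac / len(values)
    return hist
```

A value near a bin boundary then contributes to both neighbouring bins, so small colour shifts no longer flip whole counts between bins, which is the "inaccuracy" the fuzzy histogram addresses.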
Relevance feedback has attracted the attention of many authors in image retrieval. In most work, however, only positive examples have been considered. We believe that negative examples can be highly useful for better modeling the user's needs and specificities. In this paper, we introduce a new relevance feedback model that combines positive and negative examples for query processing and refinement. We start by explaining how negative examples can help mitigate many problems in image retrieval, such as the definition of similarity measures and feature selection. Then, we propose a new relevance feedback approach that uses positive examples to perform generalization and negative examples to perform specialization. When the query contains both positive and negative examples, it is processed in two steps. In the first step, only the positive examples are considered, in order to reduce the heterogeneity of the set of images that participate in retrieval. The second step considers the difference between positive and negative examples and acts on the images retained in the first step. Mathematically, the problem is formulated as simultaneously minimizing the intra-class variance of the positive and negative examples and maximizing the inter-class variance. The proposed algorithm was implemented in our image retrieval system "Atlas" and tested on a collection of 10,000 images. The performance evaluation we carried out gave promising results.
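The variance criterion above can be sketched as a Fisher-style ratio per feature: a feature is informative when positive and negative examples are each compact (small intra-class variance) but far apart (large inter-class variance). The scalar, per-dimension form below is an illustrative simplification, not the paper's actual formulation.

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def feature_score(pos, neg, eps=1e-9):
    """Inter-class over intra-class variance for one feature dimension."""
    mp = sum(pos) / len(pos)
    mn = sum(neg) / len(neg)
    inter = (mp - mn) ** 2
    intra = variance(pos) + variance(neg)
    return inter / (intra + eps)

# A feature that separates positive from negative examples scores higher.
good = feature_score([0.9, 1.0, 1.1], [0.0, 0.1, -0.1])
bad = feature_score([0.5, 1.0, 0.0], [0.4, 0.9, 0.1])
```

Maximizing such a score over feature weightings is one standard way to realize "minimize intra-class, maximize inter-class variance" simultaneously.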
Image histograms alone are of limited use in video analysis. For example, two images containing the same objects at different positions map to the same histogram.
We show that a simple extension of image histograms to include the positions of the centroids of histogram bins leads to a useful representation for video analysis. This extension must be done carefully in order to obtain a representation that is stable with respect to noise. The carefully extended histograms also add stability and reliability to the retrieval of still images.
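The extension can be sketched as follows: alongside each bin's count, store the centroid (mean position) of the pixels that fall into it. The grayscale input format and bin count are assumptions for illustration; the paper's stability refinements are not reproduced here.

```python
def centroid_histogram(image, n_bins=4, vmax=256):
    """Histogram whose bins also record the mean (x, y) of their pixels."""
    width = vmax / n_bins
    counts = [0] * n_bins
    sx = [0.0] * n_bins
    sy = [0.0] * n_bins
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            b = min(int(v / width), n_bins - 1)
            counts[b] += 1
            sx[b] += x
            sy[b] += y
    centroids = [
        (sx[b] / counts[b], sy[b] / counts[b]) if counts[b] else None
        for b in range(n_bins)
    ]
    return counts, centroids

# Two images with identical histograms but different layouts now differ
# in the centroid part of the representation.
img_a = [[0, 0], [255, 255]]   # bright pixels at the bottom
img_b = [[255, 255], [0, 0]]   # bright pixels at the top
counts_a, cent_a = centroid_histogram(img_a)
counts_b, cent_b = centroid_histogram(img_b)
```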
Today, interactive multimedia educational systems are well established, having proved to be useful instruments for enhancing one's learning capabilities. Hitherto, the main difficulty with almost all E-Learning systems lay in the rich-media implementation techniques: each system had to be created individually, since reusing the media content, whether in part or as a whole, was not directly possible and everything had to be assembled by hand. This made E-Learning systems exceedingly expensive to produce, in terms of both time and money. Media-3D, or M3D, is a new platform-independent programming language, developed at the Fraunhofer Institute Media Communication, that enables visualisation and simulation of E-Learning multimedia content. M3D is an XML-based language capable of distinguishing 3D models from 3D scenes, and it provides for animations within the programme.
Here we give a technical account of the M3D programming language and briefly describe two application scenarios in which M3D is applied to create virtual-reality E-Learning content for the training of technical personnel.
With the increasing use of the Web in e-commerce, advertising, and publication, new technologies are needed to overcome the limitations of current Web graphics. SVG (Scalable Vector Graphics) is a solution to many of the problems in current Web technology: it provides precise, high-resolution Web graphics using plain-text format commands. It sets a new standard for Web graphics formats, allowing complicated graphics to be presented with rich text fonts and colors, high printing quality, and dynamic layout capabilities. This paper provides a tutorial overview of SVG technology and its essential features, capabilities, and advantages, and reports a comparative study between SVG and other Web graphics technologies.
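The "plain text format commands" point is easy to see concretely: an SVG graphic is just markup, so it can be generated, inspected, and served like any other text. A minimal document, built as a string here:

```python
# A minimal SVG document: a filled rectangle with a text label.
# Because it is plain text, it scales without loss and can be produced
# programmatically by any tool that can write strings.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="60">
  <rect x="10" y="10" width="100" height="40" fill="steelblue"/>
  <text x="20" y="35" font-family="serif" font-size="16" fill="white">SVG</text>
</svg>"""
```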
As processor speeds increase and the cost of digital video technology falls, the use of video is expanding in a plethora of applications including video surveillance, human computer interaction, tele-instruction, and enhanced sports broadcasts. However, a major problem that now faces developers of video systems is the requirement to build the low-level video processing from the ground up for each application. This paper describes a camera system that acts not merely as a provider of pixels, but as a video information server. A video application interacts with the camera server using the Camera Markup Language (CaML, pronounced camel) proposed here. CaML is an XML-based (Extensible Markup Language) data format for exchanging video information with a server. It provides a layer of abstraction between the application and the pixels to simplify the development process and is well-suited to exchanging data over a network. Using a camera as a server on a network makes it a simple matter for a single application to use multiple cameras. Local- and wide-area networks (LANs and WANs) eliminate the need for conventional methods of routing video signals.
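A CaML-style exchange might look like the sketch below. The element names are hypothetical, invented purely for illustration (the paper defines the actual vocabulary); the point is that requests and replies are plain XML that any networked client can build and parse.

```python
from xml.etree import ElementTree as ET

def make_request(pan, tilt, zoom):
    """Build a hypothetical XML camera-control request."""
    req = ET.Element("CameraRequest")
    ET.SubElement(req, "Pan").text = str(pan)
    ET.SubElement(req, "Tilt").text = str(tilt)
    ET.SubElement(req, "Zoom").text = str(zoom)
    return ET.tostring(req, encoding="unicode")

def parse_request(xml):
    """Server side: recover the numeric parameters from the XML."""
    req = ET.fromstring(xml)
    return {child.tag: float(child.text) for child in req}

msg = make_request(30.0, -10.0, 2.0)
params = parse_request(msg)
```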
Although it is relatively straightforward to implement a specialized graphical interface for interactive TV on a particular platform, creating a flexible and generic interface poses a challenge. The major issue is to create a general framework that makes the content independent of the platform. The focus of this paper is the design and implementation of an interactive-TV framework for graphical user interfaces (GUIs) that helps content providers create reusable and extensible presentations independent of platform. Our GUI framework is based on SMIL, which already has wide industry support and is an official W3C recommendation. It also builds customized options and generates user interactions dynamically on the client side, based on a streamed SMIL file.
The future development of networked multimedia services depends on efficient methods to protect data owners against unauthorised copying and redistribution of the material put on the network, and to guarantee that Intellectual Property Rights (IPR) are respected and the assets properly managed. A Notice and Takedown procedure based on a self-regulatory regime is considered, and an Intelligent Agent based platform is proposed as a possible implementation of this system.
The use of XML video representation to handle the video synchronization problem in a layered video multicast system is investigated in this research. MPEG-4/XML was proposed in our previous work. In this work, we first propose an XML scheme to describe MPEG-4 FGS content. The resulting MPEG-4/XML FGS format is specifically designed for a layered multicast streaming environment. The use of XML tags allows a user to access video content with more flexibility at the cost of a small overhead in the transmitted file size. The raw XML file can be efficiently compressed to yield a small coded XML file. It is demonstrated that the difficult resynchronization problem in layered video multicast can be dealt with effectively using the MPEG-4/XML format.
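The compression claim is easy to verify in miniature: XML tags are highly redundant, so a general-purpose compressor shrinks the raw file substantially. The content below is synthetic (the tag names are invented for the demonstration), but the effect is generic.

```python
import zlib

# Build a repetitive XML description of 100 frames, each with two layers.
frames = "".join(
    f'<Frame id="{i}"><Layer level="base"/><Layer level="fgs"/></Frame>'
    for i in range(100)
)
raw = f"<Video>{frames}</Video>".encode()

# Deflate exploits the repeated tag structure.
packed = zlib.compress(raw)
ratio = len(packed) / len(raw)
```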
The paper addresses the problem of annotating photographs with broad semantic labels. To cope with the great variety of photos available on the Web, we have designed a hierarchical classification strategy that first classifies images as pornographic or non-pornographic. Non-pornographic images are then classified as indoor, outdoor, or close-up. On a database of over 9000 images, mostly downloaded from the Web, our method achieves an average accuracy of close to 90%.
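The hierarchical strategy amounts to chaining a binary filter and a multi-way classifier, as in the sketch below. The classifiers here are toy stand-in callables keyed on invented fields; the paper's actual features and models are not reproduced.

```python
def classify(image, is_porn, scene_class):
    """Two-stage hierarchy: filter first, then assign a scene label."""
    if is_porn(image):
        return "pornographic"
    return scene_class(image)   # "indoor", "outdoor", or "close-up"

# Toy stand-ins, purely for illustration of the control flow.
label = classify(
    {"skin_ratio": 0.1, "edges": "many"},
    is_porn=lambda im: im["skin_ratio"] > 0.5,
    scene_class=lambda im: "outdoor" if im["edges"] == "many" else "indoor",
)
```

One practical benefit of the hierarchy is that the second classifier never sees images the first one rejected, so each stage can use features suited to its own decision.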
Query by example is a common model for content-based image retrieval. The purpose of such a tool is to extract from a large database the images most similar to a query image. In practice, the meaningful characteristics of each image are first extracted. Each region is then described by a vector composed of classical statistical features or spatial relationships. Finally, the system proposes to the user the images that minimize a certain similarity distance computed on each vector.
Nevertheless, query by example depends on a criterion determined by the user. This last step of any content-based retrieval system therefore struggles to express what the user really wants, and the results are always constrained by the definition of the similarity distance. In fact, it is not sufficient to compute good descriptors; a robust and adequate distance to compare them is also necessary.
Our purpose, more precisely, is to evaluate different "blob-to-blob" similarity distances. Each image is first described locally using a coarse segmentation, and the meaningful regions are extracted by a selection process based on color homogeneity. Over these parameters, different distances are discussed using different approaches: spatial, shape, color, and texture similarities.
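One common family of blob-to-blob distances combines per-descriptor distances with fixed weights, as sketched below. The weights and descriptor contents are illustrative assumptions, not the distances evaluated in the paper.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def blob_distance(b1, b2, weights=None):
    """Weighted sum of per-descriptor Euclidean distances between blobs."""
    weights = weights or {"color": 0.4, "texture": 0.3,
                          "shape": 0.2, "position": 0.1}
    return sum(w * euclidean(b1[k], b2[k]) for k, w in weights.items())

blob_a = {"color": [0.8, 0.1], "texture": [0.3],
          "shape": [0.5], "position": [0.2, 0.2]}
blob_b = {"color": [0.7, 0.2], "texture": [0.4],
          "shape": [0.5], "position": [0.8, 0.8]}
d = blob_distance(blob_a, blob_b)
```

Evaluating such distances then amounts to varying the weights and per-descriptor metrics and measuring retrieval quality.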
In this article, we present a new graphical navigation environment for image databases. Unlike "query by example", which focuses on similarities between images, the method we propose highlights the visual differences that occur along paths offered to the user. Images are arranged to show an evolution along a direction, e.g. an axis that crosses the parameter space. This gives a global view of the database and allows the user to browse the whole database in an organized way, and to focus on narrow areas by displaying sets of images.
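At its simplest, one navigation path of this kind is an ordering of the database along a single feature axis, as in the sketch below. Using a scalar brightness value as the feature is an assumption for illustration.

```python
def order_along_axis(images, feature):
    """Return image ids sorted by a chosen feature: one navigation path."""
    return [img_id for img_id, _ in
            sorted(images.items(), key=lambda kv: feature(kv[1]))]

images = {"a": {"brightness": 0.9},
          "b": {"brightness": 0.2},
          "c": {"brightness": 0.5}}
path = order_along_axis(images, lambda im: im["brightness"])
```

Neighbouring images on the path differ gradually in the chosen feature, which is what lets the user perceive an evolution rather than a similarity ranking.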
The ubiquity of the Internet has brought about an increasing number of multi-formatted Web documents. Although images occupy an important place in these Web documents, little research has addressed analyzing and understanding them. Many Web images carry important information, but others do not. If the images in a Web document could be classified by whether or not they carry particular information, this would be very useful for the analysis and multi-formatting of Web documents. In this paper we introduce machine-learning-based methods for classifying Web images as either eliminable or non-eliminable. For this research, we designed 16 rich features for Web images and experimented with Bayesian and decision-tree methods. The two methods achieved F-measures of 87.09% and 82.72%, respectively, and experiments comparing the effects of feature groups showed that the features selected in this study are very useful for Web image classification.
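The F-measures quoted above combine precision and recall; as a reminder of the computation (the balanced F1 form is assumed here, since the abstract does not state its beta):

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f = f_measure(0.9, 0.8)
```

Because it is a harmonic mean, the score is pulled toward the weaker of the two components, which is why it is preferred over plain accuracy for imbalanced classes such as eliminable versus non-eliminable images.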
We present a novel user-interface and distributed imaging system for controlling robotic web-cameras via a wireless cellular phone. A user scrolls an image canvas to select a new live picture. The cellular phone application (a Java MIDlet) sends a URL request, which encodes the new pan/tilt/optical-zoom of a live picture, to a web-camera server. The user downloads a new live picture centered on the user’s new viewpoint. The web-camera server mediates requests from users, by time-sharing control of the physical robotic hardware. By processing a queue of user requests at different pan/tilt/zoom locations, the server can capture a single photograph for each user. While one user downloads a new live image, the robotic camera moves to capture images and service other independent user requests. The end-to-end system enables each user to independently steer the robotic camera, viewing live snapshot pictures from a cellular phone.
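The request step described above can be sketched as encoding the desired pan/tilt/zoom in a URL query string. The host name and parameter names below are hypothetical; the paper does not specify its URL syntax.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def camera_url(pan, tilt, zoom, host="camserver.example.com"):
    """Encode a hypothetical pan/tilt/zoom request as a URL."""
    query = urlencode({"pan": pan, "tilt": tilt, "zoom": zoom})
    return f"http://{host}/live?{query}"

# Server side, the parameters are recovered from the query string.
url = camera_url(15, -5, 3)
params = parse_qs(urlsplit(url).query)
```

Because each request is a self-contained URL, the server can queue requests from many phones and service them one at a time as the camera moves, which is exactly the time-sharing scheme the paper describes.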