In recent years, perceptually driven super-resolution (SR) methods have been proposed to lower computational complexity. Furthermore, sparse-representation-based super-resolution is known to produce competitive high-resolution images at lower computational cost than other SR methods. Nevertheless, super-resolution is still difficult to implement with substantially low processing power for real-time applications. To speed up SR processing, much effort has gone into efficient methods that selectively apply elaborate computation to perceptually sensitive image regions based on a metric such as just noticeable distortion (JND). Inspired by these works, we propose a novel fast super-resolution method with sparse representation, which incorporates a no-reference just noticeable blur (JNB) metric. That is, the proposed method efficiently generates super-resolution images by selectively applying a sparse representation method to perceptually sensitive image areas, which are detected based on the JNB metric. Experimental results show that our JNB-based fast super-resolution method is about 4 times faster than a non-perceptual sparse-representation-based SR method for 256×256 test LR images. Compared to a JND-based SR method, the proposed fast JNB-based SR method is about 3 times faster, with approximately 0.1 dB higher PSNR and a slightly higher SSIM value on average. This indicates that our proposed perceptual JNB-based SR method generates high-quality SR images at much lower computational cost, opening a new possibility for real-time hardware implementations.
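The selective pipeline described above can be sketched in a few lines. Here a simple gradient-activity map stands in for the paper's JNB metric, and `sparse_sr`/`cheap_sr` are hypothetical upscaler callables, so this is an illustration of the idea rather than the paper's implementation:

```python
import numpy as np

def jnb_sensitivity_map(lr_image, block=8, grad_thresh=10.0):
    """Hypothetical JNB-style sensitivity map (a stand-in for the paper's
    just-noticeable-blur metric): mark blocks whose mean gradient
    magnitude exceeds a threshold as perceptually sensitive."""
    h, w = lr_image.shape
    gy, gx = np.gradient(lr_image.astype(np.float64))
    grad = np.abs(gx) + np.abs(gy)
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            if grad[y:y + block, x:x + block].mean() > grad_thresh:
                mask[y:y + block, x:x + block] = True
    return mask

def selective_sr(lr_image, sparse_sr, cheap_sr, scale=2, **kw):
    """Run the expensive sparse-representation SR only where the mask is
    set; use a cheap upscaler (e.g. interpolation) everywhere else."""
    mask = jnb_sensitivity_map(lr_image, **kw)
    up_mask = np.kron(mask, np.ones((scale, scale), dtype=bool))
    out = cheap_sr(lr_image).astype(np.float64)
    out[up_mask] = sparse_sr(lr_image)[up_mask]
    return out
```

Because the expensive path runs only on the masked blocks, the speedup scales with the fraction of the image classified as perceptually insensitive.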
In this paper, a novel distortion model based on a mixture of Laplacian distributions is presented for the transform coefficients of predicted residues in quadtree coding. The mixture of Laplacian distributions is constructed over the coding structure with different quadtree coding unit (CU) depths. Moreover, for intra-coded CUs, the distortion model is asymptotically simplified based on the signal characteristics of the transform coefficients. The proposed mixture model of multiple Laplacian distributions is tested on the High Efficiency Video Coding (HEVC) Test Model (HM) with quadtree-structured Coding Units (CU) and Transform Units (TU). The experimental results show that the proposed model achieves more accurate distortion estimation than single-distribution models.
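A minimal numerical sketch of such a mixture model, with placeholder weights and scales rather than the paper's depth-dependent parameters, can be used to estimate quantization distortion:

```python
import numpy as np

def laplacian_pdf(x, b):
    """Zero-mean Laplacian density with scale b."""
    return np.exp(-np.abs(x) / b) / (2.0 * b)

def mixture_laplacian_pdf(x, weights, scales):
    """Mixture of Laplacians: sum_k w_k * Lap(x; b_k). In the paper each
    component corresponds to a quadtree CU depth; the weights and scales
    here are placeholders, not trained values."""
    return sum(w * laplacian_pdf(x, b) for w, b in zip(weights, scales))

def quantization_distortion(q, weights, scales, span=60.0, n=240001):
    """Numerically estimate E[(X - Q(X))^2] under the mixture for a
    uniform quantizer with step q (illustrative only; the paper derives
    an analytic model instead)."""
    x = np.linspace(-span, span, n)
    p = mixture_laplacian_pdf(x, weights, scales)
    xq = q * np.round(x / q)            # uniform mid-tread quantizer
    dx = x[1] - x[0]
    return float(np.sum((x - xq) ** 2 * p) * dx)
```

As expected, the estimated distortion grows with the quantization step, and the mixture lets each CU-depth population contribute its own scale.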
Video quality assessment is an important tool for guaranteeing video services at a required level of quality. Although subjective quality assessment is more reliable than objective quality assessment because it reflects the Human Visual System (HVS), it is a time-consuming and very expensive approach, and is not appropriate for real-time applications. Therefore, much research has been devoted to objective video quality assessment instead of subjective video quality assessment.
Among the three kinds of objective assessment approaches, namely full-reference, reduced-reference, and no-reference methods, the no-reference method has drawn much attention because it does not require any reference. Encoding parameters are good features for a no-reference model because encoded bitstreams carry plenty of information about the video content, and coding parameters are easy to extract for assessing visual quality.
In this paper, we propose a no-reference quality metric using two kinds of coding parameters in H.264/AVC: quantization and block-mode parameters. These parameters are extracted and computed from H.264/AVC bitstreams without relying on pixel-domain processing. We design a linear quality metric composed of these two parameters. The weight values of the parameters are estimated using linear regression against the results of subjective quality assessment obtained with the DSIS (Double Stimulus Impairment Scale) method of ITU-R BT.500-11.
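The weight estimation step can be sketched as an ordinary least-squares fit. The feature names below (`qp`, `skip_ratio`) are illustrative stand-ins for the paper's quantization and block-mode parameters:

```python
import numpy as np

def fit_linear_metric(qp, skip_ratio, mos):
    """Least-squares fit of MOS ~ w0 + w1*qp + w2*skip_ratio against
    subjective scores. The two features are assumed stand-ins for the
    paper's quantization and block-mode parameters."""
    X = np.column_stack([np.ones(len(qp)), qp, skip_ratio])
    w, *_ = np.linalg.lstsq(X, np.asarray(mos, float), rcond=None)
    return w

def predict_mos(w, qp, skip_ratio):
    """Apply the fitted linear metric to new bitstream features."""
    return w[0] + w[1] * qp + w[2] * skip_ratio
```

A linear form keeps prediction cheap enough to run per sequence without touching the pixel domain.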
In this paper, 16-order and 32-order integer transform kernels are designed for HD video coding in H.264|MPEG-4 AVC, and performance analyses for the large transforms are presented. An adaptive block-size transform coding scheme is also proposed based on the proposed transform kernels. In this scheme, additional 16-order (16×16, 16×8 and 8×16) and 32-order (32×32, 32×16 and 16×32) transforms are performed in addition to the 8×8 and 4×4 transforms exploited in the Fidelity Range Extensions of H.264|MPEG-4 AVC. The experimental results show that variable block-size transforms with the proposed higher-order transform kernels yield up to 14.96% bit savings for HD video sequences.
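The separable integer-transform structure that the larger kernels generalize can be illustrated with the well-known 4×4 H.264/AVC core transform; the 16- and 32-order kernels designed in the paper are not reproduced here:

```python
import numpy as np

# The 4x4 integer core transform of H.264/AVC. The paper's 16- and
# 32-order kernels extend this separable construction to larger block
# sizes; the matrix below is NOT one of the paper's proposed kernels.
C4 = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def forward_transform(block, C):
    """Separable 2D integer transform Y = C @ X @ C.T (the scaling and
    quantization normally folded into the quantizer are omitted)."""
    return C @ block @ C.T

def inverse_transform_float(Y, C):
    """Float inverse, used here only to verify that the integer forward
    transform round-trips in this sketch."""
    Ci = np.linalg.inv(C.astype(float))
    return Ci @ Y @ Ci.T
```

Because the forward pass uses only small-integer additions and shifts in hardware, the same separable structure scales naturally to 16- and 32-point kernels.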
In this paper, we show that probabilistic spatiotemporal macroblock filtering (PSMF) and partial decoding processes can be applied to effectively detect and track multiple objects in real time in H.264|AVC bitstreams with stationary backgrounds. Our contribution is that our method not only shows fast processing times but also handles multiple moving objects that are articulated, change in size, or have internally monotonous color, even though they contain a chaotic set of non-homogeneous motion vectors inside. In addition, our partial decoding process for H.264|AVC bitstreams makes it possible to improve the accuracy of object trajectories and overcome long occlusions by using the extracted color information.
KEYWORDS: Video, Video coding, Motion estimation, Computer programming, Visualization, Video compression, Mobile communications, Visual compression, Distortion, Video processing
We propose a fast macroblock mode decision scheme in H.264|MPEG-4 Part 10 Advanced Video Coding (AVC) for mobile video telephony applications. In general, the face region around a speaker is considered the region of interest (ROI) in video telephony, while the background is of little importance and is thus regarded as non-ROI. Two issues must usually be considered: (1) the platforms of mobile video telephony are computationally limited while AVC codecs are computationally expensive; and (2) the allowed channel bandwidths are usually very small, so the compressed video streams are transmitted with degraded visual quality. In this paper, we address both issues: first, a fast macroblock mode decision scheme is contrived to alleviate the computational complexity of H.264|MPEG-4 Part 10 AVC; second, ROI/non-ROI coding of H.264|MPEG-4 Part 10 AVC is incorporated to enhance subjective visual quality by encoding ROI data at higher quality and non-ROI data at lower quality.
Our proposed fast macroblock mode decision scheme consists of three parts: early skip mode detection, fast inter macroblock mode decision, and intra prediction skipping. The skip mode detection method decides whether to evaluate the remaining inter macroblock modes in P-slices. The fast inter macroblock mode method reduces the candidate block modes using SATDMOTION from 16x16 and 8x8 block motion estimation. The intra prediction skipping condition decides whether to perform 4x4 intra prediction in P-slices using the relation between the magnitudes of the motion vectors of the current macroblock and the occurrence frequencies of intra-predicted macroblocks. The experimental results show that the proposed scheme reduces the computational complexity of the total encoding time by up to 51.88% with negligible PSNR drop and bit-rate increase.
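The SATD-based early decisions can be sketched as follows; the simple threshold rule is a stand-in for the paper's actual skip conditions:

```python
import numpy as np

# 4x4 Hadamard matrix used for SATD (sum of absolute transformed differences).
H4 = np.kron(np.array([[1, 1], [1, -1]]), np.array([[1, 1], [1, -1]]))

def satd(cur, pred):
    """SATD over 4x4 Hadamard sub-blocks; block sides must be multiples
    of 4 (as for 16x16 and 8x8 macroblock partitions)."""
    r = cur.astype(np.int64) - pred.astype(np.int64)
    total = 0
    for y in range(0, r.shape[0], 4):
        for x in range(0, r.shape[1], 4):
            total += int(np.abs(H4 @ r[y:y + 4, x:x + 4] @ H4.T).sum())
    return total

def early_skip(cur, pred, thresh):
    """Hypothetical early-SKIP rule in the spirit of the scheme: when the
    SATD of the motion-compensated prediction is already below a
    threshold, the remaining inter/intra modes are not evaluated."""
    return satd(cur, pred) < thresh
```

When the prediction is already good, the rule avoids the full rate-distortion evaluation of the remaining modes, which is where the encoding-time saving comes from.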
KEYWORDS: Computer programming, Scalable video coding, Video, Lithium, Quantization, Signal detection, Video coding, Electroluminescence, Communication engineering, Telecommunications
We introduce an efficient mode selection method for the enhancement layers of spatial scalability in the SVC encoder that selectively performs the inter-layer residual coding of the SVC. The proposed method analyzes the characteristics of the integer-transform coefficients of the subtracted signal between the two residuals from the lower and upper spatial layers. It then selectively performs inter-layer residual prediction coding in spatial scalability when the SAD values of the inter-layer residuals exceed adaptive threshold values. By classifying the residuals according to the properties of the integer-transform coefficients using only the SAD of the inter-layer residual signals between the two layers, the SVC encoder can perform inter-layer residual coding selectively, significantly reducing the total encoding time by 51.2% on average while maintaining RD performance with negligible quality degradation.
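The SAD-based selection, as described in the abstract, reduces to a simple comparison; the adaptation of the threshold itself is not reproduced here:

```python
import numpy as np

def interlayer_sad(upper_residual, upsampled_lower_residual):
    """SAD between the enhancement-layer residual and the upsampled
    base-layer residual."""
    d = upper_residual.astype(np.int64) - upsampled_lower_residual.astype(np.int64)
    return int(np.abs(d).sum())

def do_interlayer_residual_coding(sad_value, adaptive_thresh):
    """Selection rule per the abstract: inter-layer residual prediction
    is performed only when the SAD exceeds the adaptive threshold (the
    threshold-adaptation logic is an omitted detail of the paper)."""
    return sad_value > adaptive_thresh
```

Skipping the prediction path when the SAD test fails is what removes the redundant residual-coding work from the enhancement layer.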
In this paper, a fast intermode decision scheme is introduced that is suitable for the hierarchical B-picture structure, in which much computational power is spent on combined variable block sizes and bi-predictive motion estimation. Hypothesis testing that considers the characteristics of the hierarchical B-picture structure is performed on 16x16 and 8x8 blocks to enable early termination of the RD computation of all possible modes. The early termination in intermode decision is performed by comparing the pixel values of the current blocks with those of the corresponding motion-compensated blocks. When the hypothesis tests are performed, the confidence intervals for accepting or rejecting the null hypothesis are decided according to the temporal scalability level, taking the properties of hierarchical B-pictures into consideration. The proposed scheme exhibits effective early termination in intermode decision across temporal scalability levels and reduces computational complexity by up to 69% with a slight increase in bit amount. The degradation of visual quality is negligible in terms of PSNR.
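One way to read the early-termination test is as a z-test on the mean residual between the current and motion-compensated blocks, with an acceptance interval that widens with the temporal level. This is a speculative sketch: the widening factor and the test form are assumptions, not the paper's tuned settings:

```python
import math

def accept_null(residual, temporal_level, z0=1.96):
    """Hypothetical z-test sketch of early termination: H0 says the
    motion-compensated block already matches the current block
    (zero-mean residual). The 0.25 widening factor per temporal level
    is an assumed illustration."""
    n = len(residual)
    mean = sum(residual) / n
    var = sum((d - mean) ** 2 for d in residual) / (n - 1)
    if var == 0.0:
        return mean == 0.0
    z = mean / math.sqrt(var / n)
    z_crit = z0 * (1.0 + 0.25 * temporal_level)   # wider at higher levels
    return abs(z) <= z_crit
```

Accepting H0 terminates the mode search early; rejecting it falls through to the full RD evaluation.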
KEYWORDS: Multimedia, Standards development, Computer architecture, Intellectual property, Video, Communication engineering, Data archive systems, Laser based displays, Digital cameras, Visualization
The Musical Slide Show Multimedia Application Format (MAF), which is currently being standardized by the Moving Picture Experts Group (MPEG), conveys the concept of combining several established standard technologies in a single file format. It defines a format for packaging MP3 audio data along with JPEG images, MPEG-7 Simple Profile metadata, timed text, and MPEG-4 LASeR script. Musical Slide Show MAF content is presented by synchronizing the JPEG images and timed text to the MP3 audio track, and rendering effects on the JPEG images can be supported by the MPEG-4 LASeR script. The Musical Slide Show MAF thus enriches the consumption of MP3 content with synchronized and rendered JPEG images and text, as well as MPEG-7 metadata about the MP3 audio content. However, the Musical Slide Show MAF has no protection and governance mechanism, which is an essential element for deploying this sort of content. In this paper, to manage Musical Slide Show MAF content in a controlled manner, we present a protection and governance mechanism using MPEG-21 Intellectual Property Management and Protection (IPMP) Components and MPEG-21 Rights Expression Language (REL) technologies. We also implement an authoring tool and a player tool for Musical Slide Show MAF content and show the experimental results.
MPEG (Moving Picture Experts Group) is currently standardizing the Multimedia Application Format (MAF), which aims to provide simple but practical multimedia applications to the industry. One of the interesting ongoing working items of the MAF activity is the so-called Music Player MAF, which combines MPEG-1/2 Layer III (MP3) audio, JPEG images, and metadata into a standard format. In this paper, we propose a protection and governance mechanism for the Music Player MAF by incorporating another MPEG technology, MPEG-21 IPMP (Intellectual Property Management and Protection). We show use cases for the distribution and consumption of Music Player content, the requirements, and how this protection and governance can be implemented in conjunction with the current Music Player MAF architecture and file system. With the use of MPEG-21 IPMP, the protection and governance of Music Player MAF content fulfills the requirements of flexibility, extensibility, and granularity in protection.
KEYWORDS: Prototyping, Multimedia, Receivers, Data communications, Personal digital assistants, Telecommunications, Wireless communications, Cadmium, Video, Databases
Much research has been devoted to enabling ubiquitous video services over various kinds of user information terminals anytime and anywhere. In this paper, we design a prototype system for seamless TV program content consumption based on user preference via various kinds of user information terminals in a digital home environment, and we show implementation and testing results with the prototype system. The prototype system operates with TV-Anytime metadata for the consumption of TV program content based on user preferences in TV program genres, and uses the MPEG-21 DIA (Digital Item Adaptation) tools, which provide representation schema formats for describing the context information of user environments, user terminal characteristics, and user characteristics, for universal access and consumption of the preferred TV program content. The proposed content mobility prototype system supports one or more users seamlessly consuming the same TV program content via various kinds of user terminals. It consists of a home server, display TV terminals, and user information terminals. We use 42 TV program items in eight different genres from four different TV channels to test the proposed prototype system.
This paper presents a new approach to translation- and rotation-invariant texture feature extraction for image texture retrieval. For rotation-invariant feature extraction, we introduce an angular projection along the angular frequency in the polar coordinate system. The translation- and rotation-invariant feature vector representing texture images is constructed from the averaged magnitude and the standard deviations of the magnitude of the Fourier transform spectrum obtained by the proposed angular projection. To easily implement the angular projection, the Radon transform is employed to obtain the Fourier transform spectrum of images in the polar coordinate system; the angular projection is then applied to extract the feature vector. We present experimental results showing robustness against image rotation and discriminatory capability for different texture images using the MPEG-7 data set. Our experimental results show that the proposed rotation- and translation-invariant feature vector yields effective retrieval performance for texture images with homogeneity, isotropy, and local directionality.
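A simplified numerical sketch of the idea, sampling the FFT magnitude on a polar grid directly instead of going through the Radon transform as the paper does: rotating the image becomes a circular shift along the angle axis, so the magnitude of a 1D Fourier transform over angle is (approximately) rotation invariant:

```python
import numpy as np

def polar_spectrum(img, n_r=16, n_theta=32):
    """Sample the centered 2D FFT magnitude of img on a polar grid."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = F.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rs = np.linspace(1.0, min(cy, cx), n_r)
    ts = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    yy = np.rint(cy + rs[:, None] * np.sin(ts)[None, :]).astype(int)
    xx = np.rint(cx + rs[:, None] * np.cos(ts)[None, :]).astype(int)
    return F[yy, xx]                    # shape (n_r, n_theta)

def rotation_invariant_feature(img, **kw):
    """Angular projection: a 1D FFT along the angle axis. Image rotation
    circularly shifts theta, leaving the angular-spectrum magnitude
    essentially unchanged; mean and std over radius give the feature."""
    P = polar_spectrum(img, **kw)
    A = np.abs(np.fft.fft(P, axis=1))
    return np.concatenate([A.mean(axis=0), A.std(axis=0)])
```

For a 90° rotation on an odd-sized image the theta shift is exact, so the feature vectors of the original and rotated images agree to numerical precision.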
The MPEG-4 BIFS and MPEG-4 LASeR are parts of the MPEG-4 standard for describing multimedia scenes in binary format. Scenes are multimedia presentations consisting of text, graphics, animation, and real content such as images, video, and audio. BIFS and LASeR scenes can be written in XML format and then encoded into binary format for
the consumption of MPEG-4 terminals. While BIFS is a stable standard, LASeR is an emerging standard that is being
newly developed for the purpose of lightweight applications in constrained terminals such as mobile phones. This
explains why the LASeR specification is much simpler than that of BIFS. In this paper, we present a method of
transcoding for converting BIFS XML format into LASeR XML format. We analyze the differences between BIFS and
LASeR specifications and propose a set of mapping rules for the conversion purpose. The transcoding is done using an
XSLT processor that converts the BIFS XML format into the corresponding LASeR XML format. This text-to-text conversion is incorporated into our transcoding system, which also adapts real objects in the scenes, such as images and videos. The motivation of this paper is to enable transcoding from BIFS to LASeR so that MPEG-4 content authors can distribute their existing content to mobile devices without the need for reauthoring or learning a new standard.
We introduce a novel model capturing user preference using the Bayesian approach for recommending users' preferred multimedia content. Unlike other preference models, our method traces the trend of a user preference in time. It allows us to do online learning so we do not need exhaustive data collection. The tracing of the trend can be done by modifying the frequency of attributes in order to force the old preference to be correlated with the current preference under the assumption that the current preference is correlated with the near future preference. The modification is done by partitioning usage history data into smaller sets in a time axis and then weighting the frequencies of attributes to be computed from the partitioned sets of the usage history data in order to differently reflect their significance on predicting the future preference. In the experimental section, the learning and reasoning on user preference in genres are performed by the proposed method with a set of real TV viewers' watching history data collected from many real households. The reasoning performance by the proposed method is also compared with that by a typical method without training in order to show the superiority of our proposed method.
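The time-partitioned frequency weighting can be sketched as follows; the geometric decay value is an assumed illustration, not the paper's trained weighting:

```python
from collections import Counter

def weighted_genre_preference(partitions, decay=0.8):
    """partitions: usage history split along the time axis, oldest
    first; each partition is a list of watched genres. Older partitions
    receive geometrically smaller weights so the computed frequencies
    track the current preference (decay=0.8 is an assumed value)."""
    n = len(partitions)
    freq = Counter()
    for i, part in enumerate(partitions):
        w = decay ** (n - 1 - i)        # newest partition gets weight 1
        for genre in part:
            freq[genre] += w
    total = sum(freq.values())
    return {g: c / total for g, c in freq.items()}
```

Because each new partition can be folded in incrementally, the model supports online learning without re-scanning the full usage history.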
Telematics, a compound of telecommunications and informatics, refers to a kind of information service that provides traffic, public transport, and emergency information to automobile drivers through car navigation or other interactive communication systems. In particular, as DAB (Digital Audio Broadcasting) and DMB (Digital Multimedia Broadcasting) technologies are introduced and commercialized, telematics is rapidly converging with various applications such as broadcasting and communication services.
In this paper, we suggest how a telematics service can be realized by a DMB application, which enables multimedia services to operate on mobile devices. To achieve this goal, we generate multimedia content including TPEG (Transport Protocol Experts Group) content, which carries information about roads and traffic. TPEG is an expert group that aims at defining a byte-oriented protocol for transport information broadcasting. Transport information includes Road Traffic Messages, Public Transport Information, and Location information, which enable safe and efficient driving. In Europe, TPEG content has been delivered over DAB networks, which support audio-only broadcasting. We investigate techniques to deliver multimedia content together with TPEG content over a DMB network, so that we can provide information in the scope of telematics as well as multimedia content.
In this paper, we introduce a new supervised learning method of a Bayesian network for user preference models. Unlike other preference models, our method traces the trend of a user preference as time passes. This allows online learning, so exhaustive data collection is not needed. The tracing of the trend is done by modifying the frequency of attributes in order to force the old preference to be correlated with the current preference, under the assumption that the current preference is correlated with the near-future preference. The objective of our learning method is to reinforce the mutual information by modifying the frequency of the attributes in the old preference through weights given to the attributes. Along with the mathematical derivation of our learning method, we present experimental results on learning and reasoning performance for TV genre preference using a real set of TV program watching history data.
Traditional transcoding of multimedia has been performed from the perspectives of user terminal capabilities, such as display size and decoding processing power, and network resources, such as available network bandwidth and quality of service (QoS). The adaptation (or transcoding) of multimedia content to such constraints has been made by frame dropping and resizing of audiovisual content, as well as by reducing SNR (Signal-to-Noise Ratio) values to save bitrate. In addition to such traditional transcoding from the perspective of the user's environment, we incorporate a method of semantic transcoding of audiovisual content based on regions of interest (ROI) from the user's perspective. Users can designate the parts of images or video they are interested in, so that the corresponding video content can be adapted with focus on the user's ROI. We incorporate the MPEG-21 DIA (Digital Item Adaptation) framework, in which the semantic information of the user's ROI is represented and delivered to the content provider side as an XDI (context digital item). Our representation schema for the semantic information of the user's ROI has been adopted in the MPEG-21 DIA Adaptation Model. In this paper, we present the usage of the semantic information of the user's ROI for transcoding and show our system implementation with experimental results.
We present a texture descriptor for multimedia content description in MPEG-7. The current MPEG-7 candidate for the texture descriptor has been designed to be suitable for the human visual system (HVS). In this paper, texture is described using perceptual channels that are bands in spatial frequency. Further, the MPEG-7 texture description method employs the Radon transform, which suits HVS behavior. The texture descriptor is generated by taking the average energy and energy deviation of the HVS channels. To verify the performance of the texture descriptor, experiments with the MPEG-7 database are performed.
The Summary Description Scheme (DS) in MPEG-7 aims at providing a summary of an audio-visual program that offers an effective mechanism for efficient access to the program by abstracting its content. In this paper, we present in detail the Summary DS proposed to MPEG-7, which allows efficient navigation and browsing to the content of interest as well as an overview of the overall content in an integrated way. This efficiency is achieved by a unified description framework that combines static summaries based on key frames and key sounds with dynamic summaries based on a series of highlights. The proposed DS also allows efficient description of event-based summarization by specifying summary criteria. We also show the usefulness of the Summary DS in real applications, largely based on the results of the Validation and Core Experiments we performed in MPEG-7 activities, and describe a methodology for the automated generation of a dynamic summary.
The two-parameter constant false alarm rate (CFAR) detector defines a local area where the shape and scale of the stencil are predetermined by physical considerations alone (target size), which may cause suboptimal performance of the detector in SAR imagery. In this paper, we propose a new CFAR stencil based on a family of gamma kernels that provides the ability to adapt the scale and shape of the stencil to achieve the minimum false alarm rate. The new detector is called the gamma CFAR detector. The simulation results show that the gamma CFAR detector outperforms the two-parameter CFAR detector in high-resolution, 1 ft. by 1 ft., fully polarimetric SAR imagery processed by the polarimetric whitening filter.
This work develops and tests a new target prescreening algorithm based on 2D gamma kernels. The key feature of the new kernel set is the existence of a free parameter that determines the size of its region of support. We show that this scale affects the false alarm rate of the two-parameter CFAR test. We also show that a linear discriminant function composed of the linear and quadratic terms of the intensity in the test cell neighborhood improves the false alarm rate compared with the two-parameter CFAR.
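A minimal sketch of a gamma-kernel CFAR stencil and the two-parameter test statistic, assuming a circularly symmetric kernel g(r) ∝ r^k·exp(−μr); the actual kernel family and parameter-selection procedure in the paper may differ:

```python
import numpy as np

def gamma_kernel_2d(size, k, mu):
    """Circularly symmetric 2D gamma kernel, g(r) proportional to
    r**k * exp(-mu*r). k and mu are the free parameters that set the
    radius and width of the annular stencil; the center (test cell)
    is excluded and the kernel is normalized to sum to 1."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r = np.hypot(y - c, x - c)
    g = (r ** k) * np.exp(-mu * r)
    g[int(c), int(c)] = 0.0
    return g / g.sum()

def cfar_statistic(patch, kernel, eps=1e-9):
    """Two-parameter CFAR statistic for the center pixel of `patch`:
    (x - local mean) / local std, with the local moments estimated
    through the gamma-kernel stencil; eps guards against zero variance."""
    m = float((kernel * patch).sum())
    v = float((kernel * (patch - m) ** 2).sum())
    c = patch.shape[0] // 2
    return (patch[c, c] - m) / np.sqrt(v + eps)
```

Varying k and μ moves the stencil's annulus in and out, which is the adaptation the gamma-kernel family provides over a fixed-geometry two-parameter CFAR stencil.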