This PDF file contains the front matter associated with SPIE
Proceedings Volume 7543, including the Title Page, Copyright
information, Table of Contents, and the Conference Committee listing.
In a conventional hybrid video coding scheme, the choice of encoding parameters (motion vectors, quantization
parameters, etc.) is carried out by optimizing frame by frame the output distortion for a given rate budget.
While it is well known that motion estimation naturally induces a chain of dependencies among pixels, this is
usually not explicitly exploited in the coding process in order to improve overall coding efficiency. Specifically,
when considering a group of pictures with an IPPP... structure, each pixel of the first frame can be thought
of as the root of a tree whose children are the pixels of the subsequent frames predicted by it. In this work,
we demonstrate the advantages of such a representation by showing that, in some situations, the best motion
vector is not the one that minimizes the energy of the prediction residual, but the one that produces a better
tree structure, i.e., one that is globally more favorable from a rate-distortion perspective. In this new
structure, pixels with a larger descendance are allocated extra rate to produce higher-quality predictors. As a
proof of concept, we verify this assertion by assigning the quantization parameter in a video sequence in such a
way that pixels with a larger number of descendants are coded with a higher quality. In this way we are able
to improve RD performance by nearly 1 dB. Our preliminary results suggest that a deeper understanding of the
temporal dependencies can potentially lead to substantial gains in coding performance.
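The descendant-counting step behind this idea can be sketched as follows. This is a toy model assuming integer, in-bounds motion vectors; the function name and array layout are illustrative, not the authors' implementation:

```python
import numpy as np

def count_descendants(motion_fields, shape):
    """Count, for each pixel of the first (I) frame, how many pixels in the
    subsequent P frames trace back to it through the motion-vector chain.
    motion_fields[t][y, x] = (dy, dx) pointing from frame t+1 into frame t.
    Toy sketch: integer, in-bounds motion vectors are assumed."""
    h, w = shape
    n_frames = len(motion_fields) + 1
    counts = [np.zeros((h, w), dtype=np.int64) for _ in range(n_frames)]
    # Walk backwards: a pixel's weight is 1 plus its own descendant count.
    for t in range(n_frames - 2, -1, -1):
        mv = motion_fields[t]
        for y in range(h):
            for x in range(w):
                dy, dx = mv[y, x]
                counts[t][y + dy, x + dx] += 1 + counts[t + 1][y, x]
    return counts[0]
```

Pixels of the first frame with large counts would then be assigned a lower quantization parameter, so that frequently referenced pixels are coded at higher quality.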
This paper proposes a new compression algorithm based on multi-scale learned bases. We first explain the
construction of a set of image bases using a bintree segmentation and the optimization procedure used to select
the image basis from this set. We then present the sparse orthonormal transforms introduced by Sezer et al.1
and propose extensions aimed, on the one hand, at improving the convergence of the learning algorithm and, on
the other, at adapting the transforms to the coding scheme used. Comparisons in terms of rate-distortion
performance are finally made with the current compression standards JPEG and JPEG2000.
In this paper, 16-order and 32-order integer transform kernels are designed for HD video coding in
H.264|MPEG-4 AVC, and performance analyses for large transforms are presented. An adaptive block-size transform
coding scheme is also proposed based on the proposed transform kernels. Thus, additional 16-order (16×16, 16×8 and
8×16) and 32-order (32×32, 32×16 and 16×32) transforms are performed in addition to the 8×8 and 4×4 transforms
exploited in the Fidelity Range Extensions of H.264|MPEG-4 AVC. The experimental results show that
variable block-size transforms with the proposed higher-order transform kernels yield up to 14.96% bit savings
for HD video sequences.
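The separable structure shared by such kernels can be illustrated with the well-known 4×4 integer core transform of H.264/AVC; the 16- and 32-order kernels described here follow the same Y = C·X·Cᵀ pattern with larger matrices. The code below is a generic sketch of that structure, not the proposed kernels themselves:

```python
import numpy as np

# The 4x4 integer core transform matrix of H.264/AVC. Higher-order
# (16-, 32-order) kernels generalize the same separable structure.
C4 = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_transform(block, C=C4):
    """Separable 2D integer transform: transform rows, then columns."""
    return C @ block @ C.T
```

For a flat block, all energy concentrates in the DC coefficient, which is what makes larger kernels attractive for the smooth regions that dominate HD content.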
As a massively parallel processor, the GPU is well-suited for performing 'per-pixel' operations in image processing
and computer vision. New developments in hardware, software, and algorithm mappings now allow entire vision
algorithms to be performed exclusively on the GPU. In this paper we present the GPU mapping of a natural image
feature processing pipeline used in an image stitching application. We examine how to utilize hardware features
of the GPU for efficient processing, demonstrating that GPU programming now goes beyond per-pixel mappings
and provides speedups in image feature processing and matching.
We describe the real-time CUDA implementation of an error concealment algorithm for high definition video at
720p. The concealment method is based on decoder motion search on the high-resolution frame, using a thumbnail
as a guide, and is therefore comparable in complexity to encoder motion search. We discuss the different
requirements for decoder motion search compared to encoder search, and present a fast motion search algorithm
suitable for parallel implementation in GPU. The design of the real-time CUDA implementation and its
performance analysis are also presented.
In most applications, video deinterlacing has to be performed in real time. Numerous algorithms have been
developed to strike a good balance between throughput and quality. The motion adaptive deinterlacing algorithm
switches between two modes: direct merging of two fields in areas of no motion, or intrafield adaptive interpolation
when motion is detected. In this paper, we propose a fast GPU-aided implementation of a motion adaptive
deinterlacing algorithm using NVIDIA CUDA (Compute Unified Device Architecture) technology. We discuss
the techniques of adapting the computations in motion detection and adaptive directional interpolation to the
GPU architecture to achieve the maximum possible video throughput. The objective is to fully utilize the processing power
of GPU without compromising the visual quality of the deinterlaced video. Experimental results are reported
and discussed to demonstrate the performance of the proposed GPU-aided motion adaptive video deinterlacer
in both speed and visual quality.
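The per-pixel mode switch described above can be sketched as follows. This is a toy serial model of the two modes (weave where no motion, intrafield "bob" interpolation where motion is detected); the threshold and field layout are simplifying assumptions, and the real implementation maps this computation onto CUDA threads:

```python
import numpy as np

def deinterlace(curr, prev, thr=10.0):
    """Toy motion-adaptive deinterlacer. `curr` carries the current field on
    even lines (odd lines are to be filled); `prev` is the previous full
    frame. Per pixel: no motion -> merge the odd line from `prev` (weave);
    motion -> intrafield vertical interpolation. `thr` is a hypothetical
    motion-detection threshold."""
    out = curr.astype(float).copy()
    h, w = curr.shape
    for y in range(1, h - 1, 2):                      # missing (odd) lines
        motion = (np.abs(out[y - 1] - prev[y - 1]) > thr) | \
                 (np.abs(out[y + 1] - prev[y + 1]) > thr)
        interp = 0.5 * (out[y - 1] + out[y + 1])      # intrafield "bob"
        out[y] = np.where(motion, interp, prev[y])    # else weave
    return out
```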
JPEG XR (formerly Microsoft Windows Media Photo and HD Photo) is the latest image coding standard. By integrating
various advanced technologies such as integer hierarchical lapped transform, context adaptive Huffman coding, and high
dynamic range coding, it achieves performance competitive with JPEG 2000, but with lower computational complexity
and memory requirements. In this paper, the GPU implementation of the JPEG XR codec using NVIDIA CUDA
(Compute Unified Device Architecture) technology is investigated. Design considerations to speed up the algorithm are
discussed, by taking full advantage of the properties of the CUDA framework and JPEG XR. Experimental results are
presented to demonstrate the performance of the GPU implementation.
In a 3D video system including depth information, once the depth video is coded with state-of-the-art video compression
tools such as H.264/AVC, depth errors around object boundaries can be intensified, and these can significantly
affect the quality of the rendered virtual views. Despite this drawback, depth video compression is essential
because of the enormous amount of input data in a 3D video system. In this paper, we propose a line-based partitioned
intra prediction method that exploits the geometric redundancy of depth video for efficient compression without
significant errors around boundaries. The proposed algorithm efficiently divides the current coded block into two
partitioned regions and independently predicts each region from previously coded neighboring pixel
information. Finally, the generated prediction mode adaptively replaces the conventional DC intra prediction mode. To
evaluate the intra prediction performance, we implemented the proposed method in the H.264/AVC intra prediction
scheme. Experimental results demonstrate that our proposed method provides higher coding performance: for
depth video compression itself, up to 3.71% bit saving or a 0.309 dB peak signal-to-noise ratio (PSNR) gain on
depth sequences containing line-like boundaries.
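The partition-and-predict idea can be sketched as follows. The region layout (a boolean mask standing in for the dividing line) and the neighbor-assignment rule are simplified assumptions for illustration, not the paper's exact partitioning algorithm:

```python
import numpy as np

def partitioned_dc_predict(top, left, mask):
    """Sketch of line-based partitioned intra prediction: `mask` splits the
    NxN block into two regions along a line; each region gets its own DC
    value from the reconstructed neighbor pixels (`top` row, `left` column)
    adjacent to it."""
    n = mask.shape[0]
    pred = np.empty((n, n))
    for region in (True, False):
        # Neighbors adjacent to this region: top pixels above its first-row
        # columns, left pixels beside its first-column rows (a simplification).
        cols = mask[0] == region
        rows = mask[:, 0] == region
        neigh = np.concatenate([top[cols], left[rows]])
        fallback = 0.5 * (top.mean() + left.mean())
        pred[mask == region] = neigh.mean() if neigh.size else fallback
    return pred
```

A depth block straddling an object edge thus gets two flat predictors instead of one smeared DC value, which is what preserves the line-like boundaries.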
New data formats that include both video and the corresponding depth maps, such as multiview plus depth
(MVD), enable new video applications in which intermediate video views (virtual views) can be generated using
the transmitted/stored video views (reference views) and the corresponding depth maps as inputs. We propose a
depth map coding method based on a new distortion measurement by deriving relationships between distortions
in the coded depth map and the rendered view. In our experiments we use a codec based on H.264/AVC tools, where the
rate-distortion (RD) optimization for depth encoding makes use of the new distortion metric. Our experimental
results show the efficiency of the proposed method, with coding gains of up to 1.6 dB in interpolated frame
quality as compared to encoding the depth maps using the same coding tools but applying RD optimization
based on conventional distortion metrics.
In this paper, we propose a Multiple Description Coding (MDC) method for reliable transmission of compressed
time consistent 3D dynamic meshes. It trades off reconstruction quality for error resilience to provide the best
expected reconstruction of the 3D mesh sequence at the decoder side. The method is based on partitioning the mesh
frames into two sets by temporal subsampling and encoding each set independently by a 3D dynamic mesh coder.
The encoded independent bitstreams or so-called descriptions are transmitted independently. The 3D dynamic
mesh coder is based on predictive coding with spatial and temporal layered decomposition. In addition, the
proposed method allows for different redundancy allocations by including a number of encoded spatial layers of
the frames in the other set. The algorithm is evaluated with redundancy-rate-distortion curves and it is shown
that, when one of the descriptions is lost, acceptable quality can be achieved with around 50% redundancy.
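The temporal-subsampling split, and a simple concealment step when one description is lost, can be sketched as below. Scalar "frames" and averaging of temporal neighbors stand in for mesh frames and the paper's layered redundancy mechanism:

```python
def make_descriptions(frames):
    """Temporal subsampling MDC: even-indexed frames form one description,
    odd-indexed frames the other."""
    return frames[0::2], frames[1::2]

def reconstruct(desc, parity, n_frames):
    """Given only one surviving description (its frames sit at positions of
    the given parity), fill the lost frames from temporally adjacent
    received frames. Averaging is an illustrative stand-in for decoding
    the redundant spatial layers."""
    out = [None] * n_frames
    for i, f in enumerate(desc):
        out[2 * i + parity] = f
    for t in range(n_frames):
        if out[t] is None:
            neigh = [out[u] for u in (t - 1, t + 1)
                     if 0 <= u < n_frames and out[u] is not None]
            out[t] = sum(neigh) / len(neigh)
    return out
```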
The Wyner-Ziv video coding (WZVC) rate distortion performance is highly dependent on the quality of the side
information, an estimation of the original frame created at the decoder. This paper characterizes WZVC efficiency
when motion compensated frame interpolation (MCFI) techniques are used to generate the side information, a difficult
problem in WZVC especially because the decoder only has some decoded reference frames available. The proposed
WZVC compression efficiency rate model relates the power spectral density of the estimation error to the accuracy of the MCFI
motion field. Some interesting conclusions may then be derived regarding the impact of motion field smoothness,
and of its correlation to the true motion trajectories, on compression performance.
A novel video compression scheme that exploits the idea of second-order-residual (SOR) coding is proposed for
high-bit-rate video applications in this work. We first study the limitation of today's high performance video
coding standard, H.264/AVC, and show that it is not effective in the coding of small image features and variations
for high-bit-rate video contents. For low to medium quality video streams, these small image features can be
removed by the quantization process. However, when the quantization step size becomes small in high-bit-rate
video, their existence degrades the rate-distortion coding performance significantly. To address this problem, we
propose a coding scheme that decomposes the residual signals into two layers: the first-order-residual (FOR) and
the second-order-residual (SOR). The FOR contains low frequency residuals while the SOR contains the high
frequency residuals. We adopt the H.264/AVC for the FOR coding and propose two schemes, called SOR-freq
and SOR-bp, for the SOR coding. It is shown by experimental results that the proposed FOR/SOR scheme
outperforms H.264/AVC by a significant margin (with about 20% bit rate saving) in high-bit-rate video.
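The FOR/SOR decomposition can be sketched as a frequency split of the residual. The DCT mask and the `cutoff` boundary are illustrative assumptions; the paper's actual SOR-freq/SOR-bp schemes are more elaborate:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows indexed by frequency)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / n)

def split_for_sor(residual, cutoff):
    """Decompose a residual block into a low-frequency FOR layer and a
    high-frequency SOR layer by masking 2D DCT coefficients
    (u + v < cutoff -> FOR); the two layers sum back to the residual."""
    n = residual.shape[0]
    D = dct_matrix(n)
    coef = D @ residual @ D.T
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    low = np.where(u + v < cutoff, coef, 0.0)
    FOR = D.T @ low @ D
    SOR = D.T @ (coef - low) @ D
    return FOR, SOR
```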
In this work, we propose the use of sparse signal representation techniques to solve the problem of closed-loop
spatial image prediction. The signal in the block to be predicted is reconstructed from basis functions selected
with the iterative Matching Pursuit (MP) algorithm to best match a causal neighborhood. We evaluate this new
method in terms of PSNR and bitrate in an H.264/AVC encoder. Experimental results indicate an improvement
of rate-distortion performance. In this paper, we also present results concerning the use of phase correlation to
improve the reconstruction through shifted basis functions.
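Plain Matching Pursuit, the greedy selection at the heart of this method, can be sketched as below; applying it to a causal-neighborhood template and extrapolating into the block is the paper's contribution, not shown here:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: repeatedly pick the unit-norm dictionary atom (column)
    most correlated with the residual and subtract its projection.
    Returns the sparse approximation of `signal`."""
    residual = signal.astype(float).copy()
    approx = np.zeros_like(residual)
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = np.argmax(np.abs(corr))          # best-matching atom
        approx += corr[k] * dictionary[:, k]
        residual -= corr[k] * dictionary[:, k]
    return approx
```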
Modern video coding schemes such as H.264/AVC employ multi-hypothesis motion compensation for an improved
coding efficiency. However, an additional cost has to be paid for the improved prediction performance in these
schemes. Based on the observed high correlation among the multiple hypotheses in H.264/AVC, in this paper,
we propose a new method (Prediction Matching) to jointly combine explicit and implicit prediction approaches.
The first motion hypothesis on a predicted block is explicitly coded, while any additional hypotheses are
implicitly derived at the decoder based on the first one and the available data from previously decoded frames.
Thus, the overhead to indicate motion information is reduced, while prediction accuracy may be better with
respect to fully implicit multi-hypothesis prediction. Proof-of-concept simulation results show that up to 7.06%
bitrate saving with respect to state-of-the-art H.264/AVC can be achieved using our Prediction Matching.
We propose an approach to the task of automatic pose initialization of swimmers in videos. Thus, our goal is to
detect a swimmer inside a target video and assign an estimated position to her/his body parts. We first apply
a non-skin-color filter to reduce the search space inside each target frame. We then match previously devised
template sequences of Gaussian feature descriptors against sequences of feature vectors which are computed
within the remaining image regions. Finally, relative average joint positions from annotated images featuring the
key pose are assigned to the detection result and three-dimensional joint positions are estimated. We present
detection results for test videos of three different swim strokes and examine the performance of four types of
feature descriptors.
We introduce a two-dimensional kinematic model for cyclic motions of humans, which is suitable for use as
a temporal prior in any Bayesian tracking framework. This human motion model is based solely on simple
kinematic properties: the joint accelerations. Distributions of joint accelerations subject to the cycle progress
are learned from training data. We present results obtained by applying the introduced model to the cyclic motion
of backstroke swimming in a Kalman filter framework that represents the posterior distribution by a Gaussian.
We experimentally evaluate the sensitivity of the motion model with respect to the frequency and noise level of
assumed appearance-based pose measurements by simulating various fidelities of the pose measurements using
ground truth data.
The long term tracking of sparse local features in an image is important for many applications including camera
calibration for stereo applications, camera or global motion estimation and people surveillance. The majority of
existing tracking frameworks are based on some kind of prediction/correction idea, e.g., KLT and particle filters.
However, given a careful selection of interest points throughout the sequence, the problem of tracking can be
solved with the Viterbi algorithm. This work introduces a novel approach to interest point selection for tracking
using the Mean Shift algorithm over short time windows. The resulting points are then articulated within a
Viterbi algorithm for creating very long term tracking data. The tracks are shown to be more accurate than
traditional KLT implementations and also do not suffer from accumulation of error with time.
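Viterbi tracking over per-frame candidate points can be sketched as below. The transition cost used here (summed squared displacement between consecutive chosen points) is a minimal stand-in for the paper's full cost function:

```python
import numpy as np

def viterbi_track(candidates):
    """Viterbi over per-frame interest-point candidates. `candidates` is a
    list of (k_t, 2) coordinate arrays, one per frame; returns the index
    of the chosen candidate in each frame minimizing the total summed
    squared displacement along the track."""
    T = len(candidates)
    cost = [np.zeros(len(candidates[0]))]
    back = []
    for t in range(1, T):
        prev, curr = candidates[t - 1], candidates[t]
        # d[i, j]: squared distance from prev candidate j to curr candidate i
        d = ((curr[:, None, :] - prev[None, :, :]) ** 2).sum(-1)
        total = d + cost[-1][None, :]
        back.append(total.argmin(1))
        cost.append(total.min(1))
    path = [int(cost[-1].argmin())]
    for b in reversed(back):                 # backtrack the best path
        path.append(int(b[path[-1]]))
    return path[::-1]
```

Because the whole time window is optimized jointly, a single noisy frame cannot pull the track off course the way it can in frame-by-frame prediction/correction.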
In this paper we present new methods for object tracking initialization using automated moving object detection
based on background subtraction. The new methods are integrated into the real-time object tracking system
we previously proposed. Our proposed new background model updating method and adaptive thresholding are
used to produce a foreground object mask for object tracking initialization.
Traditional background subtraction detects moving objects by subtracting the background model
from the current image. Compared to other common moving object detection algorithms, background subtraction
segments foreground objects more accurately and detects foreground objects even if they are motionless. However,
one drawback of traditional background subtraction is that it is susceptible to environmental changes, for
example, gradual or sudden illumination changes. The reason for this drawback is the assumption of a static
background, so a background model update is required for dynamic backgrounds. The major challenges are then
how to update the background model and how to determine the threshold for classifying foreground and
background pixels. We propose a method to determine the threshold automatically and dynamically, depending
on the intensities of the pixels in the current frame, and a method to update the background model with a learning
rate depending on the differences between the background model and the previous frame.
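The two ideas (a difference-driven learning rate and an intensity-dependent threshold) can be sketched as below; the exact formulas and constants are illustrative assumptions, not the paper's:

```python
import numpy as np

def update_background(bg, frame, prev_frame, base_alpha=0.05):
    """One background-subtraction update step. The learning rate grows
    where background model and previous frame differ (faster adaptation
    in changed regions); the foreground threshold is derived from the
    current frame's mean intensity."""
    diff = np.abs(bg - prev_frame)
    # Hypothetical learning-rate rule: larger difference -> faster update.
    alpha = np.clip(base_alpha * (1 + 10 * diff / 255.0), 0, 1)
    new_bg = (1 - alpha) * bg + alpha * frame
    # Hypothetical adaptive threshold from current-frame intensities.
    thr = 0.15 * frame.mean() + 10
    fg_mask = np.abs(frame - new_bg) > thr
    return new_bg, fg_mask
```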
We argue that a key to further advances in the fields of image analysis and compression is a better understanding of texture.
We review a number of applications that critically depend on texture analysis, including image and video compression,
content-based retrieval, visual to tactile image conversion, and multimodal interfaces. We introduce the idea of "structurally
lossless" compression of visual data that allows significant differences between the original and decoded images, which
may be perceptible when they are viewed side-by-side, but do not affect the overall quality of the image. We then discuss
the development of objective texture similarity metrics, which allow substantial point-by-point deviations between textures
that according to human judgment are essentially identical.
The H.264/AVC standard offers an efficient way of reducing the noticeable artefacts of former video coding schemes,
but it leaves room for improvement in the coding of detailed texture areas. This paper presents a conceptual coding
framework, exploiting visual perception redundancy, which aims at improving both bit-rate and quality in textured
areas. The approach is generic and can be integrated into usual coding schemes. The proposed scheme is divided
into three steps: a first algorithm analyses texture regions in order to build a dictionary of the most
representative texture sub-regions (RTS); the encoder then preserves them at a higher quality than the rest of
the picture, enabling a refinement algorithm to finally spread the preserved information over textured
areas. In this paper, we present a first solution to validate the framework, then detail the encoder side in
order to define a simple method for dictionary building. The proposed H.264/AVC-compliant scheme creates a
dictionary of macroblocks
We present a smoothed reference inter-layer texture prediction mode for bit depth scalability based on the
Scalable Video Coding extension of the H.264/MPEG-4 AVC standard. In our approach, the base layer encodes
an 8-bit signal that can be decoded by any existing H.264/MPEG-4 AVC decoder and the enhancement layer
encodes a higher bit depth signal (e.g. 10/12-bit) which requires a bit depth scalable decoder. The approach
presented uses base layer motion vectors to conduct motion compensation upon enhancement layer reference
frames. Then, the motion compensated block is tone mapped and summed with the co-located base layer residue
block prior to being inverse tone mapped to obtain a smoothed reference predictor. In addition to the original
inter-/intra-layer prediction modes, the smoothed reference prediction mode enables inter-layer texture prediction
for blocks whose co-located block is inter-coded. The proposed method is designed to improve the coding efficiency
for sequences with non-linear tone mapping, for which we obtain gains of up to 0.4 dB over the CGS-based BDS
framework.
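The smoothed-reference construction can be sketched per block as below, using a simple bit shift as the (inverse) tone mapping; real tone-mapping operators are non-linear, which is precisely where the scheme is designed to help:

```python
import numpy as np

def smoothed_reference(el_mc_block, bl_residue, shift=2):
    """Smoothed-reference predictor sketch: tone-map the motion-compensated
    enhancement layer block down to base-layer bit depth (here a right
    shift, 10-bit -> 8-bit), add the co-located base layer residue, then
    inverse tone map back up to the enhancement layer bit depth."""
    tone_mapped = el_mc_block >> shift
    return (tone_mapped + bl_residue) << shift
```

The base layer residue thus corrects the prediction in the 8-bit domain before the result is lifted back to the higher bit depth.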
So far, an efficient coding scheme for ultra-high-resolution (8K) video under low bit-rate conditions has not been
proposed. Within the H.264 coding framework, highly efficient coding is achieved through optimal control of the
macroblock (MB) coding mode decision. However, the coding modes available in H.264 are not necessarily
appropriate for 8K full-resolution coding under considerably low bit-rate conditions, and satisfactory coding
performance cannot be achieved within H.264. In this paper, we propose an extended coding mode defined from an
analysis of the R-D performance of the conventional coding modes. Coding experiments confirmed that the
maximum coding gain reached 0.18 dB at the target bit-rate assumed in this study.
In this paper, we investigate the use of the non-local means (NLM) denoising approach in the context of image
deblurring and restoration. We propose a novel deblurring approach that utilizes a non-local regularization
constraint. Our interest in the NLM principle is its potential to suppress noise while effectively preserving edges
and texture detail. Our approach leads to an iterative cost function minimization algorithm, similar to common
deblurring methods, but incorporating update terms due to the non-local regularization constraint. The data-adaptive
noise suppression weights in the regularization term are updated and improved at each iteration, based
on the partially denoised and deblurred result. We compare our proposed algorithm to conventional deblurring
methods, including deblurring with total variation (TV) regularization. We also compare our algorithm to
combinations of the NLM-based filter followed by conventional deblurring methods. Our initial experimental
results demonstrate that the use of NLM-based filtering and regularization seems beneficial in the context of
image deblurring, reducing the risk of over-smoothing or suppression of texture detail, while suppressing noise.
Furthermore, the proposed deblurring algorithm with non-local regularization outperforms other methods, such
as deblurring with TV regularization or separate NLM-based denoising followed by deblurring.
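The non-local weights that drive the regularization term can be sketched as below; this computes only the standard NLM patch-similarity weights, not the full iterative deblurring algorithm:

```python
import numpy as np

def nlm_weights(patches, h):
    """Non-local means weights between patches (rows of `patches`):
    w_ij ~ exp(-||p_i - p_j||^2 / h^2), normalized so each row sums to 1.
    In the deblurring iteration these weights would be recomputed from
    the partially restored image at each step."""
    d = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d / h ** 2)
    return w / w.sum(axis=1, keepdims=True)
```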
To correct geometric distortion and reduce blur in videos that suffer from atmospheric turbulence, a multi-frame
image reconstruction approach is proposed in this paper. This approach contains two major steps. In the
first step, a B-spline based non-rigid image registration algorithm is employed to register each observed frame
with respect to a reference image. To improve the registration accuracy, a symmetry constraint is introduced,
which penalizes inconsistency between the forward and backward deformation parameters during the estimation
process. A fast Gauss-Newton implementation method is also developed to reduce the computational cost of the
registration algorithm. In the second step, a high quality image is restored from the registered observed frames
under a Bayesian reconstruction framework, where we use L1 norm minimization and a bilateral total variation
(BTV) regularization prior, to make the algorithm more robust to noise and estimation error. Experiments show
that the proposed approach can effectively reduce the influence of atmospheric turbulence even for noisy videos
with relatively long exposure time.
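The bilateral total variation (BTV) prior used in the reconstruction step sums L1 differences over several shift distances, each down-weighted geometrically. Written for a 1-D signal (the shift range P and weight alpha below are illustrative values):

```python
def btv_1d(x, P=2, alpha=0.7):
    """Bilateral total variation of a 1-D signal: L1 differences across
    shifts 1..P, each down-weighted by alpha**shift."""
    total = 0.0
    for l in range(1, P + 1):
        w = alpha ** l
        total += w * sum(abs(x[i] - x[i - l]) for i in range(l, len(x)))
    return total
```

Compared with plain TV (P=1 only), the extra shifts penalize structure at several scales, which is what makes the prior robust to registration error.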
In this paper, a new motion compensated frame interpolation method is proposed based on the reliability of
motion vectors determined by the block residual energy. Additional motion re-estimation is applied to those
blocks where unreliable motion vectors are detected. The motion estimation algorithm employed in this work
combines block-based motion estimation with optical flow-based estimation, resulting in a more accurate representation
with only modest computational complexity. Experimental results show that the proposed method improves
the visual quality of interpolated frames in cases where competing methods fail.
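The reliability test described above can be sketched as a simple threshold on the block residual energy; the block shapes and threshold are illustrative, and the paper's motion re-estimation step is not reproduced here.

```python
def residual_energy(block_cur, block_pred):
    """Sum of squared differences between a block and its
    motion-compensated prediction (flattened pixel lists)."""
    return sum((a - b) ** 2 for a, b in zip(block_cur, block_pred))

def unreliable_vectors(blocks, predictions, threshold):
    """Flag motion vectors whose residual energy exceeds the threshold;
    flagged blocks would be sent to motion re-estimation."""
    return [residual_energy(b, p) > threshold
            for b, p in zip(blocks, predictions)]
```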
Fast video transrating algorithms for DCT-based video coding standards have proven their efficiency in many
applications and are widely used in the industry. However, they cannot be re-used for H.264/AVC because they
introduce an unacceptable level of drift. To address this issue, this paper proposes to adapt the H.264/AVC
predictions by processing the DC component separately from the AC coefficients, which removes drift from
requantization transrating algorithms. Experimental results show that the bit count of our prediction scheme
increases by only 2.46% for CIF and 1.87% for 720p in Intra coding compared with the H.264/AVC codec at the same
PSNR. The performance of fast transrating algorithms applied to streams generated with our method improves
dramatically, allowing them to compete directly with the best-in-class but computationally demanding Cascaded
Pixel Domain decode-and-recode Transcoding (CPDT) architecture. Additionally, one potential application enabled
by this new prediction principle is the partial decoding of video streams to obtain reduced-size images.
We present a variant of the JPEG baseline image compression algorithm optimized for images that were generated
by a JPEG decompressor. It inverts the computational steps of one particular JPEG decompressor implementation
(Independent JPEG Group, IJG), and uses interval arithmetic and an iterative process to infer the possible
values of intermediate results during the decompression, which are not directly evident from the decompressor
output due to rounding. We applied our exact recompressor on a large database of images, each compressed at
ten different quality factors. At the default IJG quality factor 75, our implementation reconstructed the exact
quantized transform coefficients in 96% of the 64-pixel image blocks. For blocks where exact reconstruction
is not feasible, our implementation can output transform-coefficient intervals, each guaranteed to contain the
respective original value. Where different JPEG images decompress to the same result, we can output all possible
bit-streams. At quality factors 90 and above, exact recompression becomes infeasible due to combinatorial
explosion; but 68% of blocks still recompressed exactly.
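The interval-arithmetic idea can be illustrated on a single rounded multiplication: the observed integer output pins the real-valued intermediate to an interval, and dividing that interval by the scale factor yields the candidate coefficients. The round-half-up convention and scale values here are illustrative, not the IJG decompressor's actual fixed-point arithmetic.

```python
import math

def coeff_candidates(y, scale):
    """All integers c with round(c * scale) == y, assuming round-half-up:
    c * scale must lie in [y - 0.5, y + 0.5)."""
    lo = math.ceil((y - 0.5) / scale)
    hi = math.floor((y + 0.5 - 1e-9) / scale)
    return list(range(lo, hi + 1))
```

A single candidate corresponds to exact recovery of the quantized coefficient; several candidates correspond to the coefficient intervals the recompressor outputs when exact reconstruction is infeasible.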
In this paper we propose a strategy for seamless tessellation of varying-resolution tiles, based on smoothing
and mosaicking in the DWT domain. The scenario involves a tessellation with three different tile qualities,
or levels of detail (LOD), at a given instant, depending on the viewpoint distance, the rendering time, and
hardware resources. The LOD relies on the multiresolution character of the wavelets underlying the now widely
accepted JPEG2000 codec. Treating a change in viewpoint focus as analogous to a sliding window, we expect
that in the worst case the window is composed of three different tile qualities, with the resulting
artifacts at tile interfaces. To dilute these artifacts, we treat the tiles at the subband level, in the DWT
domain, by employing operations involving suitable subband-sized composite masks, conceived with smoothing
and mosaicking in perspective. The resultant composite subbands are subjected to a global inverse DWT to get
the final seamless tessellation.
In this paper, we show that it is possible to reduce the complexity of Intra MB coding in H.264/AVC using
a novel chance-constrained classifier. Using pairs of simple mean-variance values, our technique reduces
the complexity of the Intra MB coding process with a negligible loss in PSNR. We present an alternative
approach to the classification problem, equivalent to a machine learning formulation. Implementation results
show that the proposed method reduces encoding time to about 20% of that of the reference implementation,
with an average PSNR loss of 0.05 dB.
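The mean-variance feature pair driving the classification can be sketched as follows; the decision rule and threshold are a hypothetical stand-in, and the chance-constrained classifier itself is not reproduced here.

```python
def mb_features(pixels):
    """Mean and variance of a macroblock's pixels (the classifier's input pair)."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, var

def skip_full_mode_search(pixels, var_threshold=25.0):
    """Hypothetical rule: low-variance (homogeneous) macroblocks skip the
    exhaustive Intra mode search, saving encoding time."""
    _, var = mb_features(pixels)
    return var < var_threshold
```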
Increasing transmission of medical data across multiple user systems raises concerns for medical image watermarking.
Additionally, the use of volumetric images calls for efficient compression techniques in
picture archiving and communication systems (PACS) and telemedicine applications. This paper describes a
hybrid data hiding/compression system adapted to volumetric medical imaging. The central contribution is the
integration of blind watermarking, based on turbo trellis-coded quantization (TCQ), into the JP3D encoder.
Results on Magnetic Resonance (MR) and Computed Tomography (CT) medical images show that our watermarking
scheme is robust to JP3D compression attacks and provides a relatively high data-embedding rate while keeping
distortion relatively low.
In recent years it has been recognized that embedding information in the wavelet transform domain leads to more
robust watermarks. In particular, several approaches have been proposed to address the problem of watermark
embedding combined with wavelet-based image coding. In this paper, we present an alternative quantization-based
blind watermarking strategy in the framework of JPEG2000 still image compression. The central contribution
is a modified Quantization Index Modulation (QIM) watermark design that reduces the fidelity problem.
We also show that the proposed watermarking scheme exhibits high robustness to JPEG2000 compression and
Gaussian noise attacks. After detailing the proposed solution, we evaluate system performance in terms of both
image quality and robustness.
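Standard (unmodified) QIM, which the proposed design builds on, embeds one bit by quantizing a host coefficient onto one of two interleaved lattices; the step size delta below is an illustrative value.

```python
def qim_embed(x, bit, delta=8.0):
    """Quantize the host value x onto the lattice coset selected by the bit."""
    offset = delta / 2.0 if bit else 0.0
    return round((x - offset) / delta) * delta + offset

def qim_detect(y, delta=8.0):
    """Decode the bit whose coset quantizer lies nearest to y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1
```

Detection survives perturbations smaller than delta/4; enlarging delta improves robustness at the cost of fidelity, which is the trade-off a modified QIM design targets.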
Low complexity encoders at the expense of high complexity decoders are advantageous in wireless video sensor
networks. Distributed video coding (DVC) achieves the above complexity balance, where the receivers compute
side information (SI) by interpolating the key frames. The side information is modeled as a noisy version of
the input video frame. In practice, correlation noise estimation at the receiver is a complex problem;
currently, the noise is estimated from the residual variance between pixels of the key frames, and this
(fixed) variance is used to calculate the bit-metric values. In this paper, we introduce a new variance
estimation technique that relies on the bit pattern of each pixel and is calculated dynamically over the
entire motion environment, which helps compute the soft-value information required by the decoder. Our results
show that the proposed bit-based dynamic variance estimation significantly improves peak signal-to-noise
ratio (PSNR) performance.
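The soft-value computation can be sketched for one bit plane of an 8-bit pixel, modeling the source as Laplacian-distributed around the side-information value. The Laplacian parameter alpha stands in for the estimated correlation-noise variance; the paper's bit-pattern-dependent dynamic update is not reproduced here.

```python
import math

def bit_llr(si, bit_index, alpha):
    """Log-likelihood ratio log(P(bit=0)/P(bit=1)) for one bit plane of an
    8-bit pixel X, with p(x) proportional to exp(-alpha * |x - si|)."""
    p0 = p1 = 0.0
    for x in range(256):
        w = math.exp(-alpha * abs(x - si))
        if (x >> bit_index) & 1:
            p1 += w
        else:
            p0 += w
    return math.log(p0 / p1)
```

A negative LLR means the bit is more likely 1; these soft values feed the channel decoder in place of hard decisions.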
Dirty Paper Trellis Codes (DPTC) watermarking, published in 2004, is a very efficient high-rate scheme. Nevertheless,
it has two strong drawbacks: its security weakness and its computational complexity. We propose
an embedding space that is at least as secure, together with a faster embedding. The embedding space is built on
the projections of some wavelet coefficients onto secret carriers; it keeps a good security level and also has
good psycho-visual properties. The embedding is based on a dichotomous rotation in the Cox, Miller and Bloom
plane, and gives better performance than previous fast embedding approaches. Four different attacks were
performed, revealing good robustness and speed.
In this paper, we present a low-memory-cost message iteration architecture for a fast belief propagation (BP)
algorithm. To meet the real-time goal, our architecture follows the multi-scale BP method with a truncated-linear
smoothness cost model. We observe that the message iteration process in BP requires a huge intermediate buffer to
store the four directional messages of every node. Therefore, instead of updating all node messages in each
iteration sequence, we propose completing the iteration process for each individual node ahead of time and
executing it consecutively, node by node. The key ideas in this paper focus on both maximizing the architecture's
parallelism and minimizing implementation cost overhead. We first apply a pipelined architecture to each iteration
stage so that it executes independently; pipelining yields higher message throughput within a single iteration
cycle instead of consuming the whole iteration cycle as before. We also group multiple message-update nodes into a
minimal processing unit to maximize parallelism. For the multi-scale BP method, the proposed parallel architecture
incurs no additional execution time for processing the nodes in the down-scaled Markov Random Field (MRF). For VGA
image size, 4 iterations per scale, and 64 disparity levels, our approach reduces memory complexity by 99.7% and
is 340 times faster than the general multi-scale BP architecture.
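The per-node computation that the architecture pipelines is the standard min-sum BP message update with a truncated-linear smoothness cost; the cost parameters below are illustrative.

```python
def truncated_linear(d, rate=1.0, trunc=2.0):
    """Truncated-linear smoothness cost between two disparity labels."""
    return min(rate * abs(d), trunc)

def update_message(data_cost, in_msgs, levels, rate=1.0, trunc=2.0):
    """One min-sum message from a node to a neighbor: for each destination
    label, take the cheapest source label given the data cost, the three
    other incoming messages, and the smoothness penalty."""
    h = [data_cost[k] + sum(m[k] for m in in_msgs) for k in range(levels)]
    return [min(h[k] + truncated_linear(l - k, rate, trunc)
                for k in range(levels))
            for l in range(levels)]
```

Because each outgoing message depends only on the data cost and incoming messages, the updates for one node can be finished in a single pipelined pass, which is what allows the per-node iteration order described above.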