Versatile Video Coding (VVC) is the most recent and most efficient video-compression standard of ITU-T and ISO/IEC. It follows the principle of a hybrid, block-based video codec and offers high flexibility in selecting a coded representation of a video. While encoders can exploit this flexibility for compression efficiency, designing algorithms for fast encoding becomes a challenging problem. This problem has recently been attacked with data-driven methods that train suitable neural networks to steer the encoder decisions. On the other hand, an optimized and fast VVC software implementation is provided by Fraunhofer’s Versatile Video Encoder VVenC. The goal of this paper is to investigate whether these two approaches can be combined. To this end, we incorporate into VVenC, as an example, a recent CNN-based approach that has shown its efficiency for intra-picture coding in the VVC reference software VTM. The CNN estimates parameters that restrict the multi-type tree (MTT) partitioning modes tested in rate-distortion optimization. To train the CNN, the approach considers the Lagrangian rate-distortion-time cost caused by these parameters. For performance evaluation, we compare the five operating points reachable with the VVenC presets to the operating points we reach by using the CNN jointly with the presets. Results show that the combination of both approaches is efficient and that there is room for further improvements.
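As an illustration of the training-label construction mentioned above, the following sketch selects, per block, the MTT restriction whose measured distortion, rate, and encoding time minimize a Lagrangian rate-distortion-time cost J = D + lambda*R + gamma*T. The candidate set, the gamma weighting, and all numbers are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: selecting an MTT restriction parameter by Lagrangian
# rate-distortion-time cost, as could be used to derive CNN training
# targets. Candidate set and weights are illustrative assumptions.

def rdt_cost(distortion, rate, time, lam, gamma):
    """Lagrangian rate-distortion-time cost J = D + lambda*R + gamma*T."""
    return distortion + lam * rate + gamma * time

def best_mtt_restriction(candidates, lam, gamma):
    """Pick the MTT restriction whose measured (D, R, T) minimizes J.

    candidates: dict mapping restriction parameter -> (D, R, T),
    measured by encoding the block once per candidate.
    """
    return min(candidates, key=lambda p: rdt_cost(*candidates[p], lam, gamma))

# Hypothetical measurements for three restriction levels of one CTU:
# (distortion in SSE, rate in bits, encoding time in ms)
candidates = {
    "no_mtt":   (5200.0, 310, 12.0),
    "depth_1":  (4100.0, 350, 45.0),
    "full_mtt": (3900.0, 360, 160.0),
}
print(best_mtt_restriction(candidates, lam=16.0, gamma=40.0))
```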
The Intra Subpartition (ISP) mode is one of the intra prediction tools incorporated into the new Versatile Video Coding (VVC) standard. ISP divides a luma intra-predicted block along one dimension into 2 or 4 smaller blocks, called subpartitions, that are predicted using the same intra mode. This paper describes the design of this tool and its encoder search implementation in the VVC Test Model 7.3 (VTM-7.3) software. The main challenge of the ISP encoder search is that the mode pre-selection based on the sum of absolute transformed differences, typically utilized for intra prediction tools, is not feasible in the ISP case, since it would require knowing beforehand the values of the reconstructed samples of the subpartitions. For this reason, VTM employs a different strategy aimed at overcoming this issue. The experimental tool-off tests carried out for the All Intra configuration show a gain of 0.52% for the 22-37 Quantization Parameter (QP) range with an associated encoder runtime of 85%. The results improve to a 1.06% gain and an 87% encoder runtime in the case of the 32-47 QP range. Analogously, for the tool-on case, the results for the 22-37 QP range are a 1.17% gain and a 134% encoder runtime, and this improves in the 32-47 QP range to a 1.56% gain and a 126% encoder runtime.
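The split rule itself is simple: a 4x8 or 8x4 luma block is divided into 2 subpartitions, all other allowed sizes into 4, and 4x4 blocks cannot use ISP. A minimal sketch follows; further size restrictions of the standard are deliberately omitted.

```python
# Sketch of the ISP split rule in VVC: a WxH luma block is divided along
# one dimension into 2 subpartitions (4x8 and 8x4 blocks) or 4
# subpartitions (all other allowed sizes); 4x4 blocks cannot use ISP.

def isp_subpartitions(width, height, horizontal_split):
    if width == 4 and height == 4:
        raise ValueError("4x4 blocks do not support ISP")
    num = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if horizontal_split:
        assert height % num == 0
        return [(width, height // num)] * num   # stacked top to bottom
    assert width % num == 0
    return [(width // num, height)] * num       # side by side, left to right

print(isp_subpartitions(16, 16, horizontal_split=True))   # 4 x (16, 4)
print(isp_subpartitions(8, 4, horizontal_split=False))    # 2 x (4, 4)
```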
This paper provides a technical overview of the most probable modes (MPM)-based multiple reference line (M-MRL) intra-picture prediction that was adopted into the Versatile Video Coding (VVC) standard draft at the 12th JVET meeting. M-MRL applies not only the nearest reference line but also farther reference lines to MPMs for intra-picture prediction. The highlighted aspects of the adopted M-MRL scheme include the signaling of the reference line index, discontinuous reference lines, reference sample construction and prediction for farther reference lines, and the joint reference-line and intra-mode decisions at the encoder side. Experimental results are provided to evaluate the performance of M-MRL on top of the VVC test model VTM-2.0.1, together with an analysis of discontinuous reference lines. The presented M-MRL provides average bit-rate savings of 0.5% for an all-intra and 0.2% for a random-access configuration.
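A minimal sketch of the joint reference-line and intra-mode candidate construction: the nearest line may be combined with any intra mode, while farther lines are restricted to the MPM list. The discontinuous line set {0, 1, 3} follows the paper's discussion of discontinuous reference lines but should be read as an assumption about the adopted draft; the 6-entry MPM list below is hypothetical.

```python
# Sketch of the joint reference-line / intra-mode candidate construction
# in M-MRL: line 0 may use any intra mode, farther lines only MPM modes.

def mmrl_candidates(mpm_list, all_modes, lines=(0, 1, 3)):
    """Enumerate (reference_line, intra_mode) pairs to test at the encoder."""
    candidates = []
    for line in lines:
        modes = all_modes if line == 0 else mpm_list
        candidates += [(line, m) for m in modes]
    return candidates

mpms = [0, 1, 18, 50, 46, 54]          # hypothetical 6-entry MPM list
pairs = mmrl_candidates(mpms, all_modes=range(67))
print(len(pairs))                      # 67 + 2 * 6 = 79 candidates
```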
The development of the emerging Versatile Video Coding (VVC) standard was motivated by the need for significant bit-rate reductions for natural video content as well as content for different applications, such as computer-generated screen content. The signal characteristics of screen content video differ from those of natural content. They include sharp edges as well as flat areas of the same color. In block-based hybrid video coding designs, as employed in VVC and its predecessor standards, skipping the transform stage of the prediction residual can be beneficial for screen content signals due to the different residual signal characteristics. In this paper, a modified transform coefficient level coding tailored to transform skip residual signals is presented. It comprises no signaling of the last significant position, a coded block flag for every subblock, modified context modeling and binarization, as well as a limit on the number of context-coded bins per sample. Experimental results show bit-rate savings of up to 3.45% and 9.55% for two different classes of screen content test sequences coded in a random access configuration.
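One of the listed modifications, the limit on the number of context-coded bins per sample, can be sketched as a simple budget that forces later bins into bypass coding. The budget of 2 bins per sample below is an illustrative assumption, not the value in the standard.

```python
# Sketch of a per-sample budget on context-coded bins: once the block's
# budget is used up, remaining bins are coded in (cheaper, non-adaptive)
# bypass mode. Budget value and context names are illustrative.

class BinBudget:
    def __init__(self, width, height, bins_per_sample=2):
        self.remaining = width * height * bins_per_sample

    def code_bin(self, value, context):
        """Return how the bin would be coded given the remaining budget."""
        if self.remaining > 0:
            self.remaining -= 1
            return ("context", context, value)
        return ("bypass", None, value)

budget = BinBudget(4, 4)               # 32 context-coded bins available
for i in range(34):
    kind, ctx, _ = budget.code_bin(value=1, context="sig_flag")
    if kind == "bypass":
        print("bin", i, "falls back to bypass coding")
```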
Today’s hybrid video coding systems typically perform intra-picture prediction, whereby blocks of samples are predicted from previously decoded samples of the same picture. For example, HEVC uses a set of angular prediction patterns to exploit directional sample correlations. In this paper, we propose new intra-picture prediction modes whose construction consists of two steps: first, a set of features is extracted from the decoded samples; second, these features are used to select a predefined image pattern as the prediction signal. Since several intra prediction modes are proposed for each block shape, a specific signaling scheme is also proposed. Our intra prediction modes lead to significant coding gains over state-of-the-art video coding technologies.
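A minimal sketch of the two-step construction, with an assumed two-dimensional feature (mean and mean absolute gradient of the reference samples) and a hypothetical two-entry pattern dictionary; the actual features and patterns of the proposed modes are not reproduced here.

```python
import numpy as np

# Step 1: extract features from decoded neighboring samples.
# Step 2: select the predefined pattern whose feature vector is closest.

def extract_features(left_col, top_row):
    ref = np.concatenate([left_col, top_row]).astype(float)
    return np.array([ref.mean(), np.abs(np.diff(ref)).mean()])

def predict_block(left_col, top_row, patterns):
    """patterns: list of (feature_vector, pattern_block) pairs."""
    f = extract_features(left_col, top_row)
    dists = [np.linalg.norm(f - pf) for pf, _ in patterns]
    return patterns[int(np.argmin(dists))][1]

# Two hypothetical 4x4 patterns: a flat block and a vertical edge.
flat = np.full((4, 4), 128.0)
edge = np.tile([64.0, 64.0, 192.0, 192.0], (4, 1))
patterns = [(extract_features(flat[:, 0], flat[0]), flat),
            (extract_features(edge[:, 0], edge[0]), edge)]
pred = predict_block(np.array([60, 65, 63, 62]),
                     np.array([60, 62, 190, 195]), patterns)
print(pred)                           # selects the vertical-edge pattern
```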
The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which, however, comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), a part of the inter-picture prediction process, typically consumes a large share of the computational resources while significantly increasing coding efficiency. Although both the H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information at a fractional sample level, motion search algorithms at the integer sample level remain an integral part of ME. In this paper, a flexible integer sample ME framework is proposed that allows trading off a significant reduction of ME computation time against a coding efficiency penalty in terms of bit-rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based, and early-termination techniques. The proposed ME framework is implemented on the basis of the HEVC Test Model (HM) reference software and compared to the state-of-the-art fast search algorithm that is a native part of HM. It is observed that for high-resolution sequences, the integer sample ME process can be sped up by factors of 3.2 to 7.6, resulting in bit-rate overheads of 1.5% and 0.6% for the Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, a similar speed-up is observed for sequences with mainly computer-generated imagery (CGI) content, at a bit-rate overhead of up to 5.2%.
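The three ingredients named above (predictive search, pattern-based refinement, early termination) can be sketched on a toy integer-pel SAD search as follows. Thresholds, the search range, and the predictor set are illustrative assumptions rather than the derived algorithm's settings.

```python
import numpy as np

def sad(cur, ref, mv, x, y, bs):
    dy, dx = mv
    patch = ref[y + dy: y + dy + bs, x + dx: x + dx + bs]
    return int(np.abs(cur.astype(int) - patch.astype(int)).sum())

def integer_me(cur, ref, x, y, bs, predictors, rng_max=8, early_stop=50):
    # 1) predictive search: start from the best motion vector predictor
    best_mv = min(predictors, key=lambda mv: sad(cur, ref, mv, x, y, bs))
    best_cost = sad(cur, ref, best_mv, x, y, bs)
    # 2) early termination: a good predictor may already be sufficient
    if best_cost < early_stop:
        return best_mv, best_cost
    # 3) pattern-based refinement: small diamond, repeated until the
    #    pattern center is the local minimum
    while True:
        moved = False
        for step in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            mv = (best_mv[0] + step[0], best_mv[1] + step[1])
            if max(abs(mv[0]), abs(mv[1])) > rng_max:
                continue                     # stay inside the search range
            c = sad(cur, ref, mv, x, y, bs)
            if c < best_cost:
                best_mv, best_cost, moved = mv, c, True
        if not moved:
            return best_mv, best_cost

rng = np.random.default_rng(0)
ref = rng.integers(0, 255, (64, 64), dtype=np.uint8)
cur = ref[10:18, 13:21].copy()               # true displacement (2, 5)
print(integer_me(cur, ref, 8, 8, 8, predictors=[(0, 0), (2, 4)]))
```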
An approach to the neural measurement of perceived image quality using electroencephalography (EEG) is presented. Six different images were tested at six distortion levels, with the distortions introduced by a hybrid video encoder. The presented study consists of two parts. In the first part, subjects evaluated the quality of the test stimuli behaviorally in a conventional psychophysical test using a degradation category rating procedure. In the second part, subjects were presented undistorted and distorted texture images in a periodically alternating fashion at a fixed frequency. This alternating presentation elicits so-called steady-state visual evoked potentials (SSVEPs) as a brain response that can be measured on the scalp. The amplitude of the modulations in the brain signals is significantly and strongly negatively correlated with the magnitude of visual impairment reported by the subjects. This neurophysiological approach to image quality assessment may potentially lead to a more objective evaluation, as behavioral approaches suffer from drawbacks such as biases, inter-subject variance, and limits on test duration.
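The analysis pipeline can be sketched as follows: estimate the SSVEP amplitude at the stimulation frequency from the EEG spectrum and correlate it with the behavioral ratings. The data below are synthetic placeholders, and the 7.5 Hz stimulation frequency and 250 Hz sampling rate are illustrative assumptions, not the study's parameters.

```python
import numpy as np

fs, f_stim, n = 250.0, 7.5, 2500        # sampling rate, stim freq, samples
t = np.arange(n) / fs

def ssvep_amplitude(eeg, fs, f_stim):
    """Amplitude of the spectral line at the stimulation frequency."""
    spectrum = np.abs(np.fft.rfft(eeg)) / len(eeg)
    freqs = np.fft.rfftfreq(len(eeg), 1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - f_stim))]

# Synthetic trials: stronger rated impairment -> smaller SSVEP modulation
# amplitude (mimicking the reported negative correlation).
ratings = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 4.5])   # impairment ratings
amps = []
rng = np.random.default_rng(1)
for r in ratings:
    signal = (5.0 - r) * np.sin(2 * np.pi * f_stim * t)
    amps.append(ssvep_amplitude(signal + rng.normal(0, 1, n), fs, f_stim))

corr = np.corrcoef(ratings, amps)[0, 1]
print(f"Pearson r = {corr:.2f}")        # strongly negative on this toy data
```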
KEYWORDS: Scalable video coding, Video, Video coding, Computer programming, Electronic filtering, Video surveillance, Spatial resolution, Quantization, Semantic video
This paper describes an extension of the upcoming High Efficiency Video Coding (HEVC) standard for supporting spatial and quality scalable video coding. Besides scalable coding tools known from scalable profiles of prior video coding standards such as H.262/MPEG-2 Video and H.264/MPEG-4 AVC, the proposed scalable HEVC extension includes new coding tools that further improve the coding efficiency of the enhancement layer. In particular, new coding modes by which base and enhancement layer signals are combined for forming an improved enhancement layer prediction signal have been added. All scalable coding tools have been integrated in a way that the low-level syntax and decoding process of HEVC remain unchanged to a large extent. Simulation results for typical application scenarios demonstrate the effectiveness of the proposed design. For spatial and quality scalable coding with two layers, bit-rate savings of about 20-30% have been measured relative to simulcasting the layers, which corresponds to a bit-rate overhead of about 5-15% relative to single-layer coding of the enhancement layer.
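A minimal sketch of the kind of combined inter-layer prediction mode described above: the upsampled base-layer reconstruction is merged with the enhancement-layer motion-compensated prediction into one improved prediction signal. The nearest-neighbor upsampling and fixed 50/50 weighting are simplifying assumptions; the actual tools use the specified upsampling filters and adaptive combinations.

```python
import numpy as np

def upsample2x(base):
    """Nearest-neighbor 2x upsampling stand-in for the spec's filters."""
    return base.repeat(2, axis=0).repeat(2, axis=1)

def combined_prediction(base_recon, enh_temporal_pred, weight=0.5):
    """Blend upsampled base-layer reconstruction with the enhancement
    layer's temporal (motion-compensated) prediction."""
    base_up = upsample2x(base_recon).astype(float)
    return weight * base_up + (1.0 - weight) * enh_temporal_pred

base = np.full((4, 4), 100.0)              # base-layer reconstruction
temporal = np.full((8, 8), 110.0)          # enhancement-layer MC prediction
print(combined_prediction(base, temporal)) # 105.0 everywhere
```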
Intra prediction is a fundamental tool in video coding with a hybrid block-based architecture. Recent investigations have shown that one of the most beneficial elements for a higher compression performance in high-resolution videos is the incorporation of larger block structures. Thus, in this work, we investigate the performance of novel intra prediction modes based on different image completion techniques in a new video coding scheme with large block structures. Image completion methods exploit the fact that high-frequency image regions yield high coding costs when using classical H.264/AVC prediction modes. This problem is tackled by investigating the incorporation of several intra predictors using the concept of the Laplace partial differential equation (PDE), least-squares (LS) based linear prediction, and the autoregressive model. A major aspect of this article is the evaluation of the coding performance in a qualitative (i.e. coding efficiency) manner. Experimental results show significant improvements in compression (up to 7.41%) by integrating the LS-based linear intra prediction.
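A minimal sketch of the LS-based linear intra prediction idea: predictor coefficients for a small causal neighborhood are trained by least squares on previously reconstructed samples next to the block and then applied recursively inside the block. The neighborhood shape and training window are illustrative assumptions.

```python
import numpy as np

def ls_intra_predict(recon, y0, x0, bs):
    """Predict the bs x bs block at (y0, x0) of `recon` in place."""
    # causal neighborhood: left, top-left, top, top-right
    offsets = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]
    rows, rhs = [], []
    # training window: 4 reconstructed rows directly above the block
    for y in range(y0 - 4, y0):
        for x in range(x0, x0 + bs):
            rows.append([recon[y + dy, x + dx] for dy, dx in offsets])
            rhs.append(recon[y, x])
    coeffs, *_ = np.linalg.lstsq(np.array(rows, float),
                                 np.array(rhs, float), rcond=None)
    # recursive prediction: later samples use already predicted ones
    for y in range(y0, y0 + bs):
        for x in range(x0, x0 + bs):
            recon[y, x] = sum(c * recon[y + dy, x + dx]
                              for c, (dy, dx) in zip(coeffs, offsets))
    return recon[y0:y0 + bs, x0:x0 + bs]

img = np.tile(np.linspace(50, 200, 16), (16, 1))  # smooth horizontal ramp
print(ls_intra_predict(img.copy(), 8, 8, 4).round(1))  # ramp reproduced
```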
The most recent video compression technology is High Efficiency Video Coding (HEVC). This soon-to-be-completed standard is a joint development of the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. As one of its major technical novelties, HEVC supports variable prediction and transform block sizes using the quadtree approach for block partitioning. In terms of entropy coding, the Draft International Standard (DIS) of HEVC specifies context-based adaptive binary arithmetic coding (CABAC) as the single mode of operation. In this paper, a description is given of the CABAC-based entropy coding in HEVC that relates to block structures and transform coefficient levels. In addition, experimental results are presented that indicate the benefit of the transform-coefficient level coding design in HEVC in terms of improved coding performance and reduced complexity.
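The coefficient-level part can be sketched by its binarization: an absolute level is decomposed into context-coded flags plus a bypass-coded remainder. The per-subblock limits on the number of greater1/greater2 flags, the adaptive Rice parameter, and the exp-Golomb escape of the real design are omitted here; k is fixed.

```python
def rice_bins(value, k=0):
    """Golomb-Rice code: unary prefix plus k-bit fixed-length suffix."""
    prefix, suffix = value >> k, value & ((1 << k) - 1)
    bins = "1" * prefix + "0"
    if k:
        bins += format(suffix, f"0{k}b")
    return bins

def binarize_level(abs_level, k=0):
    """HEVC-style decomposition of one absolute coefficient level."""
    bins = [("sig_coeff_flag", int(abs_level > 0))]
    if abs_level > 0:
        bins.append(("greater1_flag", int(abs_level > 1)))
    if abs_level > 1:
        bins.append(("greater2_flag", int(abs_level > 2)))
    if abs_level > 2:                    # remaining level = level - 3
        bins.append(("coeff_abs_level_remaining",
                     rice_bins(abs_level - 3, k)))
    return bins

for level in (0, 1, 2, 5):
    print(level, binarize_level(level))
```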
With the prospective High Efficiency Video Coding (HEVC) standard, jointly developed by ITU-T VCEG and ISO/IEC MPEG, a new step in video compression capability is achieved. Technically, HEVC is a hybrid video-coding approach using quadtree-based block partitioning together with motion-compensated prediction. Even though a high degree of adaptability is achieved by quadtree-based block partitioning, this approach is intrinsically tied to certain drawbacks that may result in redundant sets of motion parameters being transmitted. In order to remove those redundancies, a block-merging algorithm for HEVC is proposed. This algorithm generates a single motion-parameter set for a whole region of contiguous motion-compensated blocks. Simulation results show that the proposed merging technique works more efficiently than a conceptually similar direct mode.
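A minimal sketch of the merging idea: a block may inherit its complete motion-parameter set from one of a small list of already-coded spatial neighbors, so only a merge flag and a candidate index are signaled instead of new motion parameters. The candidate positions and pruning below are simplified assumptions relative to the standardized scheme.

```python
def build_merge_list(neighbors, max_candidates=5):
    """neighbors: motion parameter tuples (ref_idx, mv_x, mv_y) or None
    for unavailable positions, in checking order."""
    merge_list = []
    for params in neighbors:
        if params is not None and params not in merge_list:  # prune dupes
            merge_list.append(params)
        if len(merge_list) == max_candidates:
            break
    return merge_list

# Left, above, above-right, below-left, above-left neighbors (hypothetical)
neighbors = [(0, 4, -2), (0, 4, -2), (1, 0, 0), None, (0, 4, -2)]
merge_list = build_merge_list(neighbors)
print(merge_list)              # [(0, 4, -2), (1, 0, 0)] after pruning
inherited = merge_list[0]      # encoder signals merge_idx = 0
print(inherited)
```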
Recent investigations have shown that one of the most beneficial elements for higher compression performance in high-resolution video is the incorporation of larger block structures. In this work, we will address the question of how to incorporate perceptual aspects into new video coding schemes based on large block structures. This is rooted in the fact that especially high-frequency regions such as textures yield high coding costs when using classical prediction modes as well as an encoder control based on the mean squared error. To overcome this problem, we will investigate the incorporation of novel intra predictors based on image completion methods. Furthermore, the integration of a perceptual-based encoder control using the well-known structural similarity index will be analyzed. A major aspect of this article is the evaluation of the coding results in a quantitative (i.e. statistical analysis of changes in mode decisions) as well as qualitative (i.e. coding efficiency) manner.
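A minimal sketch of such a perceptual encoder control: the Lagrangian mode decision replaces the MSE distortion with an SSIM-based term (1 - SSIM). The simplified global, single-scale SSIM and the lambda value are illustrative assumptions.

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):   # constants for 8-bit, K=(.01,.03)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def perceptual_cost(orig, recon, rate_bits, lam=0.01):
    """Lagrangian cost with SSIM-based distortion: (1 - SSIM) + lam * R."""
    return (1.0 - ssim(orig, recon)) + lam * rate_bits

rng = np.random.default_rng(0)
orig = rng.integers(0, 255, (8, 8)).astype(float)
mode_a = orig + rng.normal(0, 4, orig.shape)   # low distortion, high rate
mode_b = np.full_like(orig, orig.mean())       # flat prediction, cheap
print(perceptual_cost(orig, mode_a, rate_bits=60))
print(perceptual_cost(orig, mode_b, rate_bits=8))
```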
KEYWORDS: Scalable video coding, Video, Computer programming, Visualization, Video coding, Spatial resolution, Quantization, LCDs, Video surveillance, Signal processing
This paper presents an overview of the new Scalable Video Coding (SVC) Amendment of H.264/AVC and the results of a performance evaluation for this new video coding specification. Whereas temporal scalability is already enabled by the existing H.264/AVC specification, the introduction of spatial and quality scalability requires new coding tools. Here, the layered structure of SVC and the main new coding tools are briefly described, and an overview of the newly defined SVC profiles and levels is provided. The second part of the paper describes a subjective evaluation that was carried out to test the efficiency of the SVC concept. The results of this evaluation show that the coding tools introduced in the scalable extension of H.264/AVC provide a reasonable degree of spatial and quality scalability at very low cost in terms of additional bit rate. The evaluation consisted of a series of subjective quality tests and is backed up by objective PSNR measurements. The results show that SVC supports spatial and quality scalability with a bit-rate overhead of about 10% or less and a visual quality indistinguishable from state-of-the-art single-layer coding.
We address the problem of rate allocation for video multicast over wireless mesh networks. An optimization framework is established to incorporate the effects of heterogeneity in wireless link capacities, traffic contention among neighboring links and different video distortion-rate (DR) characteristics. We present a distributed rate allocation scheme with the goal of minimizing total video distortion of all peers without excessive network utilization. The scheme relies on cross-layer information exchange between the MAC and application layers. It adopts the scalable video coding (SVC) extensions of H.264/AVC for video rate adaptation, so that graceful quality reduction can be achieved at intermediate nodes within each multicast tree. The performance of the proposed scheme is compared with a heuristic scheme based on TCP-Friendly Rate Control (TFRC) for individual peers. Network simulation results show that the proposed scheme tends to allocate higher rates for peers experiencing higher link speeds, leading to higher overall video quality than the TFRC-based heuristic scheme.
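A centralized toy counterpart of this allocation problem can be sketched with the common distortion-rate model D_i(R) = d0_i + theta_i/(R - R0_i) and an aggregate airtime budget sum(a_i * R_i) <= C: the KKT conditions give R_i = R0_i + sqrt(theta_i / (lambda * a_i)), and the multiplier lambda can be found by bisection. All parameters are illustrative assumptions; the paper's scheme solves the problem in a distributed fashion via MAC/application cross-layer exchange, which is not reproduced here.

```python
import math

def allocate(theta, r0, a, budget, iters=60):
    """Minimize sum_i theta_i/(R_i - R0_i) s.t. sum_i a_i*R_i <= budget.
    KKT: R_i = R0_i + sqrt(theta_i/(lambda*a_i)); bisect on lambda."""
    def used(lam):
        return sum(ai * (r0i + math.sqrt(th / (lam * ai)))
                   for th, r0i, ai in zip(theta, r0, a))
    lo, hi = 1e-9, 1e9
    for _ in range(iters):
        mid = math.sqrt(lo * hi)                 # geometric bisection
        lo, hi = (mid, hi) if used(mid) > budget else (lo, mid)
    lam = math.sqrt(lo * hi)
    return [r0i + math.sqrt(th / (lam * ai))
            for th, r0i, ai in zip(theta, r0, a)]

# Three peers: peer 3 sits behind a slow link (high airtime cost a_i).
rates = allocate(theta=[2e4, 2e4, 2e4], r0=[30, 30, 30],
                 a=[1.0, 1.0, 3.0], budget=1200)
print([round(r) for r in rates])   # slow-link peer gets the lowest rate
```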
KEYWORDS: Video, Video surveillance, Scalable video coding, Signal to noise ratio, Spatial resolution, Video coding, Computer programming, Signal processing, Quantization, Temporal resolution
The extension of H.264/AVC hybrid video coding towards scalable video coding (SVC) using motion-compensated temporal filtering (MCTF) is presented. Utilizing the lifting approach to implement MCTF, the motion compensation features of H.264/AVC can be reused for the MCTF prediction step and extended in a straightforward way for the MCTF update step. The MCTF extension of H.264/AVC is also incorporated into a video codec that provides SNR, spatial, and (similar to hybrid video coding) temporal scalability. The paper provides a description of these techniques and presents experimental results that validate their efficiency. In addition, applications of SVC to video transmission and video surveillance are described.
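The lifting structure can be sketched for the Haar case: the prediction step reuses motion-compensated prediction to form a high-pass frame, and the update step adds the inversely mapped high-pass back to form a low-pass frame; by construction, synthesis inverts analysis exactly regardless of the motion compensation used. Motion compensation is reduced to a pure shift in this sketch.

```python
import numpy as np

def mc(frame, mv):
    return np.roll(frame, mv, axis=(0, 1))      # stand-in for true MC

def mctf_analysis(a, b, mv):
    h = b - mc(a, mv)                           # prediction step: high-pass
    l = a + 0.5 * mc(h, (-mv[0], -mv[1]))       # update step: low-pass
    return l, h

def mctf_synthesis(l, h, mv):
    a = l - 0.5 * mc(h, (-mv[0], -mv[1]))       # invert update
    b = h + mc(a, mv)                           # invert prediction
    return a, b

rng = np.random.default_rng(0)
a = rng.normal(128, 20, (8, 8))
b = np.roll(a, (1, 2), axis=(0, 1))             # frame b = frame a shifted
l, h = mctf_analysis(a, b, mv=(1, 2))
a2, b2 = mctf_synthesis(l, h, mv=(1, 2))
print(np.abs(h).max() < 1e-9)                   # perfect prediction: h == 0
print(np.allclose(a, a2) and np.allclose(b, b2))  # perfect reconstruction
```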
Considering inter-picture dependencies when selecting transform coefficient levels in hybrid video coding can be done by formulating the decoding process as a linear signal model and solving a quadratic program. The basic method assumes motion estimation and quantization parameters as given and then selects the transform coefficient levels. However, when motion vectors are determined in advance, motion estimation must be conducted on uncoded reference pictures, which is known to deliver inferior results compared to motion estimation on decoded reference pictures. In this work, we extend the basic method to the case where motion estimation considers decoded reference pictures. We propose an approach that iterates between transform coefficient selection and motion estimation and find that a simple two-pass iteration works reasonably well. Our simulation results using an H.264/AVC-conforming encoder show coding gains of up to 1 dB in comparison to the quantization method specified in the test model of H.264/AVC.
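The inner coefficient-selection step can be sketched for a single 1-D block: the decoded signal is a linear function of the levels, so choosing levels is a small integer quadratic program, approximated below by rounding plus a coordinate-wise +-1 refinement against a crude rate proxy. The coupling across pictures and the iteration with motion estimation are not reproduced; all sizes and the lambda value are illustrative assumptions.

```python
import numpy as np

n = 8
dct = np.array([[np.cos(np.pi * (2 * j + 1) * k / (2 * n))
                 for j in range(n)] for k in range(n)]) * np.sqrt(2 / n)
dct[0] /= np.sqrt(2)                     # orthonormal 1-D DCT-II basis

def select_levels(residual, qstep, lam):
    """Pick integer levels minimizing ||residual - T^-1(q*levels)||^2
    plus a crude rate proxy, via rounding + coordinate refinement."""
    coeffs = dct @ residual              # forward transform
    best = np.round(coeffs / qstep)      # relaxed solution
    for i in range(n):
        cands = [best[i] - 1, best[i], best[i] + 1]
        costs = []
        for c in cands:
            trial = best.copy()
            trial[i] = c
            rec = dct.T @ (trial * qstep)         # modeled decoded signal
            rate_proxy = np.sum(np.abs(trial))    # crude rate estimate
            costs.append(np.sum((residual - rec) ** 2) + lam * rate_proxy)
        best[i] = cands[int(np.argmin(costs))]
    return best

residual = np.array([4.0, 3.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0])
print(select_levels(residual, qstep=2.0, lam=1.5))
```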
We present a new approach to video coding that applies video analysis based on global motion features. A super-resolution mosaic is built for each frame to be encoded from a number of previously transmitted frames. This super-resolution mosaic is used to detect macroblocks that are only affected by global motion. For such macroblocks, no prediction error is transmitted; they are purely reconstructed by prediction from the super-resolution mosaic, which results in significant bit-rate savings. Our results indicate total bit-rate savings of 20% and more compared to a state-of-the-art H.264/AVC codec at the same visual quality.
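A minimal sketch of the macroblock classification: a macroblock is flagged as global-motion-only when the prediction taken from the (here precomputed) mosaic is already good enough, in which case no residual is transmitted for it. The SAD threshold is an illustrative assumption.

```python
import numpy as np

def classify_macroblocks(frame, mosaic_pred, mb=16, thresh=300):
    """Flag macroblocks that the mosaic prediction reconstructs well."""
    h, w = frame.shape
    skip_map = np.zeros((h // mb, w // mb), dtype=bool)
    for by in range(h // mb):
        for bx in range(w // mb):
            cur = frame[by*mb:(by+1)*mb, bx*mb:(bx+1)*mb].astype(int)
            prd = mosaic_pred[by*mb:(by+1)*mb, bx*mb:(bx+1)*mb].astype(int)
            sad = np.abs(cur - prd).sum()
            skip_map[by, bx] = sad < thresh   # residual-free reconstruction
    return skip_map

rng = np.random.default_rng(0)
mosaic_pred = rng.integers(0, 255, (32, 32)).astype(np.uint8)
frame = mosaic_pred.copy()
frame[0:16, 16:32] = rng.integers(0, 255, (16, 16))  # local motion region
print(classify_macroblocks(frame, mosaic_pred))      # that MB is not skipped
```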
In this paper, we present a novel design of a wavelet-based video coding algorithm within a conventional hybrid framework of temporal motion-compensated prediction and transform coding. Our proposed algorithm incorporates multi-frame motion compensation as an effective means of improving the quality of the temporal prediction. In addition, we follow the rate-distortion-optimized strategy of using a Lagrangian cost function to discriminate between different decisions in the video encoding process. Finally, we demonstrate that context-based adaptive arithmetic coding is a key element for fast adaptation and high coding efficiency. The combination of overlapped block motion compensation and frame-based transform coding enables blocking-artifact-free and hence subjectively more pleasing video. In comparison with a highly optimized MPEG-4 Advanced Simple Profile coder, our proposed scheme provides significant performance gains in objective quality of 2.0-3.5 dB PSNR.
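The OBMC component can be sketched as follows: each block's motion-compensated patch is extended beyond the block boundaries and weighted with a smooth window so that neighboring patches blend, which is what avoids blocking artifacts. The sin^2 window (a partition of unity at the block stride) and integer-pel motion are simplifying assumptions.

```python
import numpy as np

def obmc(ref, mvs, bs=8):
    """mvs: (H//bs, W//bs, 2) integer (dy, dx) motion vectors."""
    h, w = ref.shape
    out, weight = np.zeros((h, w)), np.zeros((h, w))
    win1d = np.sin(np.pi * (np.arange(2 * bs) + 0.5) / (2 * bs)) ** 2
    win = np.outer(win1d, win1d)                  # 2bs x 2bs window
    for by in range(h // bs):
        for bx in range(w // bs):
            dy, dx = mvs[by, bx]
            ys = np.arange(by * bs - bs // 2, by * bs + bs + bs // 2)
            xs = np.arange(bx * bs - bs // 2, bx * bs + bs + bs // 2)
            patch = ref[np.clip(ys + dy, 0, h - 1)[:, None],
                        np.clip(xs + dx, 0, w - 1)]
            idx = (np.clip(ys, 0, h - 1)[:, None], np.clip(xs, 0, w - 1))
            np.add.at(out, idx, win * patch)      # accumulate blended patches
            np.add.at(weight, idx, win)
    return out / np.maximum(weight, 1e-12)

rng = np.random.default_rng(0)
ref = rng.normal(128, 20, (16, 16))
mvs = np.zeros((2, 2, 2), dtype=int)              # all-zero motion field
print(np.allclose(obmc(ref, mvs), ref))           # degenerates to a copy
```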
Multi-hypothesis prediction extends motion compensation with one prediction signal to the linear superposition of several motion-compensated prediction signals. These motion-compensated prediction signals are referenced by motion vectors and picture reference parameters. This paper proposes a state-of-the-art video codec based on the ITU-T Recommendation H.263 that incorporates multi-hypothesis motion-compensated prediction. In contrast to B-frames, reference pictures are always previously decoded pictures. It is demonstrated that two hypotheses are efficient for practical video compression algorithms. In addition, it is shown that multi-hypothesis motion-compensated prediction and variable-block-size prediction can be combined to improve the overall coding gain. The encoder utilizes rate-constrained coder control including rate-constrained multi-hypothesis motion estimation. The advanced 4-hypothesis codec improves coding efficiency by up to 1.8 dB compared to the advanced prediction codec with ten reference frames for the set of investigated test sequences.
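A minimal sketch of the superposition: each hypothesis is a motion-compensated signal referenced by its own motion vector and picture reference parameter, and the prediction is their (here equally weighted) average. The actual codec selects the hypotheses jointly under a rate constraint, which is not reproduced here.

```python
import numpy as np

def mc(ref, mv):
    return np.roll(ref, mv, axis=(0, 1))        # toy motion compensation

def multi_hypothesis_prediction(refs, hypotheses):
    """hypotheses: list of (picture_reference_index, motion_vector)."""
    preds = [mc(refs[ref_idx], mv) for ref_idx, mv in hypotheses]
    return np.mean(preds, axis=0)               # linear superposition

rng = np.random.default_rng(0)
refs = [rng.normal(128, 20, (8, 8)) for _ in range(10)]  # decoded pictures
pred = multi_hypothesis_prediction(refs, [(0, (1, 0)), (3, (0, -2))])
print(pred.shape)                                # two-hypothesis prediction
```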
We present a new video coding scheme that uses several reference frames for improved motion-compensated prediction. The reference pictures are warped versions of the previously decoded frame obtained by polynomial motion compensation. In contrast to global motion compensation, where typically one motion model is transmitted, we show that in the general case more than one motion model is beneficial in terms of coding efficiency. In order to determine the multiple motion models, we employ a robust clustering method based on the iterative application of the least-median-of-squares estimator. The approach is incorporated into an H.263-based video codec and embedded into a rate-constrained motion estimation and macroblock mode decision framework. It is demonstrated that adaptive multiple reference picture coding in general improves rate-distortion performance. PSNR gains of 1.2 dB in comparison to the H.263 codec for the high global and local motion sequence Stefan and 1 dB for the sequence Mobile and Calendar, which contains no global motion, are reported. These PSNR gains correspond to bit-rate savings of 21 percent and 30 percent relative to the H.263 codec, respectively. The average number of motion models selected by the encoder for our test sequences is between 1 and 7, depending on the actual bit rate.
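The robust clustering step can be sketched with least median of squares: random minimal samples of point correspondences propose motion models (affine here, rather than the full polynomial model), the model with the smallest median squared residual is kept, its inliers are removed, and the procedure repeats for the next model. Model order, trial count, and tolerance are illustrative assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    A = np.hstack([src, np.ones((len(src), 1))])   # [x, y, 1] -> (x', y')
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params                                   # 3x2 parameter matrix

def lmeds_models(src, dst, n_models=2, trials=200, inlier_tol=1.0, seed=0):
    rng = np.random.default_rng(seed)
    models, remaining = [], np.arange(len(src))
    for _ in range(n_models):
        best, best_med = None, np.inf
        for _ in range(trials):
            pick = rng.choice(remaining, 3, replace=False)  # minimal sample
            P = fit_affine(src[pick], dst[pick])
            A = np.hstack([src[remaining], np.ones((len(remaining), 1))])
            res = np.sum((A @ P - dst[remaining]) ** 2, axis=1)
            med = np.median(res)
            if med < best_med:
                best, best_med, best_res = P, med, res
        models.append(best)
        remaining = remaining[best_res > inlier_tol]        # drop inliers
        if len(remaining) < 3:
            break
    return models

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (60, 2))
dst = src + (2.0, -1.0)                       # model 1: global translation
dst[:20] = src[:20] * 1.1 + (5.0, 3.0)        # model 2: a zooming region
print(len(lmeds_models(src, dst)))            # expect 2 recovered models
```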