This paper presents cutting-edge design and application techniques for secondary transforms in the context of the next-generation video coding standardization effort beyond AV1 by the Alliance for Open Media (AOM). It discusses methods for applying flexible secondary transform sets and kernels to both intra- and inter-coded blocks. The proposed methods enable the encoder to optimize the transform set for each intra prediction residual block and extend the use of secondary transforms to inter prediction residuals. Experimental results using the reference software show that the proposed approach improves overall coding efficiency, measured in weighted YUV PSNR BD-rates, by 3.40% for All Intra (AI), 1.70% for Random Access (RA), and 1.10% for Low Delay (LD) configurations under AOM Common Test Conditions (CTC) version 7.
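As a rough sketch of how such a secondary transform stage typically operates (the 16x16 kernel below is a random orthonormal placeholder, not a trained AVM kernel, and the 4x4 low-frequency region is an assumption for illustration), the low-frequency corner of the primary transform coefficients is vectorized, multiplied by a non-separable kernel, and written back:

import numpy as np

def apply_secondary_transform(primary_coeffs, kernel, region=4):
    # Vectorize the top-left (low-frequency) region of the primary coefficients.
    sub = primary_coeffs[:region, :region].flatten()
    # Apply the non-separable secondary transform (matrix-vector product).
    out = kernel @ sub
    result = primary_coeffs.copy()
    result[:region, :region] = out.reshape(region, region)
    return result

# Placeholder orthonormal 16x16 kernel; a real codec selects trained kernels
# per transform set, this random kernel is for illustration only.
rng = np.random.default_rng(0)
kernel, _ = np.linalg.qr(rng.standard_normal((16, 16)))
coeffs = rng.standard_normal((8, 8))
secondary = apply_secondary_transform(coeffs, kernel)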
KEYWORDS: Transform theory, Video coding, Covariance matrices, Online learning, Matrices, Signal processing, Eigenvectors, Image processing, Image compression, Education and training
Current video coding standards, including H.264/AVC, HEVC, and VVC, use the discrete cosine transform (DCT) and discrete sine transform (DST) to decorrelate intra-prediction residuals. However, these transforms often struggle to effectively decorrelate signals with complex, non-smooth, and non-periodic structures. Even in smooth areas, an abrupt transition (due to noise or prediction artifacts) can limit their effectiveness. This paper presents a novel block-adaptive separable path graph-based transform (GBT) that is particularly well suited to such signals. The method adaptively selects the block size and learns the GBT to enhance performance. The GBT is learned online using sequential K-means clustering, where each available block size has K clusters and K GBT kernels. This allows the GBT for the current block to be learned dynamically from previously reconstructed areas with the same block size and similar characteristics. Our evaluation, integrating this method with H.264/AVC intra-coding tools, shows significant improvement over the traditional H.264/AVC DCT on high-resolution natural images.
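A minimal sketch of the two building blocks, assuming a path (line) graph with learned edge weights and a generic sequential K-means update (the paper's exact weight-learning and clustering rules are not reproduced here): the GBT basis is the eigenvector matrix of the graph Laplacian, and cluster centroids are refined online from each newly reconstructed block:

import numpy as np

def gbt_kernel(edge_weights):
    # Build the Laplacian L = D - W of a path (line) graph from its edge
    # weights; the GBT basis is given by the eigenvectors of L.
    n = len(edge_weights) + 1
    W = np.zeros((n, n))
    for i, w in enumerate(edge_weights):
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs  # columns are the transform basis vectors

def sequential_kmeans_update(centroids, counts, feature):
    # Assign the new block's feature vector to the nearest centroid and
    # move that centroid toward it (online K-means step).
    k = int(np.argmin([np.linalg.norm(feature - c) for c in centroids]))
    counts[k] += 1
    centroids[k] += (feature - centroids[k]) / counts[k]
    return k

# Usage sketch: centroids = [np.zeros(d) for _ in range(K)], counts = [1] * K,
# then call sequential_kmeans_update for each reconstructed block.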
Smooth prediction modes and angular intra prediction modes are the two major types of intra prediction modes in the AV1 video codec, designed to reduce spatial redundancy in video signals. Smooth prediction modes are particularly effective for blocks with a smooth gradient, while angular intra prediction modes predict pixel values as a weighted average of neighboring pixels along different angular directions. This paper proposes extensions to these modes to improve intra coding performance for a next-generation video codec beyond AV1. The first extension refines the smooth modes by considering the geometric distance between each sample and its reference pixels to achieve more precise prediction. The second extension refines the distribution of intra prediction angles in AV1 so that the angles are denser around the vertical and horizontal modes and coarser around the diagonal directions. The third extension applies Intra Bi-Prediction (IBP) to a subset of prediction angles, which implicitly allows the codec to choose between the IBP-on and IBP-off cases. Experimental results show that the proposed methods achieve up to 0.3% luma and chroma average BD-rate savings with no encoding time increase for the all-intra configuration when compared to the research-v4.0.0 tag of the AVM reference software, which is being developed for exploring next-generation video coding beyond AV1. Notably, the highest observed coding gain was up to 1.5% for 4K video sequences.
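To illustrate the first extension, here is a hedged sketch of distance-aware smooth prediction; the inverse-distance weights and the smooth_predict helper are illustrative assumptions, not the normative AVM weights:

import numpy as np

def smooth_predict(top, left, bh, bw):
    # Blend the top and left reference samples for each position, weighting
    # each reference by the inverse of its geometric distance to the sample
    # (illustrative weighting; actual codec weights are defined by the spec).
    pred = np.zeros((bh, bw))
    for y in range(bh):
        for x in range(bw):
            d_top, d_left = y + 1, x + 1
            w_top, w_left = 1.0 / d_top, 1.0 / d_left
            pred[y, x] = (w_top * top[x] + w_left * left[y]) / (w_top + w_left)
    return pred

# Usage sketch: top is the row of reference pixels above the block (length bw),
# left is the column of reference pixels to its left (length bh).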
KEYWORDS: Video coding, Video, High efficiency video coding, Design and modelling, Signal processing, Matrices, Video compression, Standards development, Quantization
Wedge mode is a crucial compound prediction mode in the AV1 video codec for predicting blocks that contain moving-object boundaries. The current AV1 design incorporates 16 block-shape-adaptive modes with a set of handcrafted, predefined one-dimensional blending masks that are applied when combining two predictors. However, this design lacks the flexibility to cater to the variety of moving-object boundaries in real-world scenarios. This paper proposes an extended wedge mode to replace the original design. The proposed method features relaxed limitations on block sizes and the number of modes, mathematically derived non-linear two-dimensional masks for the blending process, and more efficient mode signaling. Experimental results with the AOM reference software under the AOM common test conditions show an average YUV Bjøntegaard Delta rate reduction of 0.3% for the random access and 0.7% for the low delay configuration, without a major increase in complexity.
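A sketch of one plausible non-linear two-dimensional blending mask (a sigmoid of the signed distance to an oriented partition line; the angle/offset parameterization and sharpness value are assumptions for illustration, not the paper's derived masks):

import numpy as np

def wedge_mask(bh, bw, angle_deg, offset, sharpness=0.5):
    # Signed distance of each pixel to an oriented partition line through the
    # block, mapped through a sigmoid to a smooth 2D blending mask in [0, 1].
    theta = np.deg2rad(angle_deg)
    y, x = np.mgrid[0:bh, 0:bw]
    dist = (x - bw / 2) * np.cos(theta) + (y - bh / 2) * np.sin(theta) - offset
    return 1.0 / (1.0 + np.exp(-sharpness * dist))

# The two inter predictors P0 and P1 are then combined as
# pred = mask * P0 + (1 - mask) * P1.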
With the popularity of video sharing applications and video conferencing systems, there has been growing interest in measuring and enhancing the quality of videos captured and transmitted by those applications. While assessing the quality of user-generated content (UGC) videos is itself still an open question, it is even more challenging to enhance the perceptual quality of UGC videos with unknown characteristics. In this work, we study the potential to enhance the quality of UGC videos through sharpening. To this end, we construct a subjective dataset via massive online crowdsourcing. The dataset consists of 1200 sharpness-enhanced UGC videos processed from 200 UGC source videos. During the subjective test, each processed video is compared with its source to capture fine-grained quality differences. We propose a statistical model to precisely measure whether quality is enhanced or degraded. Moreover, we benchmark state-of-the-art no-reference image and video quality metrics against the collected subjective data. We observe that most metrics do not correlate well with the subjective scores, which indicates the need for more reliable objective metrics for UGC videos.
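The benchmarking step can be sketched as follows, using standard PLCC/SROCC correlation between metric outputs and subjective scores (benchmark_metric is a hypothetical helper; the paper's statistical model itself is not reproduced here):

import numpy as np
from scipy.stats import pearsonr, spearmanr

def benchmark_metric(objective_scores, subjective_scores):
    # Correlate an objective quality metric's outputs with subjective scores;
    # low SROCC/PLCC indicates the metric tracks perceived quality poorly.
    plcc, _ = pearsonr(objective_scores, subjective_scores)
    srocc, _ = spearmanr(objective_scores, subjective_scores)
    return plcc, srocc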
The video codec is the core engine that has driven various video applications for the past several decades, such as online video streaming and video conferencing. Driven by the drastic growth of global video traffic and the improved computation power of various devices, the evolution of video compression technology has delivered several generations of video coding standards. In this paper, the technologies used in recent video codecs are described and their coding performance is compared, including HEVC and VVC developed by ISO/IEC MPEG and ITU-T VCEG, VP9 developed by Google, AV1 developed by AOM, and AVS3 (IEEE 1857) developed by the Audio Video coding Standard workgroup of China. The datasets cover a wide range of typical video resolutions up to 4K. The experiments collect four rate-distortion data points for each test sequence, and coding efficiency is measured with the popular BD-rate metric.
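For reference, a compact sketch of the standard Bjøntegaard delta-rate computation from four rate-PSNR points per codec (a cubic fit of log-rate against PSNR, integrated over the overlapping quality range):

import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    # Bjontegaard delta rate: fit cubic polynomials of log-rate as a function
    # of PSNR, integrate both over the overlapping PSNR interval, and convert
    # the average log-rate difference into a percentage.
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative means bitrate savings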
This paper provides a technical overview of the most probable modes (MPM)-based multiple reference line (M-MRL) intra-picture prediction that was adopted into the Versatile Video Coding (VVC) standard draft at the 12th JVET meeting. M-MRL applies not only the nearest reference line but also farther reference lines to MPMs for intra-picture prediction. The highlighted aspects of the adopted M-MRL scheme include the signaling of the reference line index, discontinuous reference lines, reference sample construction and prediction for farther reference lines, and the joint reference line and intra mode decision at the encoder side. Experimental results evaluate the performance of M-MRL on top of the VVC test model VTM-2.0.1, together with an analysis of discontinuous reference lines. The presented M-MRL provides average bitrate savings of 0.5% for the all-intra and 0.2% for the random-access configuration.
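A simplified sketch of the joint encoder-side decision described above (the reference line set [0, 1, 3] and the rd_cost callback are illustrative assumptions; farther lines are restricted to the MPM list):

def choose_line_and_mode(ref_lines, all_modes, mpm_list, rd_cost):
    # Joint encoder-side search: the nearest line (index 0) may use any intra
    # mode, while farther (possibly discontinuous) lines are restricted to MPMs.
    best = (None, None, float("inf"))
    for line in ref_lines:              # e.g. [0, 1, 3], a discontinuous set
        modes = all_modes if line == 0 else mpm_list
        for mode in modes:
            cost = rd_cost(line, mode)  # hypothetical R-D cost callback
            if cost < best[2]:
                best = (line, mode, cost)
    return best  # (reference line index, intra mode, cost)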
KEYWORDS: Video, High dynamic range imaging, Scalable video coding, Video coding, RGB color model, Quantization, Osmium, Video surveillance, Computer programming, Image compression
This paper presents a technique for coding high dynamic range videos. The proposed coding scheme is scalable, such that both standard dynamic range and high dynamic range representations of a video can be extracted from one bit stream. A localized inverse tone mapping method is proposed for efficient inter-layer prediction, which applies a scaling factor and an offset to each macroblock, per color channel. The scaling factors and offsets are predicted from neighboring macroblocks, and then the differences are entropy coded. The proposed inter-layer prediction technique is independent of the forward tone mapping method and is able to cover a wide range of bit-depths and various color spaces. Simulations are performed based on H.264/AVC SVC common software and core experiment conditions. Results show the effectiveness of the proposed method.
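A minimal sketch of the per-macroblock inter-layer prediction, assuming a simple mean-of-neighbors parameter predictor (the paper's exact predictor and entropy coding are not reproduced):

import numpy as np

def inter_layer_predict(sdr_mb, scale, offset):
    # Localized inverse tone mapping: one scale factor and offset per
    # macroblock and color channel maps the SDR base layer into the HDR domain.
    return scale * sdr_mb.astype(np.float64) + offset

def predict_params(left, above):
    # Scale/offset are themselves predicted from neighboring macroblocks
    # (mean of available neighbors here, an illustrative assumption) and only
    # the differences are entropy coded.
    neighbors = [p for p in (left, above) if p is not None]
    if not neighbors:
        return (1.0, 0.0)
    return tuple(np.mean(neighbors, axis=0))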
An efficient MPEG-2 to MPEG-4 video transcoder is presented in this paper. We consider transcoding from high-quality, high-bit-rate MPEG-2 video with a larger image size (e.g. 4CIF/4SIF, CIF) to lower-quality, lower-bit-rate MPEG-4 video with a smaller image size (e.g. CIF, QCIF). First, the transcoder down-samples the input MPEG-2 video. Since the motion vectors carried by the MPEG-2 stream are reused in the transcoding process, they are sub-sampled along with the frame pixels, and the coding mode of each down-sampled macroblock is examined. A new rate control method is proposed to convert the high-bit-rate MPEG-2 video to its low-bit-rate MPEG-4 counterpart. The proposed rate control scheme adjusts the frame rate and the frame quantization step size simultaneously according to the channel bandwidth to achieve a good temporal-spatial quality tradeoff. Due to the reuse of motion vectors, key frames (i.e. I and P frames) cannot be skipped, so as to maintain the prediction order, while some B frames containing less temporal information may be skipped during transcoding to save bit rate. Skipped B frames can be reconstructed at the decoder to ensure full frame-rate playback. The TM7 quadratic Rate-Qtz model is adopted in the proposed rate control scheme to calculate the re-quantization step size from a given target bit budget. Simulations show that the proposed MPEG-2 to MPEG-4 video transcoder with rate control outperforms a basic MPEG transcoder that adjusts the re-quantization step size at a constant frame rate. The complexity of the proposed transcoder is low, so it can be used in real-time applications.
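The rate control step can be sketched as follows: given a quadratic Rate-Qtz model of the form R(Q) = (a/Q + b/Q^2) * MAD, the re-quantization step is obtained by solving a quadratic equation in 1/Q (the update of the model parameters a, b from past coded frames is omitted here):

import math

def solve_qstep(target_bits, mad, a, b):
    # Quadratic rate-quantization model: R(Q) = (a/Q + b/Q^2) * MAD.
    # Solve for the re-quantization step Q given a target bit budget.
    if b == 0:
        return a * mad / target_bits
    # Substitute u = 1/Q: b*mad*u^2 + a*mad*u - target_bits = 0.
    disc = (a * mad) ** 2 + 4 * b * mad * target_bits
    u = (-a * mad + math.sqrt(disc)) / (2 * b * mad)
    return 1.0 / u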
Novel temporal-spatial rate control solutions for MPEG video transcoding are investigated in this paper. We present two rate control approaches for MPEG video transcoding: one for IPP streams (containing only I and P frames, no B frames) and the other for IBP streams (containing I, P, and B frames). The proposed rate control schemes adapt the frame rate and the quantization step size simultaneously according to the available channel bandwidth to achieve a good temporal-spatial quality tradeoff. In our solutions, key frames are never skipped, in order to maintain the prediction order, and the MPEG-4 quadratic Rate-Qtz model is adopted to calculate the quantization step size from a given target bit rate. Simulations show that the proposed rate control method for IPP stream transcoding achieves a higher average PSNR than frame-level MPEG-4 rate control with a constant frame rate in CBR channels. Furthermore, the proposed rate control method for IBP stream transcoding significantly enhances the transcoded video quality in VBR channels compared with straightforward MPEG video transcoding that adjusts the quantization step size at a constant frame rate. Both proposed transcoders have low computational complexity, so they can be used in real-time transcoding applications.
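A hedged sketch of the temporal side of the tradeoff; the buffer-fullness and motion-activity thresholds are illustrative assumptions, not the paper's decision rule:

def should_skip_b_frame(buffer_fullness, buffer_size, motion_activity,
                        fullness_thresh=0.8, motion_thresh=1.0):
    # Temporal-spatial tradeoff: a B frame is a skip candidate (I/P key
    # frames never are, to preserve the prediction order) when the encoder
    # buffer is nearly full and the frame carries little temporal information
    # (low motion activity). Thresholds here are illustrative.
    return (buffer_fullness / buffer_size > fullness_thresh
            and motion_activity < motion_thresh)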
The joint temporal-spatial rate control problem was examined in prior work to achieve a proper temporal-spatial quality tradeoff, with theoretical analysis and performance benchmarks of several optimal/suboptimal algorithms. However, the high computational complexity of those solutions prevents their use in practical applications. To address this problem, this work develops good rate-distortion (R-D) models that approximate empirical R-D curves. The complexity caused by actual encoding is reduced, so the process of finding the optimal/suboptimal solutions is significantly expedited. Different approximating R-D models are proposed for INTRA, INTER, and skipped frames, respectively, by considering both coding and interpolation dependencies among successive frames. Simulation results show that the approximation errors of the proposed R-D models are very small, resulting in only minor degradation of coded video quality.
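The sketch below uses generic placeholder model forms (an exponential R-D curve for coded frames and a distortion interpolated from neighboring frames for skipped ones) purely to show how per-frame-type models plug into the search; the paper's fitted models differ:

import math

def distortion_coded(bits, a, b):
    # Placeholder exponential R-D form for a coded (INTRA or INTER) frame;
    # illustrative only, the paper fits its own models to empirical curves.
    return a * math.exp(-b * bits)

def distortion_skipped(d_prev, d_next, alpha=0.5):
    # A skipped frame is reconstructed by interpolation, so its distortion is
    # modeled from the distortions of the neighboring coded frames
    # (the interpolation dependency noted in the abstract).
    return alpha * d_prev + (1 - alpha) * d_next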
A joint temporal-spatial rate control scheme is investigated to optimize temporal and spatial rate-distortion (R-D) performance for full frame-rate video playback, where skipped frames are reconstructed via frame interpolation. The resulting optimization problem is too complex to solve with a straightforward Lagrangian multiplier method because of inter-frame coding dependency (e.g. I/P/B) and frame-dependent reconstruction. To reduce complexity, frames are adaptively grouped based on MAD (mean absolute difference) differentials, and an iterative greedy search based on R-D cost is applied to each group of frames to obtain a suboptimal solution. Full frame-rate playback of the ITU-T H.263+/TMN8 codec is evaluated by reconstructing skipped frames with bidirectional motion-compensated interpolation. Experimental results show that the proposed solution gains 0.2-1.0 dB in average PSNR over the traditional scheme with a fixed frame skip, while also reducing the frame-by-frame PSNR variance.
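A hedged sketch of the iterative greedy search over frame groups (groups, candidates, and the rd_cost callback are hypothetical placeholders):

def greedy_allocate(groups, candidates, rd_cost):
    # For each MAD-based group of frames, greedily pick the coding decision
    # (e.g. which frames to skip and what quantizer to use) that minimizes
    # the group's R-D cost, iterating until no single change helps.
    choice = {g: candidates[0] for g in groups}
    improved = True
    while improved:
        improved = False
        for g in groups:
            best = min(candidates, key=lambda c: rd_cost(g, c))
            if rd_cost(g, best) < rd_cost(g, choice[g]):
                choice[g] = best
                improved = True
    return choice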
A new technique based on motion-compensated prediction is investigated in this research to enhance the quality of interpolated frames in frame-rate up-conversion applications. To play back temporally subsampled video at the full frame rate, motion-compensated interpolation (MCI) is usually employed to reconstruct skipped frames by referencing the decoded frames and the motion vectors between them. Since the conventional motion estimation scheme (as adopted in H.26X and MPEG-X) applies the block matching algorithm (BMA) only between reference and predicted frames, the derived motion vectors are of limited use for MCI of skipped frames at the decoder end. Thus, by embedding the MCI module in the encoder loop, the encoder can produce motion vectors that improve the frame interpolation accuracy at the decoder. In particular, a two-step approach is considered in this work. First, a refining scheme for reference motion vectors is proposed that accounts for the compensation efficiency of both predicted and interpolated frames. Next, efficient weighting of reference motion vectors, similar to the window weighting of overlapped block motion compensation (OBMC), is adopted to enhance deformable block-based MCI with four vertex motion vectors. Experimental results illustrate the performance gain of the proposed frame interpolation scheme.
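For context, basic bidirectional MCI under the constant-velocity assumption looks like the sketch below; the paper's contribution refines the motion vectors and replaces this fixed-block averaging with OBMC-like weighted, deformable interpolation. Frame dimensions are assumed to be multiples of the block size:

import numpy as np

def interpolate_skipped(prev, next_, mv_field, block=16):
    # Bidirectional MCI of a skipped frame halfway between two decoded frames:
    # each block averages the prev-frame block displaced by +mv/2 and the
    # next-frame block displaced by -mv/2, where mv is the decoded motion
    # vector between the two reference frames.
    h, w = prev.shape
    out = np.zeros_like(prev, dtype=np.float64)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = mv_field[by // block, bx // block]
            y0 = np.clip(by + dy // 2, 0, h - block)
            x0 = np.clip(bx + dx // 2, 0, w - block)
            y1 = np.clip(by - dy // 2, 0, h - block)
            x1 = np.clip(bx - dx // 2, 0, w - block)
            out[by:by+block, bx:bx+block] = 0.5 * (
                prev[y0:y0+block, x0:x0+block] + next_[y1:y1+block, x1:x1+block])
    return out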
By augmenting the ITU-T H.263 standard bit stream with supplementary motion vectors for to-be-interpolated frames, a new deformable block-based fast motion-compensated frame interpolation (DB-FMCI) scheme is presented. Unlike other motion-compensated interpolation methods, which assume a constant motion velocity between two reference P frames, the proposed scheme takes into account the non-linearity of motion to achieve a better interpolation result. The supplementary motion information for the so-called M frame (motion frame) is defined, which consists of compressed residues of linear and non-linear motion vectors. The non-linear motion vectors of skipped frames are used at the decoder to determine the 6-parameter affine-based DB-FMCI. Experimental results show that the proposed non-linear enhancement scheme can achieve a higher PSNR value and better visual quality in comparison with traditional methods based only on the linear motion assumption.
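A sketch of the 6-parameter affine model underlying the deformable-block interpolation (the least-squares fit from corner correspondences is an illustrative choice, not necessarily the paper's derivation):

import numpy as np

def affine_map(points, params):
    # 6-parameter affine motion model: (x, y) maps to
    # (a0 + a1*x + a2*y, b0 + b1*x + b2*y).
    a0, a1, a2, b0, b1, b2 = params
    x, y = points[:, 0], points[:, 1]
    return np.stack([a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y], axis=1)

def fit_affine(src, dst):
    # Least-squares fit of the 6 affine parameters from point correspondences
    # (e.g. block corners displaced by their decoded motion vectors).
    A = np.column_stack([np.ones(len(src)), src[:, 0], src[:, 1]])
    ax, _, _, _ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
    ay, _, _, _ = np.linalg.lstsq(A, dst[:, 1], rcond=None)
    return (*ax, *ay)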
A new motion-compensated frame interpolation scheme for low-bitrate video based on the ITU-T H.263/H.263+ standard is investigated in this research. The proposed scheme works solely on the decoded bitstream with a block-based approach. It is composed of two main modules: a background/foreground segmentation module and a hybrid motion-compensated frame interpolation module. The segmentation module uses a global motion model to estimate the background motion and an iterative background update to refine the segmentation. The hybrid motion-compensated frame interpolation module reconstructs the background and foreground separately: global motion compensation and frame interpolation are applied to background blocks, using either the 6-parameter affine model (for lower computational complexity) or the 8-parameter perspective model (for perspective correction), while local motion compensation and frame interpolation with localized triangular patch mapping are applied to the foreground area. Experiments show that the proposed scheme achieves higher overall visual quality than conventional block-based frame interpolation schemes.
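A minimal sketch of the background/foreground split, assuming blocks are labeled background when their local motion agrees with the global motion model (the threshold and helper names are illustrative, not the paper's iterative procedure):

import numpy as np

def classify_blocks(mv_field, global_mv_fn, block=16, thresh=2.0):
    # Label a block as background when its local motion vector agrees with
    # the global (camera) motion model evaluated at the block center;
    # remaining blocks are treated as foreground.
    rows, cols = mv_field.shape[:2]
    labels = np.zeros((rows, cols), dtype=bool)  # True = background
    for r in range(rows):
        for c in range(cols):
            center = ((c + 0.5) * block, (r + 0.5) * block)
            gmv = global_mv_fn(center)  # e.g. from the affine/perspective model
            labels[r, c] = np.linalg.norm(mv_field[r, c] - gmv) < thresh
    return labels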