A quality-based bit rate ladder design for over-the-top video streaming services is presented. Following the design criterion of maximizing subjective quality under the constraint of minimizing storage costs, the bit rate ladder is defined by three parameters. The first parameter is the lowest VMAF score at which a video signal is, on average, subjectively indistinguishable from the original video signal. Following the international recommendation ITU-R BT.500, extensive subjective tests were carried out to evaluate the fundamental relationships between subjective quality and the VMAF score in a 4K OLED TV environment. Based on the test results, this VMAF score is set to 95. The second parameter is the lowest VMAF score accepted, on average, by more than 50% of users for watching video signals of free streaming services. Additional tests result in setting this VMAF score to 55. The third parameter is the maximum difference between two VMAF scores for which the associated subjective qualities are, on average, approximately the same. In a third test, this difference is determined to be 2. This results in an ideal bit rate ladder providing each video signal in 21 qualities associated with the VMAF scores 95, 93, …, 57, 55. This bit rate ladder design can be applied to complete video signals, as in per-title encoding strategies, or to individual scenes of video signals, as in per-scene or shot-based encoding strategies. Applications using fewer than 21 renditions for this range may suffer from impaired subjective quality.
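To make the ladder construction concrete, the following Python sketch enumerates the 21 VMAF targets and searches for the cheapest bitrate reaching each target. It is illustrative only: the `vmaf_of_encode` callback is a hypothetical stand-in for a real encode-and-measure loop, and the monotonicity assumption (VMAF non-decreasing in bitrate for a fixed title) is ours, not the paper's.

```python
# Sketch of the quality-based ladder construction described above.
# vmaf_of_encode() is a hypothetical stand-in: in practice it would
# encode the title at the given bitrate and return the measured VMAF.

VMAF_TOP, VMAF_BOTTOM, VMAF_STEP = 95, 55, 2

def ladder_targets():
    """The 21 VMAF targets 95, 93, ..., 57, 55 of the ideal ladder."""
    return list(range(VMAF_TOP, VMAF_BOTTOM - 1, -VMAF_STEP))

def bitrate_for_target(target_vmaf, vmaf_of_encode,
                       lo_kbps=100.0, hi_kbps=50_000.0, tol=0.1):
    """Bisection on bitrate: find the lowest bitrate whose VMAF reaches
    the target, assuming VMAF is monotone in bitrate for a fixed title."""
    while hi_kbps - lo_kbps > tol:
        mid = 0.5 * (lo_kbps + hi_kbps)
        if vmaf_of_encode(mid) >= target_vmaf:
            hi_kbps = mid          # target reached: try cheaper encodes
        else:
            lo_kbps = mid          # quality too low: spend more bits
    return hi_kbps

if __name__ == "__main__":
    import math
    # Toy rate-quality curve, purely illustrative.
    toy_vmaf = lambda kbps: min(100.0, 25.0 * math.log10(kbps))
    for q in ladder_targets():
        print(q, round(bitrate_for_target(q, toy_vmaf), 1))
```

Run per title (or per scene, in shot-based strategies), this yields one bitrate per VMAF rung; the per-title and per-scene variants differ only in the granularity at which `vmaf_of_encode` is evaluated.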
KEYWORDS: Computer programming, Spatial resolution, Wavelets, Motion estimation, Video compression, 3D video compression, Scalable video coding, Quantization, Video coding, Signal to noise ratio
Owing to its open-loop structure and good decorrelation capability, motion-compensated temporal filtering (MCTF) provides a robust basis for highly efficient scalable video coding. Combining MCTF with spatial wavelet decomposition and embedded quantization results in a 3D wavelet video compression system providing temporal, spatial, and SNR scalability. Recent results indicate that the overall coding performance of these systems can be maximized if temporal filtering is performed in the spatial domain (t+2D approach). However, compared to non-scalable video coding, the performance of t+2D systems may not be satisfactory if spatial scalability needs to be provided. One important reason for this is the problem of spatial scalability of the motion information. In this paper we present a conceptually new approach to t+2D-based video compression with spatially scalable motion information. We call our approach overcomplete MCTF, since multiple spatial-domain temporal filtering operations are needed to generate the lower spatial scales of the temporal subbands. Specifically, the encoder performs MCTF-based generation of reference sequences for the coarser spatial scales. We find that the newly generated reference sequences are of satisfactory quality. Compared to the conventional t+2D system, our approach allows for optimization of the reconstruction quality at lower spatial scales while having reduced impact on the reconstruction quality at high spatial scales/bitrates.
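The much-simplified Python sketch below illustrates the overcomplete structure: temporal filtering is re-run once per spatial scale, rather than spatially decomposing a single set of temporal subbands as in the conventional t+2D structure. Everything here is a stand-in of ours: `dyadic_down` replaces the spatial wavelet analysis lowpass, the motion compensator defaults to zero motion, and in the actual system the coarser-scale reference sequences are generated by MCTF from decodable data rather than by direct downsampling.

```python
import numpy as np

def dyadic_down(frame):
    """Toy 2:1 spatial downsampling by 2x2 averaging; a stand-in for
    the spatial wavelet analysis lowpass of the real system."""
    return 0.25 * (frame[0::2, 0::2] + frame[1::2, 0::2]
                   + frame[0::2, 1::2] + frame[1::2, 1::2])

def mctf_haar(frames, mc=lambda ref, cur: ref):
    """One spatial-domain MCTF level (unnormalized Haar lifting).
    mc() is a stand-in motion compensator; zero motion is assumed."""
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        h = b - mc(a, b)     # predict step: temporal highpass frame
        l = a + 0.5 * h      # update step: temporal lowpass frame
        lows.append(l)
        highs.append(h)
    return lows, highs

def overcomplete_mctf(frames, num_scales=3):
    """Run temporal filtering once per spatial scale (overcomplete
    approach) instead of decomposing one set of temporal subbands."""
    subbands = {}
    for s in range(num_scales):
        subbands[s] = mctf_haar(frames)
        frames = [dyadic_down(f) for f in frames]
    return subbands

frames = [np.random.rand(64, 64) for _ in range(8)]
bands = overcomplete_mctf(frames)
print({s: bands[s][0][0].shape for s in bands})   # frame size per scale
```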
In contrast to predictive schemes such as hybrid video coding systems, orthonormal transform coding systems are immune to error accumulation in the case of desynchronization between encoder and decoder. These systems therefore allow for drift-free data adaptation at the bit stream level and thus for scalability. In t+2D interframe wavelet video coding, wavelet-based motion-compensated temporal filtering is employed, followed by spatial wavelet decomposition and bit-plane coding. This allows for temporal, spatial, and SNR scalability. While motion compensation appears essential in this scheme to achieve excellent coding performance, it causes local violation of the orthonormality of the temporal transform. In particular, motion-compensated interframe wavelet systems employ predictive coding for certain occlusion areas. In the case of a reference mismatch between encoder and decoder, error accumulation occurs in these regions. In this paper we present an approach that adapts the encoder operating point for predictively coded regions, effectively eliminating the reference mismatch adaptively. An iterative algorithm for computing the decoder reference at the encoder side is presented for t+2D systems. We show that this approach significantly increases overall coding performance, gaining up to 1 dB in PSNR. Furthermore, the optimized quantization algorithm presented in earlier work can be applied more effectively, leading to a more even noise distribution.
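A toy Python illustration of the fixed-point idea behind such an iterative algorithm follows. Uniform quantization stands in for bit-plane truncation at the decoder's operating point, and a single predictively coded region stands in for the occlusion areas inside the MCTF structure; the function names and convergence test are our choices, not the paper's.

```python
import numpy as np

def quantize(x, step):
    """Uniform quantization: a stand-in for bit-plane truncation at
    the decoder's target operating point."""
    return step * np.round(x / step)

def iterate_decoder_reference(region, enc_reference, step, num_iters=8):
    """Start from the encoder-side reference, simulate the decoder's
    reconstruction of the predictively coded region, and feed that
    reconstruction back as the prediction reference until encoder and
    decoder references agree (no remaining mismatch, hence no drift)."""
    ref = enc_reference.astype(float).copy()
    for _ in range(num_iters):
        residual_q = quantize(region - ref, step)  # what the decoder gets
        recon = ref + residual_q                   # decoder reconstruction
        if np.allclose(recon, ref):                # fixed point reached
            break
        ref = recon                                # adopt decoder reference
    return ref

region = np.random.rand(8, 8)
enc_ref = region + 0.3 * np.random.rand(8, 8)      # mismatched reference
dec_ref = iterate_decoder_reference(region, enc_ref, step=0.1)
print(np.max(np.abs(region - dec_ref)) <= 0.05)    # within step/2: True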
In interframe wavelet video coding, wavelet-based motion-compensated temporal filtering (MCTF) is combined with spatial wavelet decomposition, allowing for efficient spatio-temporal decorrelation and for temporal, spatial, and SNR scalability. Contemporary interframe wavelet video coding concepts employ block-based motion estimation (ME) and compensation (MC) to exploit temporal redundancy between successive frames. Due to occlusion effects and imperfect motion modeling, block-based MCTF may generate temporal high-frequency subbands with block-wise varying coefficient statistics, and low-frequency subbands with block edges. Both effects may cause reduced spatial transform gain and blocking artifacts. As a modification to MCTF, we present spatial highpass transition filtering (SHTF) and spatial lowpass transition filtering (SLTF), introducing smooth transitions between motion blocks in the high- and low-frequency subbands, respectively. Additionally, we analyze the propagation of quantization noise in MCTF and present an optimized quantization strategy to compensate for variations in synthesis filtering for different block types. Combining these approaches leads to a reduction of blocking artifacts, smoothed temporal PSNR performance, and significantly improved coding efficiency.
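As a one-dimensional toy version of the smooth-transition idea, the sketch below cross-fades two per-block predictions over a small overlap around a block boundary instead of switching abruptly. The linear ramp window, the function name, and the 1-D setting are illustrative choices of ours; the paper's SHTF and SLTF act on the temporal high- and low-frequency subbands of the MCTF.

```python
import numpy as np

def blend_block_boundary(pred_a, pred_b, boundary, overlap):
    """Cross-fade two per-block predictions over `overlap` samples on
    each side of `boundary` instead of switching abruptly, removing
    the hard block edge that causes blocking artifacts."""
    n = len(pred_a)
    w = np.zeros(n)                                 # weight of pred_b
    w[boundary + overlap:] = 1.0
    # Strictly interior ramp values between 0 and 1 across the edge.
    ramp = np.linspace(0.0, 1.0, 2 * overlap + 2)[1:-1]
    w[boundary - overlap : boundary + overlap] = ramp
    return (1.0 - w) * pred_a + w * pred_b

x = np.arange(16, dtype=float)
pred_a = x + 0.0        # prediction using block A's motion vector
pred_b = x + 4.0        # prediction using block B's motion vector
print(blend_block_boundary(pred_a, pred_b, boundary=8, overlap=3))
```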
KEYWORDS: Video coding, Wavelets, Linear filtering, Motion estimation, Video, 3D modeling, Motion models, Quantization, Communication engineering, Spatial filters
To exploit temporal interdependencies between consecutive frames, existing 3D wavelet video coding concepts employ blockwise motion estimation (ME) and compensation (MC). Because of local object motion, rotation, or scaling, the processing of occlusion areas is problematic. In these regions, correct motion vectors (MVs) cannot always be calculated, and blocking artifacts may appear at the motion boundaries to connected areas for which uniquely referenced MVs could be estimated. To avoid this, smooth transitions can be introduced around the occlusion pixels, blurring out the block artifacts. The proposed algorithm is based on the MC-EZBC 3D wavelet video coder (Motion-Compensated Embedded video coding algorithm using ZeroBlocks of subband/wavelet coefficients and Context modeling), which employs a lifting approach for temporal filtering.
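The lifting structure mentioned above has the useful property of being exactly invertible for any motion operator, which is what makes it attractive for temporal filtering. The Python sketch below demonstrates this with an unnormalized Haar predict/update pair; the identity motion operator and the zero-motion update are simplifications of ours, whereas MC-EZBC uses real block-based motion compensation with appropriate scaling.

```python
import numpy as np

def mctf_lift_analysis(a, b, mc=lambda f: f):
    """Haar lifting pair as used in MCTF: predict, then update. The
    steps are exactly invertible for any motion operator mc()."""
    h = b - mc(a)            # predict step: temporal highpass frame
    l = a + 0.5 * h          # update step: temporal lowpass frame
    return l, h

def mctf_lift_synthesis(l, h, mc=lambda f: f):
    """Exact inverse: undo the update, then undo the predict."""
    a = l - 0.5 * h
    b = h + mc(a)
    return a, b

a = np.random.rand(4, 4)
b = np.random.rand(4, 4)
l, h = mctf_lift_analysis(a, b)
a2, b2 = mctf_lift_synthesis(l, h)
print(np.allclose(a, a2) and np.allclose(b, b2))   # True: perfect recon
```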