The exponential increase in demand for high-quality user-generated content (UGC) videos, together with limited bandwidth, poses great challenges for hosting platforms in practice, and efficiently optimizing the compression of UGC videos has become critical. Since the ultimate receiver is the human visual system, there is a growing consensus that video coding and processing should be driven by perceptual quality, so traditional rate-control-based methods may not be optimal. In this paper, a novel perceptual model of compressed UGC video quality is proposed that exploits characteristics extracted from the source video only. In the proposed method, content-aware and quality-aware features are explored to estimate quality curves against quantization parameter (QP) variations. Specifically, content-relevant deep semantic features from pre-trained image classification neural networks and quality-relevant handcrafted features from various objective video quality assessment (VQA) models are utilized. Finally, a machine-learning approach is proposed to predict the quality of videos compressed at different QP values. The quality curves can thus be derived, and by estimating the QP for a given target quality, a quality-centered compression paradigm can be built. Experimental results show that the proposed method accurately models quality curves for various UGC videos and controls compression quality well.
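As a rough illustration of the pipeline described above, the sketch below trains a regressor that maps source-only features plus a QP value to a quality score, then inverts the estimated quality curve to pick a QP for a target quality. The feature extractor, the QP grid, and the random-forest regressor are illustrative assumptions, not the paper's actual model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

QP_GRID = np.arange(22, 47, 4)              # QPs at which quality is modeled

def extract_source_features(video_frames):
    """Stub for the content-aware (deep semantic) and quality-aware
    (handcrafted VQA) features described above; here, simple frame statistics."""
    arr = np.asarray(video_frames, dtype=float)
    return np.array([arr.mean(), arr.std(), np.abs(np.diff(arr, axis=0)).mean()])

def fit_quality_model(train_features, train_quality):
    """train_quality[i][j] = measured quality of training video i at QP_GRID[j]."""
    X, y = [], []
    for feats, curve in zip(train_features, train_quality):
        for qp, q in zip(QP_GRID, curve):
            X.append(np.append(feats, qp))   # model input: source features + QP
            y.append(q)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(np.array(X), np.array(y))
    return model

def predict_quality_curve(model, feats):
    X = np.array([np.append(feats, qp) for qp in QP_GRID])
    return model.predict(X)                  # estimated quality vs. QP

def qp_for_target_quality(model, feats, target):
    curve = predict_quality_curve(model, feats)
    idx = int(np.argmin(np.abs(curve - target)))
    return int(QP_GRID[idx]), float(curve[idx])
```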
In conventional motion compensation, a prediction block of a P frame is associated with only one motion vector. Multi-hypothesis motion compensation (MHMC) was proposed to improve the prediction performance of conventional motion compensation; however, multiple motion vectors have to be searched and coded for MHMC. In this paper, we propose a new low-cost multi-hypothesis motion compensation (LMHMC) scheme. In LMHMC, a block can be predicted from multiple hypotheses while only one motion vector is searched and coded into the bitstream; the other motion vectors are predicted from the motion vectors of neighboring blocks, so both the encoding complexity and the bit rate of MHMC are reduced by the proposed LMHMC. By adding LMHMC as an additional mode to the MPEG Internet Video Coding (IVC) platform, the BD-rate saving reaches up to 10%, and the average BD-rate saving is close to 5% in the Low Delay configuration. We also compare MHMC and LMHMC on the IVC platform; LMHMC improves on MHMC by about 2% on average.
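A minimal sketch of the low-cost two-hypothesis idea, assuming integer-pel motion and a simple median rule for the non-coded hypothesis; the fetch routine, block size, and neighbor rule are illustrative choices, not the IVC implementation.

```python
import numpy as np

def mc_block(ref, x, y, mv, size):
    """Fetch a motion-compensated block (integer-pel, clipped to the frame)."""
    h, w = ref.shape
    x0 = int(np.clip(x + mv[0], 0, w - size))
    y0 = int(np.clip(y + mv[1], 0, h - size))
    return ref[y0:y0 + size, x0:x0 + size].astype(float)

def predict_mv_from_neighbors(neigh_mvs):
    """Derived (not coded) hypothesis: component-wise median of neighbor MVs."""
    return tuple(np.median(np.array(neigh_mvs), axis=0).astype(int))

def lmhmc_predict(ref, x, y, coded_mv, neigh_mvs, size=16):
    """Average the coded-MV hypothesis and the neighbor-derived hypothesis."""
    h1 = mc_block(ref, x, y, coded_mv, size)
    h2 = mc_block(ref, x, y, predict_mv_from_neighbors(neigh_mvs), size)
    return (h1 + h2) / 2.0
```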
In this paper, an auto-regressive (AR) model is proposed to generate the side information for low-delay distributed video coding (DVC). The side information generation for the current Wyner-Ziv (WZ) frame t consists of two forward AR interpolations. First, each pixel within the reconstructed frame t−1 is approximated as a linear combination of pixels within a spatial neighborhood along the motion trajectory within the reconstructed frame t−2; applying the least-mean-square algorithm, the coefficients of the first forward AR model are derived. Second, the pixels within the reconstructed frame t−2 are approximated by the corresponding pixels within the reconstructed frame t−1, and the geometric symmetry property of the AR model is exploited to derive the coefficients of the second forward AR model. Finally, the side information is generated as the average of the two forward AR interpolations. Experimental results demonstrate that the proposed AR model significantly improves the PSNR of the side information compared to existing motion-extrapolation-based approaches.
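As a rough illustration of one forward AR interpolation, the sketch below fits, by least squares, coefficients that map a 3×3 neighborhood in the reconstructed frame t−2 to the co-located pixel in frame t−1, and then applies them to frame t−1 to extrapolate side information for frame t. The zero-motion assumption and the plain least-squares fit (in place of the LMS algorithm along motion trajectories) are simplifications.

```python
import numpy as np

def neighborhoods(frame, radius=1):
    """Stack the (2r+1)x(2r+1) neighborhoods of interior pixels as rows."""
    h, w = frame.shape
    cols = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cols.append(frame[radius + dy:h - radius + dy,
                              radius + dx:w - radius + dx].ravel())
    return np.stack(cols, axis=1)

def fit_forward_ar(frame_t2, frame_t1, radius=1):
    X = neighborhoods(frame_t2, radius)                   # predictors from t-2
    y = frame_t1[radius:-radius, radius:-radius].ravel()  # targets in t-1
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares AR fit
    return coef

def extrapolate_side_info(frame_t1, coef, radius=1):
    X = neighborhoods(frame_t1, radius)                   # apply the model to t-1
    h, w = frame_t1.shape
    side = np.array(frame_t1, dtype=float)
    side[radius:-radius, radius:-radius] = (X @ coef).reshape(
        h - 2 * radius, w - 2 * radius)
    return side                                           # estimate of frame t
```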
Stereoscopic video is a practical and important format for 3-D video applications, and robust stereoscopic video transmission over error-prone networks poses a technical challenge for stereoscopic video coding. In this paper, we present a rate-distortion optimization algorithm with inter-view refreshment for error-resilient stereoscopic video coding. First, inter-view refreshment is proposed to suppress error propagation, in addition to intra refreshment. Then, we propose an end-to-end distortion model for stereoscopic video coding that jointly considers network conditions, inter-view refreshment, and error concealment tools. Finally, based on the proposed end-to-end distortion model, a rate-distortion optimization algorithm is presented to adaptively select inter-view, inter, and intra coding modes for error-resilient stereoscopic video coding. Simulation results show that the proposed scheme yields a clear improvement in transmission efficiency for stereoscopic video coding.
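A minimal sketch of Lagrangian mode selection under an end-to-end distortion estimate; the candidate costs and the Lagrange multiplier below are placeholders, not the paper's distortion model.

```python
def select_mode(candidates, lam):
    """candidates: mode -> (expected end-to-end distortion, rate in bits)."""
    best_mode, best_cost = None, float("inf")
    for mode, (dist, rate) in candidates.items():
        cost = dist + lam * rate            # Lagrangian rate-distortion cost
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Example: inter-view refreshment wins when channel losses inflate the expected
# distortion of ordinary inter coding (all numbers are placeholders).
modes = {"intra": (120.0, 900), "inter": (310.0, 300), "inter_view": (150.0, 450)}
print(select_mode(modes, lam=0.5))           # -> ('inter_view', 375.0)
```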
In this paper, a 3D auto-regressive (AR) model is proposed for bi-directional prediction. The prediction is composed of two 3D AR models along the forward and backward directions, respectively. Applying the 3D AR model, each pixel in the current frame is predicted as a weighted summation of pixels within a spatial neighborhood along the forward/backward motion in the forward/backward reference frame. Ultimately, the prediction of each pixel is obtained as the combination of the predictions generated by the two 3D AR models. To derive accurate AR coefficients, this paper proposes a framework that performs coefficient estimation and image interpolation simultaneously. Unlike other methods, the predicted pixels generated by one 3D AR model are further used to predict the pixels in the adjacent frame along the motion trajectory. Consequently, each pixel in one forward/backward reference frame can be predicted as a nonlinear combination of pixels within an enlarged spatial neighborhood along the motion in one backward/forward reference frame. An iterative algorithm using a nonlinear least-squares method is then devised to compute the optimal 3D AR coefficients. Experiments confirm that the proposed method achieves superior performance for bi-directional prediction.
This paper proposes a linear rate-distortion (RD) cost model for the skip mode in H.264/MPEG-4 AVC. The proposed RD cost model is derived theoretically from the quantization scheme of H.264, and simulation results verify that it estimates the RD cost of skip mode accurately. Based on the proposed RD cost model, an early skip mode selection algorithm is provided, which terminates the mode selection process adaptively and works well for both low-complexity and high-complexity video. Experimental results show that the proposed early skip mode selection algorithm saves up to 56% of encoding time with negligible performance loss compared with the original reference encoder.
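A minimal sketch of the early-skip decision, assuming a hypothetical linear cost model in the quantization step; the coefficients and the margin are placeholders, not the values derived in the paper.

```python
def estimated_skip_cost(qstep, a=1.0, b=0.0):
    """Hypothetical linear model: expected skip-mode RD cost vs. quantization step."""
    return a * qstep + b

def early_skip(actual_skip_cost, qstep, margin=1.0):
    """True -> terminate mode selection; remaining inter/intra modes are not searched."""
    return actual_skip_cost <= margin * estimated_skip_cost(qstep)

# Example: with the H.264 relation Qstep ~= 2**((QP - 4) / 6), QP = 28 gives Qstep = 16.
print(early_skip(actual_skip_cost=12.0, qstep=16.0))   # True: skip is taken early
```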
A high-definition video coding technique using super-macroblocks is investigated in this work. Our research is motivated by the observation that the macroblock-based partition in H.264/AVC may not be efficient for high-definition video, since the maximum macroblock size of 16 × 16 is relatively small compared with the whole image size. In the proposed super-macroblock-based video coding scheme, the original block size M×N in H.264 is scaled to 2M×2N. Along with the super-macroblock prediction framework, a low-complexity 16 × 16 discrete cosine transform (DCT) is proposed: compared with the 1-D transform of the 8 × 8 DCT, only 16 extra additions are required for the 1-D 16-point transform of the 16 × 16 DCT. Furthermore, an adaptive scheme is proposed for selecting the best coding mode and the best transform size. Experimental results show that the super-macroblock coding scheme achieves a higher coding gain.
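The small additive overhead of a longer 1-D transform can be illustrated with the classic even/odd butterfly: the even-indexed outputs of a 16-point DCT-II equal an 8-point DCT-II of the butterfly sums x[n] + x[15−n]. The sketch below verifies that identity numerically; it illustrates the principle only and is not the paper's integer transform.

```python
import numpy as np

def dct2_matrix(n):
    """Unscaled DCT-II basis: C[k, i] = cos(pi * (2i + 1) * k / (2n))."""
    i = np.arange(n)
    return np.cos(np.pi * np.outer(np.arange(n), 2 * i + 1) / (2 * n))

x = np.random.rand(16)
full = dct2_matrix(16) @ x                    # direct 16-point DCT-II
sums = x[:8] + x[15:7:-1]                     # 8 butterfly additions
even_via_8pt = dct2_matrix(8) @ sums          # 8-point DCT of the butterfly sums
print(np.allclose(full[0::2], even_via_8pt))  # True: even outputs match
```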
Motion vector prediction (MVP) is an important part of video coding, and numerous studies on this topic have been carried out. In this paper, we continue the study of MVP for video coding, building on this prior work. Video sequences with various motion characteristics are further investigated, and the characteristics of the motion vectors of objects in video scenes are discussed briefly. Summarizing these characteristics, two MVP schemes are proposed for a new coding standard, the Audio Video Standard (AVS). In these schemes, the current block's MV is predicted from the statistical correlation of the MVs of spatially contiguous neighboring blocks. A correlation criterion is employed to measure how correlated two MVs are; with this criterion, the correlated MVs of neighboring blocks are determined, and the predicted MV of the current block is then obtained with simple algebraic operations on the selected MVs. The two proposed schemes, as alternatives to the median predictor, suit video sequences with different motion characteristics, respectively. Experimental results show that, compared with the median predictor implemented in AVS, bit-rate savings are achieved with these schemes for most typical video sequences.
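A minimal sketch of a correlation-based alternative to the median predictor, assuming a Euclidean-distance criterion and a simple averaging rule; both are illustrative stand-ins for the correlation criterion used in the paper.

```python
import numpy as np

def median_mvp(neigh_mvs):
    """Baseline: component-wise median of neighboring blocks' MVs."""
    return tuple(np.median(np.array(neigh_mvs), axis=0).astype(int))

def correlated_mvp(neigh_mvs, thresh=2.0):
    """Average only the neighbor MVs that are correlated (close) to another MV."""
    mvs = np.array(neigh_mvs, dtype=float)
    keep = []
    for mv in mvs:
        dists = np.linalg.norm(mvs - mv, axis=1)
        if np.sum(dists <= thresh) > 1:        # itself plus at least one close peer
            keep.append(mv)
    if keep:
        return tuple(np.mean(keep, axis=0).astype(int))
    return median_mvp(neigh_mvs)               # fall back to the median predictor

print(correlated_mvp([(4, 1), (5, 1), (30, -12)]))  # -> (4, 1): outlier MV ignored
```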
Scalable video coding (SVC) has become increasingly important with the enrichment of multimedia data and the diversification of networks and terminal devices. In the current MPEG SVC activities, a scalable extension of H.264/AVC, called the scalable video model (SVM), has been proposed by HHI and has demonstrated further coding efficiency improvement and scalability functionality. However, the SVM does not yet provide an efficient rate control scheme; rate control is achieved through a full search for a suitable quantization parameter (QP), which is very inefficient and time-consuming. In this paper, an efficient rate control scheme is proposed for the SVM, derived from the state-of-the-art hybrid rate control schemes of JVT with additional considerations for scalable video coding. In the proposed scheme, the rate-distortion optimization (RDO) involved in encoding temporal subband pictures is implemented only on the low-pass subband pictures, and rate control is applied independently to each spatial layer. For each spatial layer, rate control is implemented at the GOP, picture, and basic-unit levels. Furthermore, for the temporal subband pictures obtained from motion-compensated temporal filtering (MCTF), the target bit allocation and quantization parameter selection inside a GOP make full use of the hierarchical relations inherent in the MCTF. The proposed rate control scheme has been implemented in SVM3.0, and experimental results show that the proposed algorithm achieves the target bit rate with little bit-rate fluctuation while maintaining good image quality, and the computational complexity is greatly reduced.
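A minimal sketch of GOP-level bit allocation with temporal-level weighting and a toy R-Q model for QP selection; the weights, the R-Q model, and the H.264-style QP/Qstep mapping are assumptions, not the SVM rate-control scheme.

```python
import math

def gop_bit_budget(bitrate_bps, frame_rate, gop_size, buffer_carry=0.0):
    """Target bits for one GOP: channel rate per frame times GOP length."""
    return bitrate_bps / frame_rate * gop_size + buffer_carry

def allocate_bits(gop_bits, temporal_levels, level_weight=1.6):
    """More bits for pictures at lower temporal levels (low-pass pictures)."""
    top = max(temporal_levels)
    weights = [level_weight ** (top - lv) for lv in temporal_levels]
    total = sum(weights)
    return [gop_bits * w / total for w in weights]

def qp_from_target_bits(target_bits, mad, a=4000.0):
    """Toy R-Q model: rate ~ a * MAD / Qstep; invert and map Qstep to a QP."""
    qstep = max(a * mad / max(target_bits, 1.0), 0.625)
    qp = round(6 * math.log2(qstep) + 4)     # H.264-style QP/Qstep relation
    return min(max(qp, 0), 51)

budget = gop_bit_budget(1_000_000, 30, 8)                    # 1 Mbps, 30 fps, GOP 8
targets = allocate_bits(budget, temporal_levels=[0, 3, 2, 3, 1, 3, 2, 3])
print([qp_from_target_bits(t, mad=64.0) for t in targets])   # lower QP at low levels
```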
In H.264/AVC, an integer 4×4 transform is used instead of the traditional floating-point DCT because of its low complexity and exact reversibility. Combined with the normalization of the integer transform, a division-free quantization scheme is used in H.264/AVC. H.264/AVC is the most outstanding video coding standard to date, but it initially targeted low bit-rate coding, and almost all experimental results in its proposals were tested at low bit rates. More recently, experimental results have shown that an 8×8 transform can further improve coding efficiency for high-definition (HD) content. In this paper, a family of low-complexity 8×8 integer transforms is studied and corresponding quantization schemes are developed for HD coding. Compared with a 4×4 transform/prediction-based coder, the proposed 8×8-based coder achieves better performance on HD content with much lower encoder/decoder complexity.
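A minimal sketch of pairing an integer transform with division-free quantization: the transform's row norms and the quantization step are folded into a per-coefficient integer multiplier, so quantization reduces to a multiply and a right shift. The 8×8 matrix used here is an H.264-High-Profile-style integer approximation of the DCT and serves only as a stand-in for the proposed transforms.

```python
import numpy as np

C8 = np.array([
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3]], dtype=np.int64)

def forward_transform(block):
    """Integer 2-D transform Y = C * X * C^T (no normalization yet)."""
    return C8 @ block.astype(np.int64) @ C8.T

def quantize(coeffs, qstep, shift=20):
    """Division-free quantization: fold 1/(||row_i|| * ||row_j|| * qstep) into
    one integer multiplier, then quantize with a multiply and a right shift."""
    norms = np.sum(C8.astype(float) ** 2, axis=1)          # per-row energy
    scale = np.round((1 << shift) / (np.outer(norms, norms) ** 0.5 * qstep))
    return np.sign(coeffs) * ((np.abs(coeffs) * scale.astype(np.int64)) >> shift)

blk = np.random.randint(0, 256, (8, 8))
print(quantize(forward_transform(blk), qstep=16.0)[0, 0])   # quantized DC level
```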