Emerging 5G technologies bring various new opportunities for the media sector. In particular, they allow the incorporation of ultra-high-resolution video formats and immersive augmented-, virtual-, and extended-reality content into low-latency streaming applications while providing a reliable, high-quality user experience. In this paper, we focus on streaming 8K immersive content in a transmedia scenario and validate the feasibility of efficient, cost-effective solutions by measuring the added value they bring. This is done using various key performance indicators within the framework of a European innovation project called 5GMediaHUB, with a scenario focusing on Interactive Digital Narratives. The main story is presented linearly to the user on the first screen, relying on an encoded video stream with marked frames. Each marked frame prompts the user to interact with the story on a secondary screen. The user is finally immersed in a Virtual Reality experience by completing a quiz on the second screen. The quality of the transmitted video is a critical requirement in this scenario, as are very low jitter and packet loss. Overall, very low latency is required to allow effective and satisfactory interaction.
This paper explores the use of Mixed Reality in live television shows by allowing remote participants to “teleport” into a virtual studio. The solution utilizes background extraction (BE) and super-resolution (SR) modules to extract remote participants from their videos and composite them seamlessly into the studio footage, allowing for participation in live TV programs using standard mobile devices or webcams. This paper aims to investigate the impact of capturing devices and background settings on the output videos from the end user’s point of view. The results of the study are presented and discussed with a focus on the effectiveness of the BE and SR components in response to variations in capturing devices and background settings.
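The compositing step of such a teleportation pipeline can be illustrated with a standard alpha "over" operation: the BE module produces a foreground matte for the remote participant, which is then blended into the studio frame. The sketch below is a minimal, hypothetical illustration (the function and constants are not from the paper):

```python
import numpy as np

def composite(participant, alpha, studio, top_left):
    """Alpha-composite an extracted participant into studio footage.

    participant: HxWx3 float array in [0,1] (output of a BE/SR-style pipeline)
    alpha:       HxW  float matte in [0,1] (1 = foreground)
    studio:      larger float frame to place the participant into
    top_left:    (row, col) placement in the studio frame
    """
    r, c = top_left
    h, w = alpha.shape
    region = studio[r:r+h, c:c+w]
    # Standard "over" operator: out = a*fg + (1-a)*bg
    studio[r:r+h, c:c+w] = alpha[..., None] * participant \
        + (1 - alpha[..., None]) * region
    return studio

# Toy frames: a 4x4 participant placed into an 8x8 studio shot
fg = np.ones((4, 4, 3)) * 0.9
matte = np.zeros((4, 4))
matte[1:3, 1:3] = 1.0            # only the center is foreground
bg = np.zeros((8, 8, 3))
out = composite(fg, matte, bg, (2, 2))
```

In a real system the matte quality (which this study probes by varying capturing devices and backgrounds) determines how seamless the blend looks.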
The stereoscopic 3D industry has fallen short of achieving acceptable Quality of Experience (QoE) because of various technical limitations, such as excessive disparity and accommodation-convergence mismatch. This study investigates the effect of scene content, camera baseline, screen size, and viewing location on stereoscopic QoE using a holistic approach. First, 240 typical test configurations are considered, in which a wide range of disparities constructed from the shooting conditions (scene content, camera baseline, sensor resolution/screen size) was selected from datasets, so that the constructed disparities fall into different ranges of the maximal disparity supported by the viewing environment (viewing location). Second, an extensive subjective test is conducted using a single-stimulus methodology, in which 15 samples are obtained at each viewing location. Finally, a statistical analysis is performed; the results reveal that scene content and camera baseline, as well as the interactions between screen size, scene content, and camera baseline, have a significant impact on QoE in stereoscopic images, while the other factors, especially viewing location, have almost no significant impact. The generated Mean Opinion Scores (MOS) and the statistical results can be used to design stereoscopic quality metrics and validate their performance.
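Aggregating raw subjective ratings into an MOS with a confidence interval is the usual first step before the kind of statistical analysis described above. A minimal stdlib-only sketch (the ratings are hypothetical, and a normal approximation is used where a t-quantile would be more exact for 15 subjects):

```python
import statistics
import math

def mos_with_ci(scores, confidence_z=1.96):
    """Mean Opinion Score with an approximate 95% confidence interval.

    scores: raw ratings (e.g. on a 1..5 ACR scale) from one test
    configuration. Uses a normal approximation for the interval.
    """
    mos = statistics.mean(scores)
    half = confidence_z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, (mos - half, mos + half)

# 15 hypothetical subjects rating one viewing-location configuration
ratings = [4, 5, 3, 4, 4, 5, 3, 4, 4, 3, 5, 4, 4, 3, 4]
mos, (lo, hi) = mos_with_ci(ratings)
```

Per-configuration MOS values computed this way can then feed an ANOVA to test factor significance, as in the study.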
Limitations of the human visual system (HVS) allow images and video to be reconstructed using fewer bits for the same perceived image quality. In this paper, we review the basis of spatial masking at edges and present a new method for generating a just-noticeable distortion (JND) threshold. This JND threshold is then used in a spatial noise shaping algorithm based on a compressive sensing technique to provide a perceptual coding approach for JPEG2000 image coding. Results of subjective tests show that the new spatial noise shaping framework can provide significant savings in bit rate compared to the standard approach. The algorithm also allows much more precise control of distortion than existing spatial-domain techniques and is fully compliant with Part 1 of the JPEG2000 standard.
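The core idea, deriving a per-pixel distortion tolerance from edge masking and then clipping coding noise to it, can be sketched very roughly as below. This is an illustrative toy model with made-up constants, not the JND formulation derived in the paper:

```python
import numpy as np

def jnd_map(luma, base=3.0, masking_gain=0.1):
    """Crude per-pixel JND estimate from edge (gradient) masking.

    luma: 2-D array of 8-bit luminance values. Strong local gradients
    mask distortion, so the tolerable error grows with gradient
    magnitude. The constants here are illustrative only.
    """
    luma = luma.astype(np.float64)
    gy, gx = np.gradient(luma)
    grad = np.hypot(gx, gy)
    return base + masking_gain * grad

def shape_noise(coding_error, jnd):
    """Clip coding error to the JND threshold (spatial noise shaping)."""
    return np.clip(coding_error, -jnd, jnd)

img = np.tile(np.linspace(0, 255, 16), (16, 1))   # horizontal luminance ramp
jnd = jnd_map(img)
err = shape_noise(np.full_like(img, 10.0), jnd)   # uniform 10-level error
```

After shaping, every pixel's error sits at or below its local visibility threshold, which is what lets the coder spend fewer bits without a perceived quality loss.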
While the causes and nature of crosstalk, as well as crosstalk reduction techniques have been extensively studied, it is
still difficult to eliminate. Perceptually, crosstalk is one of the most annoying distortions in the visualization stage of
stereoscopic imaging. Therefore, to understand how users perceive crosstalk is of fundamental importance to improve
the quality of 3D presentations. In this paper, we aim at analyzing the impact of crosstalk level, camera baseline and
scene content on users' perception of crosstalk. Extensive subjective tests are conducted and the opinion scores are
statistically analyzed and discussed. The results indicate that crosstalk level, camera baseline, as well as scene content all
have major impacts on the perception of crosstalk. We also show that these three factors correlate with each other in
terms of impact on the crosstalk perception. Furthermore, we propose a content descriptor for crosstalk perception
(CDCP) and show its effectiveness.
This paper proposes an approach to improve the performance of peak signal-to-noise ratio (PSNR) and structural
similarity (SSIM) for image quality assessment in digital cinema applications. Based on the particularities of quality
assessment in a digital cinema setup, some attributes of the human visual system (HVS) are taken into consideration,
including the fovea acuity angle and contrast sensitivity, combined with viewing conditions in the cinema to select
appropriate image blocks for calculating the perceived quality with PSNR and SSIM. Furthermore, since the HVS cannot perceive all distortions, owing to its selective sensitivity to different contrasts and to ever-present masking, we adopt a modified PSNR that incorporates the contrast sensitivity function and masking effects. The experimental results demonstrate that the proposed approach markedly improves the performance of image quality metrics in digital cinema applications.
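The block-selection idea can be sketched as follows: compute PSNR only over blocks that pass a perceptual relevance test. The sketch uses block luminance standard deviation as a crude stand-in for the paper's fovea-acuity and contrast-sensitivity criteria; the threshold is invented for illustration:

```python
import numpy as np

def block_psnr(ref, dist, block=8, contrast_thresh=5.0):
    """PSNR restricted to perceptually selected blocks.

    A block is kept if its luminance standard deviation (a crude proxy
    for the contrast-based selection described in the paper) exceeds
    contrast_thresh; PSNR is then computed over kept blocks only.
    """
    h, w = ref.shape
    sq_err, count = 0.0, 0
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            rb = ref[r:r+block, c:c+block].astype(np.float64)
            db = dist[r:r+block, c:c+block].astype(np.float64)
            if rb.std() > contrast_thresh:
                sq_err += ((rb - db) ** 2).sum()
                count += block * block
    if count == 0 or sq_err == 0:
        return float("inf")
    mse = sq_err / count
    return 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(np.float64)
dist = ref + rng.normal(0, 2, ref.shape)          # mild distortion
score = block_psnr(ref, dist)
```

The same selection mask could equally gate a per-block SSIM computation.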
One of the key issues for a successful rollout of digital cinema is the quality it offers. The most practical and least expensive way of measuring the quality of multimedia content is through the use of objective metrics. In addition to the widely used peak signal-to-noise ratio (PSNR), other metrics such as single-scale structural similarity (SS-SSIM) and multi-scale structural similarity (MS-SSIM) have recently been claimed to be good alternatives for estimating the quality perceived by human subjects. The goal of this paper is to verify, by means of subjective tests, the validity of such claims for digital cinema content and viewing environments.
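Validating an objective metric against subjective tests typically comes down to correlating its scores with the MOS values collected from viewers. A minimal stdlib-only sketch with hypothetical data (six fictional test clips):

```python
import statistics

def pearson(x, y):
    """Pearson linear correlation between metric scores and MOS."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical per-clip data: objective metric scores vs subjective MOS
metric = [0.92, 0.85, 0.78, 0.70, 0.64, 0.55]
mos    = [4.6,  4.1,  3.8,  3.2,  2.9,  2.1]
r = pearson(metric, mos)
```

A high linear (Pearson) and rank-order (Spearman) correlation with MOS is the usual evidence that a metric such as SS-SSIM or MS-SSIM tracks perceived quality.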
Traditional mechanisms for congestion control in multimedia streaming systems reduce the data transmission rate when
congestion is detected. Unfortunately, decreasing the rate of the media stream also decreases the media quality, but it is
the only way to combat congestion when it is caused by overwhelming traffic that exceeds the capacity of the network.
However, if the bottleneck is a wireless link, congestion often results from retransmissions caused by bit errors on the radio link. In that case, it may be beneficial not to reduce the transmission rate, but instead to allow delivery of packets containing bit errors up to the application layer. In this scenario, the quality of the media is affected by bit errors rather than by a lower coding rate. In this paper, we propose a system concept that allows bit errors in packets in order to relieve
congestion. We have built a simulation to compare the performance of the proposed system against traditional
congestion control. The results show that the proposed approach can improve overall performance, both by increasing throughput over the wireless link and by improving perceived video quality in terms of peak signal-to-noise ratio (PSNR).
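The throughput argument can be made concrete with a back-of-envelope model: with a radio-link bit error rate (BER), the probability that a packet contains at least one errored bit grows quickly with packet size, so discarding errored packets (and backing off) wastes a large share of link capacity. The numbers below are illustrative, not measurements from the paper's simulation:

```python
def packet_error_rate(ber, bits_per_packet):
    """Probability that a packet contains at least one bit error,
    assuming independent bit errors."""
    return 1 - (1 - ber) ** bits_per_packet

ber, pkt_bits = 1e-4, 8000          # illustrative BER, 1000-byte packets
per = packet_error_rate(ber, pkt_bits)

rate_kbps = 1000.0
# Traditional: errored packets are discarded at the link layer (and the
# sender may additionally throttle), so goodput drops with PER.
goodput_drop = rate_kbps * (1 - per)
# Proposed concept: errored packets are delivered up to the application,
# which conceals bit errors, so link throughput is preserved.
goodput_deliver = rate_kbps
```

At a BER of 1e-4, more than half of 1000-byte packets carry at least one bit error, which is why delivering them (and tolerating some decoding artifacts) can outperform dropping them.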
In recent years, digital imaging devices have become an integral part of our daily lives thanks to advances in imaging, storage, and wireless communication technologies. Power-Rate-Distortion efficiency is the key factor common to all resource-constrained portable devices. In addition, especially in real-time wireless multimedia applications, channel-adaptive and error-resilient source coding techniques should be considered in conjunction with P-R-D efficiency, since most of the time Automatic Repeat-reQuest (ARQ) and Forward Error Correction (FEC) are either not feasible or costly in terms of bandwidth efficiency and delay. In this work, we focus on scenarios of real-time video communication for resource-constrained devices over bandwidth-limited and lossy channels, and propose an analytic Power-channel Error-Rate-Distortion (P-E-R-D) model. In particular, the probabilities of macroblock coding modes are intelligently controlled through an optimization process according to their distinct rate-distortion-complexity performance for a given channel error rate. The framework provides theoretical guidelines for the joint analysis of error-resilient source coding and resource allocation. Experimental results show that our optimal framework provides a consistent rate-distortion performance gain under different power constraints.
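The intuition behind channel-aware mode control can be sketched with a Lagrangian mode decision in which each macroblock mode's expected distortion mixes its source distortion with the (larger) distortion incurred if the channel corrupts it. All mode costs below are invented for illustration and do not come from the paper's model:

```python
def choose_mode(modes, lam, channel_error_rate):
    """Pick the macroblock mode minimizing J = E[D] + lambda * R.

    modes: list of (name, rate_bits, distortion_src, distortion_if_lost).
    """
    def cost(m):
        name, rate, d_src, d_lost = m
        d_exp = ((1 - channel_error_rate) * d_src
                 + channel_error_rate * d_lost)
        return d_exp + lam * rate
    return min(modes, key=cost)[0]

# Illustrative modes: (name, rate, source distortion, distortion if lost)
MODES = [
    ("SKIP",  2,  40.0, 90.0),   # cheap, poor quality and resilience
    ("INTER", 30, 12.0, 80.0),   # efficient but error-prone (drift)
    ("INTRA", 90, 10.0, 30.0),   # costly but resilient (no prediction)
]

clean = choose_mode(MODES, lam=0.1, channel_error_rate=0.0)
lossy = choose_mode(MODES, lam=0.1, channel_error_rate=0.3)
```

On a clean channel the efficient INTER mode wins; as the channel error rate rises, the expected-distortion term pushes the decision toward resilient INTRA coding, which is the qualitative behavior the P-E-R-D framework optimizes over.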
This paper demonstrates a robust compression/decompression system for still image coding. Error resilience is obtained by substituting a Reversible Variable Length Coding (RVLC) scheme for a regular Variable Length Coding (VLC) scheme. The results show that this substitution increases the coder's robustness significantly. Results are obtained by comparing the performance of RVLC against an early implementation of JPEG2000 (VM3.0B). Reversible variable length codes can be decoded independently from both the beginning and the end of a sequence. This yields increased robustness to errors, in that more codewords can be decoded than with a regular VLC, which can only be decoded from the beginning of the sequence. The gain of our coders in the region of interest, bit error rates ranging from 10⁻⁴ to 10⁻², is on the order of 2 dB over VM3.0B. Visually, the differences are significant.
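The bidirectional-decoding property can be demonstrated with the classic construction of palindromic codewords, which form a prefix code in both directions. The codebook below is a textbook example, not the one used in the paper:

```python
def decode(bits, codebook):
    """Greedy prefix decode of a bit string; returns decoded symbols."""
    inv = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return out

# Symmetric (palindromic) codewords decode as a prefix code both
# forwards and backwards -- a classic RVLC construction.
RVLC = {"a": "0", "b": "11", "c": "101"}

msg = "abcab"
stream = "".join(RVLC[s] for s in msg)   # "011101011"
fwd = decode(stream, RVLC)               # decode from the start
bwd = decode(stream[::-1], RVLC)         # decode from the end
```

After a bit error, a decoder can work forwards up to the corrupted region and backwards from the end of the sequence, recovering codewords on both sides, which is exactly where the robustness gain over one-directional VLC comes from.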