Multifeature fusion viewport prediction method based on transformer
3 October 2024
Jinpeng Song, Rongrong Zhang
Proceedings Volume 13272, Fifth International Conference on Computer Vision and Data Mining (ICCVDM 2024); 132721B (2024) https://doi.org/10.1117/12.3048141
Event: 5th International Conference on Computer Vision and Data Mining (ICCVDM 2024), 2024, Changchun, China
Abstract
The continuous advancement of virtual reality and Internet technology has gradually brought 360° video into daily life. Because it is highly immersive and interactive, 360° video offers an engaging viewing experience and has attracted widespread attention. However, the rich content of 360° video produces large amounts of data, which places heavy demands on the network used to transmit it. Because a head-mounted display (HMD) shows only a fixed field of view, in principle only the small portion of the video inside the user's viewport needs to be delivered at guaranteed quality to ensure seamless viewing, which can substantially reduce bandwidth consumption. Accurate viewport prediction is therefore crucial. Existing methods often rely on a single type of data and achieve low accuracy in long-term prediction. To address these limitations, we propose a Transformer-based 360° video viewport prediction framework built on multi-feature fusion. Our approach combines three types of data, namely the user's past head-movement trajectory, video saliency, and cross-user areas of interest, to make viewport prediction more robust. Experimental results show that, compared with existing methods, our approach predicts future viewport areas more accurately and significantly improves long-term prediction accuracy.
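The abstract does not describe the network architecture, so the following is only a hypothetical, minimal sketch of the general idea of multi-feature fusion with Transformer-style attention, not the authors' model. It assumes early fusion (concatenating the head-trajectory, saliency, and cross-user features of each timestep into one token) followed by single-head scaled dot-product self-attention with identity query/key/value projections. The function names, feature dimensions, and input values are all illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per timestep


def fuse_tokens(trajectory: Matrix, saliency: Matrix, cross_user: Matrix) -> Matrix:
    """Hypothetical early fusion: concatenate the three per-timestep feature
    vectors into a single token per timestep."""
    if not (len(trajectory) == len(saliency) == len(cross_user)):
        raise ValueError("all feature streams must cover the same timesteps")
    return [t + s + c for t, s, c in zip(trajectory, saliency, cross_user)]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves; a trained Transformer would learn W_Q, W_K, and W_V.
    Returns (outputs, attention_weights).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens])
        for q in tokens
    ]
    # Each output token is an attention-weighted average of all input tokens.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical inputs for three past timesteps.
trajectory = [[0.10, 0.00], [0.15, 0.02], [0.20, 0.05]]  # normalized yaw, pitch
saliency = [[0.8, 0.1], [0.7, 0.2], [0.6, 0.3]]  # pooled saliency scores
cross_user = [[0.5], [0.6], [0.7]]  # share of other viewers on the same region

tokens = fuse_tokens(trajectory, saliency, cross_user)
fused, attn = self_attention(tokens)
print(len(fused), len(fused[0]))  # 3 5
```

Concatenation is the simplest fusion choice; separate per-stream encoders combined through cross-attention are a common alternative, and the abstract does not state which approach the authors use. In a full predictor, a learned output head would map the fused tokens to future viewport coordinates.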
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jinpeng Song and Rongrong Zhang "Multifeature fusion viewport prediction method based on transformer", Proc. SPIE 13272, Fifth International Conference on Computer Vision and Data Mining (ICCVDM 2024), 132721B (3 October 2024); https://doi.org/10.1117/12.3048141
Keywords
Video, Transformers, Feature extraction, Feature fusion, Machine learning, Spherical lenses, Virtual reality