Various models have been proposed to predict the future head/gaze orientation of a user watching a 360-degree video. However, most of these models do not take sound into account, and few studies have examined how sound influences users in VR. This study proposes a multimodal head/gaze prediction model for 360-degree videos based on a new analysis of users' head/gaze behavior in VR. First, we examine whether viewers are attracted to the sound source in a 360-degree video. We conducted a head/gaze tracking experiment with 22 subjects under AV (audio-visual) and V (visual-only) conditions using 32 videos, and found that whether viewers were drawn to the sound source depended on the video. Next, based on these results, we trained a deep learning model, constructing and evaluating a multimodal model that combines visual and auditory information. The resulting model predicts head/gaze orientation while explicitly using the sound source; however, we could not confirm that multimodalization improved prediction accuracy. Finally, we discuss this issue and directions for future work.
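The abstract does not specify the network architecture. As an illustration only, a multimodal head/gaze predictor of the kind described might fuse pre-extracted visual features with an explicit sound-source direction encoding along the following lines; all layer sizes, module names, and the choice of PyTorch are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class MultimodalGazePredictor(nn.Module):
    """Illustrative sketch: fuses per-frame visual features with a
    sound-source direction vector and regresses the next head/gaze
    orientation as a unit vector. Dimensions are placeholders."""

    def __init__(self, visual_dim=512, audio_dim=3, hidden_dim=128):
        super().__init__()
        # Encode pre-extracted visual features (e.g. from a saliency or CNN backbone).
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Encode the sound-source direction (e.g. a 3-D direction on the viewing sphere).
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # Fuse both modalities and regress a 3-D head-direction vector.
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3))

    def forward(self, visual_feat, audio_dir):
        v = self.visual_enc(visual_feat)
        a = self.audio_enc(audio_dir)
        out = self.fusion(torch.cat([v, a], dim=-1))
        # Normalize so the prediction lies on the unit sphere (a head orientation).
        return nn.functional.normalize(out, dim=-1)


# Example: a batch of 4 frames with 512-D visual features and 3-D source directions.
model = MultimodalGazePredictor()
pred = model(torch.randn(4, 512), torch.randn(4, 3))
print(pred.shape)  # torch.Size([4, 3])
```

A visual-only (V condition) baseline corresponds to dropping the audio branch, which is the comparison the accuracy discussion in the abstract refers to.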