Video spatio-temporal generative adversarial network for local action generation

Xuejun Liu; Jiacheng Guo; Zhongji Cui; Ling Liu; Yong Yan; Yun Sha

doi:10.1117/1.JEI.32.5.053003

6 September 2023



Journal of Electronic Imaging, Vol. 32, Issue 5, 053003 (September 2023). https://doi.org/10.1117/1.JEI.32.5.053003

Abstract

Generating future action videos from static images can help computer vision systems better support video understanding and intelligent decision-making. However, current models pay more attention to the motion trend of the generated objects and handle local details poorly: local regions of the generated video suffer from blurred frames and incoherent motion. This paper proposes a two-stage model, the video spatio-temporal generative adversarial network (VSTGAN), which consists of two GANs: a temporal network (T-net) and a spatial network (S-net). The model combines the advantages of CNNs, recurrent neural networks (RNNs), and GANs to decompose the complex spatiotemporal generation problem into temporal and spatial dimensions, so that VSTGAN can focus on local features in each dimension separately. In the temporal dimension, we propose an RNN unit, the convolutional attention unit (ConvAU), which uses a convolutional attention module to dynamically generate the weights that update the hidden state; T-net uses ConvAU to generate local dynamics. In the spatial dimension, S-net uses CNNs and attention modules to perform resolution reconstruction of the generated local dynamics to produce the final video. We build two small-sample datasets and validate our approach on these two new datasets and on the public KTH dataset. The results show that our approach can effectively generate local details in future action videos and that its performance on small-sample datasets is competitive with the state of the art in video generation.
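The abstract does not give ConvAU's equations, so the NumPy code below is only a hypothetical sketch of the general idea, not the authors' method. It makes several assumptions. A CBAM-style spatial attention map is computed by convolving the channel-wise mean and max of the concatenated input and hidden state. That map acts as the "dynamically generated weights" that blend the old hidden state with a convolutional candidate state. T-net is reduced to an autoregressive rollout of this cell from a single static frame. S-net is replaced by plain nearest-neighbour upsampling in place of its learned CNN and attention reconstruction. All names (`ConvAttentionCellSketch`, `t_net_rollout`, `s_net_upsample`) are hypothetical. The kernels are random stand-ins for learned weights, and no GAN training or discriminator is shown.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function, maps logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' 2D cross-correlation (what CNN layers call convolution).

    x: (H, W) feature map; k: (kh, kw) kernel with odd sizes.
    """
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


class ConvAttentionCellSketch:
    """Hypothetical ConvAU-style recurrent cell (not the paper's exact equations).

    A spatial attention map, computed by convolving channel-pooled [x, h],
    serves as the dynamically generated weight that blends the old hidden
    state with a convolutional candidate state.
    """

    def __init__(self, kernel_size: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random stand-ins for learned kernels: two for attention (mean, max maps).
        self.k_att = rng.normal(0.0, 0.1, size=(2, kernel_size, kernel_size))
        self.k_x = rng.normal(0.0, 0.1, size=(kernel_size, kernel_size))
        self.k_h = rng.normal(0.0, 0.1, size=(kernel_size, kernel_size))

    def step(self, x: np.ndarray, h: np.ndarray):
        """x, h: (C, H, W). Returns the new hidden state and the attention map (H, W)."""
        z = np.concatenate([x, h], axis=0)
        pooled = np.stack([z.mean(axis=0), z.max(axis=0)])  # (2, H, W)
        a = sigmoid(conv2d_same(pooled[0], self.k_att[0]) +
                    conv2d_same(pooled[1], self.k_att[1]))  # weights in (0, 1)
        cand = np.tanh(np.stack([
            conv2d_same(x[c], self.k_x) + conv2d_same(h[c], self.k_h)
            for c in range(x.shape[0])
        ]))
        # Per-pixel convex blend: a keeps the old state, (1 - a) admits the candidate.
        h_new = a * h + (1.0 - a) * cand
        return h_new, a


def t_net_rollout(cell, first_frame: np.ndarray, steps: int) -> np.ndarray:
    """Autoregressive low-resolution rollout from a single static frame (C, H, W)."""
    h = np.zeros_like(first_frame, dtype=float)
    x = first_frame
    frames = []
    for _ in range(steps):
        h, _ = cell.step(x, h)
        frames.append(h)
        x = h  # feed the prediction back in as the next input
    return np.stack(frames)  # (T, C, H, W)


def s_net_upsample(frames: np.ndarray, scale: int = 2) -> np.ndarray:
    """Placeholder for S-net: nearest-neighbour upsampling, NOT a learned CNN + attention model."""
    return frames.repeat(scale, axis=-2).repeat(scale, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.random((3, 16, 16))  # toy static image: 3 channels, 16x16
    low = t_net_rollout(ConvAttentionCellSketch(), image, steps=4)
    high = s_net_upsample(low, scale=2)
    print(low.shape, high.shape)  # (4, 3, 16, 16) (4, 3, 32, 32)
```

The GRU-like convex blend is one plausible design choice. Because the attention weights lie in (0, 1) and the candidate is squashed by `tanh`, the hidden state stays bounded across the rollout. A real system would learn these kernels, with T-net and S-net each trained adversarially against its own discriminator, in a deep-learning framework.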


Xuejun Liu, Jiacheng Guo, Zhongji Cui, Ling Liu, Yong Yan, and Yun Sha "Video spatio-temporal generative adversarial network for local action generation," Journal of Electronic Imaging 32(5), 053003 (6 September 2023). https://doi.org/10.1117/1.JEI.32.5.053003

Received: 13 May 2023; Accepted: 22 August 2023; Published: 6 September 2023


Journal article, 19 pages.


KEYWORDS

Video

Generative adversarial networks

Data modeling

Education and training

Motion models

3D modeling

Tunable filters
