Recent years have witnessed great advances in deep learning-based image compression, also known as learned image compression. An accurate entropy model is essential in learned image compression, since it allows high-quality images to be compressed at lower bit rates. Current learned image compression schemes develop entropy models using context models and hyperpriors. Context models utilize local correlations within latent representations for better probability distribution approximation, while hyperpriors provide side information to estimate distribution parameters. Most recently, several transformer-based learned image compression algorithms have emerged and achieved state-of-the-art rate-distortion performance, surpassing existing convolutional neural network (CNN)-based learned image compression and traditional image compression. Transformers are better at modeling long-distance dependencies and extracting global features than CNNs. However, research on transformer-based image compression is still at an early stage. In this work, we propose a novel transformer-based learned image compression model. It adopts transformer structures in the main image encoder and decoder and in the context model. In particular, we propose a transformer-based spatial-channel auto-regressive context model. Encoded latent-space features are split into spatial-channel chunks, which are entropy encoded sequentially in a channel-first order, followed by a 2D zigzag spatial order, conditioned on previously decoded feature chunks. To reduce the computational complexity, we also adopt a sliding window to restrict the number of chunks participating in the entropy model. Experimental studies on public image compression datasets demonstrate that our proposed transformer-based learned image codec outperforms traditional image compression and existing learned image compression models visually and quantitatively.
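As a rough illustration of the chunk ordering described above, the following Python sketch splits a latent tensor into spatial-channel chunks, visits channel groups first and spatial positions in a 2D zigzag scan, and restricts the entropy-model context with a sliding window. The chunk sizes, the window length, and the exact loop nesting are illustrative assumptions, not the model's actual configuration.

```python
# A minimal sketch of the chunk scan, assuming channel groups form the outer
# loop and spatial positions follow a 2D zigzag scan; sizes are illustrative.
import numpy as np

def zigzag(h, w):
    """Spatial chunk coordinates in a 2D zigzag (anti-diagonal) scan."""
    order = []
    for s in range(h + w - 1):
        diag = [(i, s - i) for i in range(h) if 0 <= s - i < w]
        order.extend(diag if s % 2 == 0 else diag[::-1])
    return order

def chunk_scan(latent, c_chunk=64, s_chunk=8, window=4):
    """Yield (chunk, context) pairs in coding order; `context` holds the
    previously decoded chunks visible to the entropy model."""
    C, H, W = latent.shape
    decoded = []
    for c0 in range(0, C, c_chunk):                        # channel-first
        for i, j in zigzag(H // s_chunk, W // s_chunk):    # then 2D zigzag in space
            chunk = latent[c0:c0 + c_chunk,
                           i * s_chunk:(i + 1) * s_chunk,
                           j * s_chunk:(j + 1) * s_chunk]
            yield chunk, decoded[-window:]                 # sliding-window context
            decoded.append(chunk)

latent = np.random.randn(192, 16, 16).astype(np.float32)
for k, (chunk, ctx) in enumerate(chunk_scan(latent)):
    if k < 3:
        print(chunk.shape, len(ctx))
```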
Video data pervades people's daily professional and entertainment activities and places great pressure on Internet bandwidth. Hence, it is important to develop effective video coding techniques that compress video data as much as possible and save transmission bandwidth, while still providing visually pleasing decoded videos. In conventional video coding standards such as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC), signal processing and information-theoretic techniques are mainstream. In recent years, thanks to the advances in deep learning, many deep learning-based approaches have emerged for image and video compression. In particular, generative adversarial networks (GANs) have shown superior performance for image compression. The decoded images are usually sharper and present more detail than those produced by purely convolutional neural network (CNN)-based image compression, and they are more consistent with the human visual system (HVS). Nevertheless, most existing GAN-based methods target still image compression, and very little research has investigated the potential of GANs for video compression. In this work, we propose a novel inter-frame video coding scheme that compresses both reference frames and target (residue) frames by GAN. Since residue signals contain less energy, the proposed method effectively reduces the bit rate. Meanwhile, since we adopt adversarial learning, the perceptual quality of decoded target frames is well preserved. The effectiveness of our proposed algorithm is demonstrated by experimental studies on common test video sequences.
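A training objective of the kind implied above can be written as a rate-distortion loss plus an adversarial term on the decoded residue. The PyTorch sketch below is only an assumed formulation: the networks (ResidueEncoder, ResidueDecoder, Discriminator), the rate estimate, and the loss weights are placeholders, not the authors' exact design.

```python
# Hedged sketch of a rate-distortion-adversarial objective for residue frames.
import torch
import torch.nn.functional as F

def generator_loss(residue, enc, dec, disc, lambda_rate=0.01, lambda_adv=0.1):
    latent, rate_bits = enc(residue)              # rate_bits: estimated bits from an entropy model
    recon = dec(latent)                           # decoded residue frame
    dist = F.mse_loss(recon, residue)             # distortion term
    adv = -torch.log(disc(recon) + 1e-8).mean()   # non-saturating GAN term; disc outputs a probability
    return dist + lambda_rate * rate_bits.mean() + lambda_adv * adv, recon

def discriminator_loss(disc, real_residue, fake_residue):
    real = -torch.log(disc(real_residue) + 1e-8).mean()
    fake = -torch.log(1.0 - disc(fake_residue.detach()) + 1e-8).mean()
    return real + fake
```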
KEYWORDS: Video compression, Video coding, Video, Visualization, Visual compression, Video processing, Motion estimation, Electronic components, Data storage
Video coding is the process of reducing the huge volume of video data to a small number of bits. High coding efficiency reduces the bandwidth required for video streaming and the space required to store the video data on electronic devices, while maintaining the fidelity of the decompressed video signal. In recent years, deep learning has been extensively applied in the field of video coding. However, it remains challenging to exploit intra- and inter-frame correlations in deep learning-based video coding systems to improve coding efficiency. In this work, we propose a hierarchical motion estimation and compensation network for video compression. The video frames are tagged as intra-frames and inter-frames. While intra-frames are compressed independently, the inter-frames are hierarchically predicted from adjacent frames using a bi-directional motion prediction network, which results in highly sparse and compressible residues. The residue frames are then compressed via separately trained residue coding networks. Experimental results demonstrate that the proposed hierarchical deep video compression network offers significantly higher coding efficiency and superior visual quality compared to prior art.
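One common way to realize such a hierarchy is a recursive midpoint schedule, where each inter-frame is predicted from the two already-coded frames that bracket it. The short sketch below shows this schedule for a GOP of 9 frames; it is an illustrative assumption, and the paper's exact hierarchy may differ.

```python
# Sketch of a hierarchical bi-directional coding order for one GOP.
def hierarchical_order(left, right):
    """Yield (target, left_ref, right_ref): each inter-frame is predicted
    from the two already-coded frames that bracket it."""
    if right - left < 2:
        return
    mid = (left + right) // 2
    yield (mid, left, right)
    yield from hierarchical_order(left, mid)
    yield from hierarchical_order(mid, right)

# Frames 0 and 8 are intra-coded; the rest are coded hierarchically.
for target, l, r in hierarchical_order(0, 8):
    print(f"frame {target}: predicted from frames {l} and {r}")
```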
Recent advances in deep learning have achieved great success in fundamental computer vision tasks such as classification, detection, and segmentation. Nevertheless, the research effort in deep learning-based video coding is still in its infancy. State-of-the-art deep video coding networks explore temporal correlations by means of frame-level motion estimation and motion compensation, which incur high computational complexity due to the large frame size, while existing block-level inter-frame prediction schemes utilize only the co-located blocks in preceding frames and thus do not account for object motion. In this work, we propose a novel motion-aware deep video coding network, in which inter-frame correlations are effectively explored via a block-level motion compensation network. Experimental results demonstrate that the proposed inter-frame deep video coding model significantly improves the decoding quality under the same compression ratio.
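For readers unfamiliar with block-level motion compensation, the following sketch shows a classical full-search block matcher over a small window; the learned network proposed in the paper is not reproduced here, and the block size and search range are illustrative.

```python
# Minimal block-matching sketch (sum-of-absolute-differences criterion).
import numpy as np

def best_match(ref, block, top, left, search=8):
    """Find the displacement in `ref` that best matches `block`."""
    bh, bw = block.shape
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + bh <= ref.shape[0] and 0 <= x and x + bw <= ref.shape[1]:
                sad = np.abs(ref[y:y + bh, x:x + bw] - block).sum()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
    return best_mv

ref = np.random.rand(64, 64)
cur_block = ref[20:36, 12:28]                      # block displaced relative to (16, 16)
print(best_match(ref, cur_block, top=16, left=16)) # expected (4, -4)
```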
The future Internet of Things (IoT) will feature ubiquitous and pervasive vision sensors that generate enormous amounts of streaming video. The ability to analyze this big video data in a timely manner is essential to delay-sensitive applications, such as autonomous vehicles and body-worn cameras for police forces. Due to the limited computing power and storage capacity of local devices, the fog computing paradigm has been developed in recent years to process big sensor data closer to the end users, thereby avoiding the transmission delay and the huge uplink bandwidth requirements of cloud-based data analysis. In this work, we propose an edge-to-fog computing framework for object detection from surveillance videos. Videos are captured locally at an edge device and sent to fog nodes for color-assisted L1-subspace background modeling. The results are then sent back to the edge device for data fusion and final object detection. Experimental studies demonstrate that the proposed color-assisted background modeling offers more diversity than pure luminance-based background modeling and hence achieves higher object detection accuracy. Meanwhile, the proposed edge-to-fog paradigm leverages the computing resources on multiple platforms.
We describe an iterative procedure for soft characterization of outlier data in any given data set. In each iteration, data compliance with nominal data behavior is measured according to the current L1-norm principal-component subspace representation of the data set. Successively refined L1-norm subspace representations lead to successively refined outlier data characterization. The effectiveness of the proposed theoretical scheme is experimentally studied, and the results show significantly improved performance compared to L2-PCA schemes, standard L1-PCA, and state-of-the-art robust PCA methods.
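The sketch below illustrates the iterative idea in Python: fit an L1-norm principal component (via the fixed-point iteration of Kwak, 2008), score each sample by its residual from the subspace, then refit with low-compliance samples down-weighted. The weighting rule and iteration counts are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: iterative L1-subspace fitting and outlier scoring.
import numpy as np

def l1_pc(X, iters=50):
    """One L1 principal component of the columns of X (d x n), maximizing ||X^T w||_1."""
    w = X[:, 0] / np.linalg.norm(X[:, 0])
    for _ in range(iters):
        s = np.sign(X.T @ w)
        s[s == 0] = 1.0
        w = X @ s
        w /= np.linalg.norm(w)
    return w

def outlier_scores(X, rounds=3):
    weights = np.ones(X.shape[1])
    for _ in range(rounds):
        w = l1_pc(X * weights)                          # refit on re-weighted data
        resid = np.linalg.norm(X - np.outer(w, w.T @ X), axis=0)
        weights = 1.0 / (1.0 + resid)                   # small residual -> high compliance weight
    return resid                                        # larger residual = more outlier-like

X = np.random.randn(5, 100)
X[:, :5] += 10 * np.random.randn(5, 5)                  # plant a few outliers
print(np.argsort(outlier_scores(X))[-5:])                # planted outliers should rank highest
```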
KEYWORDS: Video surveillance, Video, Compressed sensing, Principal component analysis, Surveillance, Binary data, Video compression, Matrices, Detection and tracking algorithms, Bismuth
We consider the problem of online foreground extraction from compressed-sensed (CS) surveillance videos. A technically novel approach is suggested and developed by which the background scene is captured by an L1-norm subspace sequence directly in the CS domain. In contrast to conventional L2-norm subspaces, L1-norm subspaces are seen to offer significant robustness to outliers, disturbances, and rank selection. Subtraction of the L1-subspace-tracked background then leads to effective foreground/moving-object extraction. Experimental studies included in this paper illustrate and support the theoretical developments.
We consider the problem of representing individual faces by maximum L1-norm projection subspaces calculated from available face-image ensembles. In contrast to conventional L2-norm subspaces, L1-norm subspaces are seen to offer significant robustness to image variations, disturbances, and rank selection. Face recognition then becomes the problem of associating a new unknown face image to the “closest,” in some sense, L1 subspace in the database. In this work, we also introduce the concept of adaptively allocating the available number of principal components to different face image classes, subject to a given total number/budget of principal components. Experimental studies included in this paper illustrate and support the theoretical developments.
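One natural reading of the classification rule above is to associate a new face with the class whose subspace captures the most projection energy. The sketch below assumes the per-class L1 subspace bases have already been computed (e.g., with an L1-PCA routine); all names and sizes are illustrative. The adaptive budget allocation mentioned above would additionally decide how many columns each class basis receives, subject to a total component budget.

```python
# Sketch of nearest-subspace face classification by projection energy.
import numpy as np

def classify_face(x, class_bases):
    """x: flattened face image; class_bases: dict {label: (d x k) orthonormal basis}."""
    best_label, best_energy = None, -np.inf
    for label, Q in class_bases.items():
        energy = np.linalg.norm(Q.T @ x)   # projection energy onto the class subspace
        if energy > best_energy:
            best_label, best_energy = label, energy
    return best_label

rng = np.random.default_rng(0)
bases = {c: np.linalg.qr(rng.standard_normal((64, 4)))[0] for c in ("alice", "bob")}
print(classify_face(bases["alice"] @ rng.standard_normal(4), bases))  # -> "alice"
```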
We propose and demonstrate a reliable and inexpensive tool for optical characterization of photonic metamaterials and metasurfaces. Existing methods for characterizing metamaterials (or, more precisely, negative-index metamaterials), including conventional interferometry and ellipsometry, are rather complex and expensive.
The “measurable” difference between, for example, positive-index materials and negative-index materials is that the former introduces a phase delay to the transmitted light beam, while the latter introduces a phase advance. Here, we propose to use optical vortex interferometry to directly “visualize” the phase delay or phase advance.
In the proposed setup, a laser beam at a wavelength of 633 nm is split into two by a beam splitter. One beam is transmitted through a spiral phase plate in order to generate a beam with an orbital angular momentum, and the second beam is transmitted through a nanostructured sample. The two beams are subsequently recombined by a beam splitter to form a spiral interferogram. The spiral patterns are then analyzed to determine the phase shifts introduced by the sample. In order to demonstrate the efficiency of the proposed technique, we fabricated four metasurface samples consisting of metal nano-antennas that introduce different phase shifts and experimentally measured the phase shifts of the transmitted light using the proposed technique. The experimental results are in good agreement with numerical simulations.
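As a purely numerical illustration of why the spiral pattern encodes the sample phase, the sketch below superposes a vortex beam with a probe beam carrying a sample-induced phase shift (plus a small quadratic phase so the fringes spiral). It is a simplified assumption, not a model of the actual optical setup.

```python
# Illustrative simulation: the spiral arm rotates with the sample phase shift.
import numpy as np

def spiral_interferogram(sample_phase, l=1, n=256):
    y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    r, theta = np.hypot(x, y), np.arctan2(y, x)
    vortex = np.exp(1j * l * theta) * np.exp(-r**2 / 0.5)                # OAM reference beam
    probe = np.exp(1j * (sample_phase + 2 * np.pi * r**2)) * np.exp(-r**2 / 0.5)  # beam through sample
    return np.abs(vortex + probe) ** 2

# The azimuthal position of the spiral arm shifts with the sample-induced phase,
# which is how a phase delay (or advance) can be read off the pattern.
I0 = spiral_interferogram(0.0)
I1 = spiral_interferogram(np.pi / 2)
```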
In summary, we report a novel method to characterize metasurfaces and metamaterials using optical vortex interferometry. The proposed characterization approach is simple, reliable and particularly useful for fast and inexpensive characterization of phase properties introduced by metamaterials and metasurfaces.
KEYWORDS: Video surveillance, Video, Video compression, Principal component analysis, Surveillance, Algorithm development, Compressed sensing, Video processing, Image restoration, Reconstruction algorithms
We consider the problem of foreground and background extraction from compressed-sensed (CS) surveillance video. We propose, for the first time in the literature, a principal component analysis (PCA) approach that computes the low-rank subspace of the background scene directly in the CS domain. Rather than computing the conventional L2-norm-based principal components, which are simply the dominant left singular vectors of the CS measurement matrix, we compute the principal components under an L1-norm maximization criterion. The background scene is then obtained by projecting the CS measurement vector onto the L1 principal components followed by total-variation (TV) minimization image recovery. The proposed L1-norm procedure directly carries out low-rank background representation without reconstructing the video sequence and, at the same time, exhibits significant robustness against outliers in CS measurements compared to L2-norm PCA.
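The sketch below gives a rough numerical picture of the CS-domain background step: stack CS measurement vectors as columns, fit an L1-norm principal component by fixed-point iteration, and project each new measurement onto it to obtain its background part. The final TV-minimization image recovery step is omitted, and the single-component fit is an illustrative simplification.

```python
# Hedged sketch of L1-subspace background extraction in the CS measurement domain.
import numpy as np

def l1_principal_component(Y, iters=100):
    """Y: m x T matrix of CS measurement vectors (columns)."""
    w = Y[:, 0] / np.linalg.norm(Y[:, 0])
    for _ in range(iters):
        s = np.sign(Y.T @ w)
        s[s == 0] = 1.0
        w = Y @ s
        w /= np.linalg.norm(w)
    return w

def split_background(y, w):
    """Background/foreground split of one CS measurement vector y."""
    y_bg = w * (w @ y)    # projection onto the L1 background subspace
    y_fg = y - y_bg       # residual measurements carry the moving objects
    return y_bg, y_fg
```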
We consider a compressive video acquisition system where frame blocks are sensed independently. Varying block sparsity is exploited in the form of individual per-block open-loop sampling rate allocation with minimal system overhead. At the decoder, video frames are reconstructed via sliding-window inter-frame total-variation minimization. Experimental results demonstrate that such rate-adaptive compressive video acquisition noticeably improves the rate-distortion performance of the video stream over fixed-rate acquisition approaches.
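To make the per-block rate allocation concrete, the sketch below estimates each block's compressibility from a cheap proxy (local gradient energy) and scales its number of CS measurements accordingly under a fixed total budget. The proxy and the scaling rule are illustrative assumptions, not the paper's allocation scheme.

```python
# Hedged sketch of per-block open-loop sampling-rate allocation.
import numpy as np

def allocate_block_rates(frame, block=16, total_rate=0.25, min_rate=0.05):
    H, W = frame.shape
    gy, gx = np.gradient(frame.astype(float))
    activity = []
    for i in range(0, H, block):
        for j in range(0, W, block):
            activity.append(np.abs(gy[i:i + block, j:j + block]).sum()
                            + np.abs(gx[i:i + block, j:j + block]).sum())
    activity = np.array(activity)
    # active (less sparse) blocks get more measurements; average rate = total_rate
    rates = min_rate + (total_rate - min_rate) * activity * len(activity) / activity.sum()
    return np.clip(rates, min_rate, 1.0).reshape(H // block, W // block)

frame = np.random.rand(64, 64)
print(allocate_block_rates(frame))   # 4 x 4 map of per-block sampling rates
```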
We consider a video acquisition system where motion imagery is captured only by direct compressive sampling (CS) without any other form of intelligent encoding/processing. In this context, the burden of quality video sequence reconstruction falls solely on the decoder/player side. We describe a video CS decoding method that implicitly incorporates motion estimation via sliding-window sparsity-aware recovery from locally estimated Karhunen-Loeve bases. Experiments presented herein illustrate and support these developments.
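The decoder idea can be sketched as follows: build a Karhunen-Loeve (PCA) basis from nearby previously reconstructed blocks, then recover the current block from its CS measurements with a simple sparsity-aware solver (here, orthogonal matching pursuit). All sizes and the choice of solver are illustrative assumptions rather than the paper's exact method.

```python
# Hedged sketch: locally estimated KLT basis + sparsity-aware block recovery.
import numpy as np

def local_klt(neighbor_blocks):
    """neighbor_blocks: n x d matrix of vectorized, previously reconstructed blocks."""
    centered = neighbor_blocks - neighbor_blocks.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=True)
    return vt.T                       # d x d orthonormal KLT basis (columns)

def omp(A, y, k):
    """Recover a k-sparse coefficient vector from y = A @ c."""
    resid, support, c = y.copy(), [], np.zeros(A.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ resid))))
        sub = A[:, support]
        coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
        resid = y - sub @ coeffs
    c[support] = coeffs
    return c

def recover_block(y, Phi, klt_basis, sparsity=8):
    """y: CS measurements of a block, Phi: m x d sensing matrix."""
    c = omp(Phi @ klt_basis, y, sparsity)
    return klt_basis @ c              # block as a sparse combination of KLT atoms
```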